The following words and terms are commonly used in data journalism, and data journalists should familiarise themselves with them.
Commonly used words and phrases
- Algorithm: A set of rules or instructions that a computer follows to solve a problem or perform a task. In data journalism, algorithms are used for tasks such as sorting, filtering, and analysing datasets. Link: Algorithm
 
- API (Application Programming Interface): A digital tool that lets you pull data directly from a website or database, often used by journalists to access updated datasets. Link: API
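
A minimal sketch in Python of pulling data from an API, using the third-party requests library; the endpoint URL, its parameters, and the shape of the response are hypothetical.

```python
import requests

# Hypothetical open-data endpoint, for illustration only.
URL = "https://data.example.gov/api/v1/air-quality"

response = requests.get(URL, params={"year": 2024, "format": "json"}, timeout=30)
response.raise_for_status()   # stop early if the request failed
records = response.json()     # assumes the API returns a JSON list of records

for record in records[:5]:    # preview the first five records
    print(record)
```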
 
- Choropleth map: A map shaded in different colours to show how a number or rate changes by area (e.g., COVID-19 cases by county). Link: Choropleth map

- Computational thinking: The process of breaking down complex problems into smaller, manageable parts, and then creating algorithms to solve them. Link: Computational thinking

- Correlation: A relationship between two variables (note: correlation doesn’t mean causation). Link: Correlation
 
- CSV (Comma-Separated Values): A common, simple file format for datasets; essentially a spreadsheet saved as plain text. Link: CSV
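
A minimal Python sketch of reading a CSV file into a list of rows; the file name and column names are hypothetical.

```python
import csv

# "schools.csv" is a hypothetical file with "name" and "pupils" columns.
with open("schools.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))   # one dictionary per row, keyed by header

print(rows[0]["name"], rows[0]["pupils"])
```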
 
- Data analysis: Examining data to identify trends, patterns, and relationships. Link: Data analysis
 
- Data bias: When data is skewed or incomplete. Journalists need to be alert to this to avoid misleading the audience. Link: Data bias
 
- Data cleansing (or wrangling): The process of fixing messy data in order to correct errors, fill in missing information, and format it so it’s ready for analysis. Link: Data cleansing
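
A short sketch of typical cleansing steps using the pandas library; the messy table is invented for illustration.

```python
import pandas as pd

# A hypothetical messy table of the kind scraping often produces.
df = pd.DataFrame({
    "council": ["Leeds ", "leeds", "York", None],
    "spend":   ["1,200", "1200", "950", "800"],
})

df["council"] = df["council"].str.strip().str.title()          # fix whitespace and case
df["spend"] = df["spend"].str.replace(",", "").astype(float)   # "1,200" -> 1200.0
df = df.dropna(subset=["council"])                             # drop rows missing a name

print(df)
```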
 
- Data ethics: Principles and guidelines for the responsible collection, analysis, and dissemination of data, with a focus on privacy, security, and fairness. Link: Data ethics

- Data journalism: The practice of using data to find, create, and tell news stories. It involves collecting, analysing, and visualising data to inform the public. Link: Data journalism
 
- Data leak (or breach): When private or sensitive data is released, intentionally or accidentally. Newsrooms often investigate these incidents. Link: Data leak or breach
 
- Data literacy: The ability to understand, interpret, and communicate data effectively. This includes critical thinking, statistical reasoning, and the ability to identify biases. Link: Data literacy

- Data mining: The process of extracting valuable information and patterns from large datasets. Link: Data mining
 
- Data scraping: The automated process of extracting data from websites or other sources and saving it into a structured format. Link: Data scraping
 
- Data transparency: Being open about how the data was handled, what assumptions were made, and what might be missing.

- Data visualisation: Representing data visually through charts, graphs, maps, and other graphical formats. Link: Data visualisation
 
- Dataset: Also written data-set; a collection of related data, such as a spreadsheet or table, and often the starting point for a data story. Link: Dataset
 
- Deduplication: Removing repeated entries in a dataset to avoid counting the same thing twice. Link: Data deduplication
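
A quick sketch of deduplication with the pandas library; the donation records are invented.

```python
import pandas as pd

# The same donation recorded twice, a common problem in merged datasets.
donations = pd.DataFrame({
    "donor":  ["A. Smith", "A. Smith", "B. Jones"],
    "amount": [500, 500, 250],
})

deduped = donations.drop_duplicates()       # keep the first of each identical row
print(len(donations), "->", len(deduped))   # 3 -> 2
```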
 
- Descriptive statistics: Simple summaries of data, such as averages, medians, and percentages, that help explain your findings. Link: Descriptive statistics
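
A quick sketch using Python’s built-in statistics module, with invented figures, showing why the median can be a safer summary than the mean when outliers are present.

```python
import statistics

# Hypothetical response times (in days) to freedom-of-information requests.
days = [12, 15, 15, 19, 20, 20, 20, 95]

print("mean:", statistics.mean(days))      # 27, pulled upwards by the 95-day outlier
print("median:", statistics.median(days))  # 19.5, a more robust "typical" value here
print("mode:", statistics.mode(days))      # 20, the most common value
```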
 
- FOIA (Freedom of Information Act) request: A formal request for public records from a government agency, often used to obtain datasets not posted online. Link: Freedom of Information Act (by country)

- Geospatial data: Data that includes location information, essential for making maps or analysing patterns by area. Link: Geospatial data
 
- Heat map: A graphic that uses colour intensity to show concentrations of activity or numbers. Link: Heat map

- Interactive graphics: Visuals that let readers explore data, such as maps you can zoom in on or filters to compare regions.

- Interactive visualisation: Visualisations that allow users to explore and interact with the data. Link: Interactive visualisation
 
- JSON (JavaScript Object Notation): A format often used by websites and APIs to structure data. Journalists may need to convert this into tables. Link: JSON
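
A minimal sketch of flattening a JSON payload into a CSV table with Python’s standard library; the payload is invented.

```python
import csv
import json

# A JSON payload of the shape many APIs return (hypothetical data).
payload = '[{"area": "North", "cases": 120}, {"area": "South", "cases": 85}]'
records = json.loads(payload)   # parse the text into a list of dictionaries

# Write the records out as a CSV table a spreadsheet can open.
with open("cases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["area", "cases"])
    writer.writeheader()
    writer.writerows(records)
```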
 
- Machine learning: Computer systems that analyse data to find patterns, used in investigative journalism for tasks such as identifying fake accounts. Link: Machine learning
 
- Margin of error: A measure of how much uncertainty there is in survey results. This is particularly important when reporting on political opinion polls. Link: Margin of error
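
For a simple random sample at 95% confidence, the margin of error is commonly approximated as z * sqrt(p(1 - p) / n), where p is the reported proportion, n is the sample size, and z ≈ 1.96. A quick Python sketch with invented poll figures:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

# A hypothetical poll: 52% support among 1,000 respondents.
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {moe:.1%}")   # roughly +/- 3.1 percentage points
```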
 
- Natural Language Processing (NLP): A way to automatically analyse large amounts of text, such as searching through thousands of documents for themes. Link: NLP
 
- Normalisation: Adjusting numbers to make fair comparisons, such as calculating rates per 100,000 people instead of raw numbers. Link: Normalisation
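
A quick sketch of the per-100,000 calculation mentioned above, with invented figures:

```python
# Raw counts mislead when populations differ; rates per 100,000
# people allow a fair comparison. All figures are hypothetical.
areas = [
    {"name": "City A", "burglaries": 900, "population": 600_000},
    {"name": "City B", "burglaries": 400, "population": 150_000},
]

for area in areas:
    rate = area["burglaries"] / area["population"] * 100_000
    print(f'{area["name"]}: {rate:.0f} per 100,000')

# City B records fewer burglaries in total but has a far higher rate.
```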
 
- Open data: Data published by governments, organisations, or researchers that’s free for anyone to use in their reporting. Link: Open data

- Outlier: A data point that sticks out because it’s much higher or lower than the rest. Sometimes these lead to important news stories. Link: Outlier
 
- Parsing: Breaking down complex information (such as addresses or dates) into standardised parts for easier analysis. Link: Parsing
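
A small Python sketch that parses the same date written three different ways into one standard form; the strings are invented examples.

```python
from datetime import datetime

# The same date in three formats, as often found in scraped documents.
raw_dates = ["03/04/2024", "2024-04-03", "3 April 2024"]
formats = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

for raw, fmt in zip(raw_dates, formats):
    parsed = datetime.strptime(raw, fmt)   # text -> datetime object
    print(parsed.date())                   # all three print 2024-04-03
```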
 
- Regression analysis: A more advanced statistical method to explore relationships between variables. This is sometimes used in deep journalistic investigations. Link: Regression analysis

- Sampling bias: This exists when the group surveyed or studied doesn’t represent the larger population. This can distort results and conclusions. Link: Sampling bias
 
- SQL (Structured Query Language): A coding language for searching through large databases. This is helpful for investigative journalism projects. Link: SQL
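
A minimal sketch using Python’s built-in sqlite3 module; the payments table and its contents are invented.

```python
import sqlite3

# An in-memory database with a hypothetical table of council payments.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payments (supplier TEXT, amount REAL)")
con.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [("Acme Ltd", 12000.0), ("Acme Ltd", 8000.0), ("Widget Co", 3000.0)],
)

# Total spend per supplier, largest first.
query = """
    SELECT supplier, SUM(amount) AS total
    FROM payments
    GROUP BY supplier
    ORDER BY total DESC
"""
for supplier, total in con.execute(query):
    print(supplier, total)
```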
 
- Spreadsheet: A basic tool such as Excel or Google Sheets that most journalists use to store, sort, and analyse data. Link: Spreadsheet
 
- Statistical analysis: Using statistical methods to analyse data, such as calculating means, medians, modes, and standard deviations. Link: Statistical and data analysis
 
- Structured data: Data organised in rows and columns (such as Excel spreadsheets) that’s easy to sort and analyse. Link: Structured data analysis

- Time series data: Data collected over time. This is useful for spotting trends, such as changes in crime rates or housing prices. Link: Time series database

- Tooltip: A small pop-up box in a graphic that appears when readers hover over a data point to reveal details. Link: Tooltip
 
- Unstructured data: Data that doesn’t come in neat tables, such as PDFs, social media posts, or interview transcripts. Link: Unstructured data
 
- Web scraping: The process of automatically extracting data from websites. Link: Web scraping
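
A minimal sketch using the third-party requests and BeautifulSoup libraries; the URL and page structure are hypothetical, and real scraping should respect a site’s terms of service and robots.txt.

```python
import requests
from bs4 import BeautifulSoup   # third-party: pip install beautifulsoup4

# Hypothetical page listing council meetings in an HTML table.
URL = "https://council.example.org/meetings"

html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Pull the text of every table row into a list of lists.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]
for row in rows:
    print(row)
```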
 
 