Data journalism glossary

The following words and terms are commonly used in data journalism. Data journalists might want to familiarise themselves with them.

Commonly used words and phrases

  • Algorithm:
    • A set of rules or instructions that a computer follows to solve a problem or perform a task. In data journalism, algorithms can be used for various purposes. Link: Algorithm
  • API (Application Programming Interface):
    • A digital tool that lets you pull data directly from a website or database, often used by journalists to access updated datasets. Link: API
  • Choropleth map:
    • A map shaded in different colours to show how a number or rate changes by area (e.g., COVID-19 cases by county). Link: Choropleth map
  • Computational thinking:
    • The process of breaking down complex problems into smaller, manageable parts, and then creating algorithms to solve them. Link: Computational thinking
  • Correlation:
    • A relationship between two variables (note: correlation doesn’t mean causation). Link: Correlation
  • CSV (Comma-Separated Values):
    • A common, simple file format for datasets which is basically a spreadsheet saved as plain text. Link: CSV
  • Data analysis:
    • Examining data to identify trends, patterns, and relationships. Link: Data analysis
  • Data bias:
    • Data that is skewed or incomplete. Journalists need to be alert to this to avoid misleading the audience. Link: Data bias
  • Data cleansing (or wrangling):
    • The process of fixing messy data in order to correct errors, fill in missing info, and format it so it’s ready for analysis. Link: Data cleansing
  • Data ethics:
    • Principles and guidelines for the responsible collection, analysis, and dissemination of data, with a focus on privacy, security, and fairness. Link: Data ethics
  • Data journalism:
    • The practice of using data to find, create, and tell news stories. It involves collecting, analysing, and visualising data to inform the public. Link: Data journalism
  • Data leak (or breach):
    • When private or sensitive data is released, intentionally or accidentally. Newsrooms often investigate such releases. Link: Data leak or breach
  • Data literacy:
    • The ability to understand, interpret, and communicate data effectively. This includes critical thinking, statistical reasoning, and the ability to identify biases. Link: Data literacy
  • Data mining:
    • The process of extracting valuable information and patterns from large datasets. Link: Data mining
  • Data scraping:
    • The automated process of extracting data from websites or other sources and saving it in a structured format. Link: Data scraping
  • Data transparency:
    • Being open about how the data was handled, what assumptions were made, and what might be missing.
  • Data visualisation:
    • Representing data visually through charts, graphs, maps, and other graphical formats. Link: Data visualisation
  • Dataset:
    • Also written data-set. A collection of related data, such as a spreadsheet or table, and often the starting point for a data story. Link: Dataset
  • Deduplication:
    • Removing repeated entries in a dataset to avoid counting the same thing twice. Link: Data deduplication
  • Descriptive statistics:
    • Simple summaries of data, such as averages, medians, and percentages, that help explain your findings. Link: Descriptive statistics
  • FOIA (Freedom of Information Act) Request:
    • A formal request for records held by public bodies, made under freedom of information laws. A key way for journalists to obtain data that has not been published.
  • Geospatial data:
    • Data that includes location information, which is essential for making maps or analysing patterns by area. Link: Geospatial data
  • Heat map:
    • A graphic that uses colour intensity to show concentrations of activity or numbers. Link: Heat map
  • Interactive graphics:
    • Visuals that let readers explore data, such as maps you can zoom in on or filters that compare regions.
  • Interactive visualisation:
    • A data visualisation that responds to user input, such as hovering, zooming, or filtering, letting readers explore a dataset for themselves.
  • JSON (JavaScript Object Notation):
    • A format often used by websites and APIs to structure data. Journalists may need to convert this into tables. Link: JSON
  • Machine learning:
    • Computer systems analysing data to find patterns. Used in investigative journalism for things like identifying fake accounts. Link: Machine learning
  • Margin of error:
    • A measure of how much uncertainty there is in survey results. This is particularly important when reporting on political opinion polls. Link: Margin of error
  • Natural Language Processing (NLP):
    • A way to automatically analyse large amounts of text such as searching through thousands of documents for themes. Link: NLP
  • Normalisation:
    • Adjusting numbers to make fair comparisons such as calculating rates per 100,000 people instead of raw numbers. Link: Normalisation
  • Open data:
    • Data published by governments, organisations, or researchers that’s free for anyone to use in their reporting. Link: Open data
  • Outlier:
    • A data point that sticks out because it’s much higher or lower than the rest. Sometimes these lead to important news stories. Link: Outlier
  • Parsing:
    • Breaking down complex information (such as addresses or dates) into standardised parts for easier analysis. Link: Parsing
  • Regression analysis:
    • A more advanced statistical method to explore relationships between variables. This is sometimes used in deep journalistic investigations. Link: Regression analysis
  • Sampling bias:
    • This exists when the group surveyed or studied doesn’t represent the larger population. This can distort results and conclusions. Link: Sampling bias
  • SQL (Structured Query Language):
    • A coding language for searching through large databases. This is helpful for investigative journalism projects. Link: SQL
  • Spreadsheet:
    • A basic tool such as Excel or Google Sheets that most journalists use to store, sort, and analyse data. Link: Spreadsheet
  • Statistical analysis:
    • Using statistical methods to analyse data, such as calculating the mean, median, mode, and standard deviation. Link: Statistical and data analysis
  • Structured data:
    • Data organised in rows and columns (such as Excel spreadsheets) that’s easy to sort and analyse. Link: Structured data analysis
  • Time series data:
    • Data collected over time. This is useful for spotting trends, such as changes in crime rates or housing prices. Link: Time series database
  • Tooltip:
    • A small pop-up box in a graphic that appears when readers hover over a data point to reveal details. Link: Tooltip
  • Unstructured data:
    • Data that doesn’t come in neat tables, such as PDFs, social media posts, or interview transcripts. Link: Unstructured data
  • Web scraping:
    • The process of automatically extracting data from websites. Link: Web scraping
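Several of the terms above (CSV, deduplication, and descriptive statistics) come together in even the simplest data workflow. As a minimal sketch using only Python's standard library (the county names and figures are invented for illustration), you can read a CSV, remove duplicate rows, and summarise a column:

```python
import csv
import io
import statistics

# Hypothetical CSV data; in practice this would come from a file
# opened with open("cases.csv", newline="").
raw = """county,cases
Kent,120
Essex,95
Kent,120
Sussex,300
"""

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Deduplication: remove repeated entries so nothing is counted twice.
unique = [dict(t) for t in {tuple(r.items()) for r in rows}]

# Descriptive statistics: simple summaries of the "cases" column.
cases = [int(r["cases"]) for r in unique]
print(len(unique))               # number of unique rows
print(statistics.mean(cases))    # average
print(statistics.median(cases))  # median
```

The duplicate "Kent" row is dropped before the averages are calculated, which is why deduplication normally comes before analysis.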
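Normalisation, as defined above, means adjusting raw numbers so areas of different sizes can be compared fairly. A short sketch with invented figures shows why the raw count and the rate per 100,000 people can tell opposite stories:

```python
# Hypothetical raw counts and populations for two areas.
areas = {
    "Area A": {"incidents": 500, "population": 250_000},
    "Area B": {"incidents": 400, "population": 80_000},
}

# Normalisation: convert raw counts to a rate per 100,000 people.
rates = {
    name: d["incidents"] / d["population"] * 100_000
    for name, d in areas.items()
}

# Area A has more raw incidents, but Area B has the higher rate,
# so the normalised figure is the fairer basis for comparison.
```

Here Area A works out at 200 per 100,000 and Area B at 500 per 100,000, reversing the impression given by the raw counts.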
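JSON, as noted above, is the format most APIs return, and journalists often need to flatten it into rows for a spreadsheet. A minimal sketch (the payload and field names are hypothetical, standing in for a real API response):

```python
import json

# Hypothetical API response; real APIs return text like this.
payload = '''
{
  "results": [
    {"name": "Station 1", "pm25": 12.4},
    {"name": "Station 2", "pm25": 31.8}
  ]
}
'''

data = json.loads(payload)

# Flatten the nested JSON into rows suitable for a spreadsheet.
rows = [(item["name"], item["pm25"]) for item in data["results"]]
for name, value in rows:
    print(f"{name}\t{value}")
```

Once the data is in rows and columns it becomes structured data in the sense used above, ready for sorting and analysis.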
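Parsing, in the sense used above, often means standardising messy values such as dates. A small sketch (the date strings and formats are invented examples) that converts inconsistent dates to a single ISO format:

```python
from datetime import datetime

# Hypothetical date strings in inconsistent formats.
raw_dates = ["2024-03-01", "01/03/2024"]
formats = ["%Y-%m-%d", "%d/%m/%Y"]

def parse_date(text):
    # Try each known format and standardise to ISO 8601 (YYYY-MM-DD).
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # leave unparseable values for manual review

standardised = [parse_date(d) for d in raw_dates]
```

Both inputs refer to 1 March 2024, so after parsing they become identical strings that will sort and group correctly.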
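SQL, as described above, lets you search and summarise large databases. A self-contained sketch using Python's built-in sqlite3 module (the table and figures are hypothetical; an in-memory database stands in for a real one):

```python
import sqlite3

# An in-memory SQLite database stands in for a larger database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contracts (supplier TEXT, value INTEGER)")
con.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("Acme", 10000), ("Acme", 25000), ("Globex", 7000)],
)

# SQL: total contract value per supplier, largest first.
query = """
    SELECT supplier, SUM(value) AS total
    FROM contracts
    GROUP BY supplier
    ORDER BY total DESC
"""
results = con.execute(query).fetchall()
# results -> [("Acme", 35000), ("Globex", 7000)]
```

GROUP BY with SUM is the typical first query in an investigation: it turns thousands of individual records into a ranked summary you can check against the story.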

Related articles

Data journalism – resources and tools

What is data journalism?

Good journalism has always been about data