The following words and terms are commonly used in data journalism, and data journalists may want to familiarise themselves with them.
Commonly used words and phrases
- Algorithm:
- A set of rules or instructions that a computer follows to solve a problem or perform a task. In data journalism, algorithms are used for tasks such as sorting records, matching names across datasets, or flagging unusual entries. Link: Algorithm
- API (Application Programming Interface):
- An interface that lets a program pull data directly from a website or database; journalists often use APIs to access regularly updated datasets. Link: API
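A minimal sketch of pulling data from an API with Python's standard library; the endpoint URL, its parameters, and the shape of the response are all hypothetical. Real APIs document their own URLs, authentication, and formats.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute a real, documented API URL.
URL = "https://example.com/api/v1/cases?region=all"

with urllib.request.urlopen(URL) as response:
    records = json.load(response)  # many APIs return JSON

# Print the first few records to inspect the structure.
for row in records[:5]:
    print(row)
```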
- Choropleth map:
- A map shaded in different colours to show how a number or rate changes by area (e.g., COVID-19 cases by county). Link: Choropleth map
- Computational thinking:
- The process of breaking down complex problems into smaller, manageable parts, and then creating algorithms to solve them. Link: Computational thinking
- Correlation:
- A relationship between two variables (note: correlation doesn’t mean causation). Link: Correlation
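A quick way to put a number on a correlation, using Python's statistics module (Python 3.10+). The figures are invented to illustrate the classic warning: ice-cream sales and drownings rise together because of warm weather, not because one causes the other.

```python
from statistics import correlation

# Invented monthly figures, for illustration only.
ice_cream_sales = [20, 35, 50, 80, 95, 110]
drownings = [1, 2, 3, 5, 6, 7]

# Pearson's r runs from -1 to 1; values near 1 mean a strong positive relationship.
r = correlation(ice_cream_sales, drownings)
print(f"r = {r:.2f}")  # strong correlation, but warm weather drives both
```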
- CSV (Comma-Separated Values):
- A common, simple file format for datasets: essentially a spreadsheet saved as plain text, with one record per line and commas separating the values. Link: CSV
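Reading a CSV takes a few lines of Python; the filename and column names below are hypothetical stand-ins for whatever your dataset contains.

```python
import csv

# Hypothetical file with hypothetical "department" and "spending" columns.
with open("budget.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # each row becomes a dict keyed by the header line
        print(row["department"], row["spending"])
```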
- Data analysis:
- Examining data to identify trends, patterns, and relationships. Link: Data analysis
- Data bias:
- When data is skewed or incomplete. Journalists need to be alert to bias to avoid misleading the audience. Link: Data bias
- Data cleansing (or wrangling):
- The process of fixing messy data: correcting errors, filling in missing information, and formatting it so it's ready for analysis. Link: Data cleansing
- Data ethics:
- Principles and guidelines for the responsible collection, analysis, and dissemination of data, with a focus on privacy, security, and fairness. Link: Data ethics
- Data journalism:
- The practice of using data to find, create, and tell news stories. It involves collecting, analysing, and visualising data to inform the public. Link: Data journalism
- Data leak (or breach):
- When private or sensitive data is released, intentionally or accidentally. Newsrooms often investigate such leaks. Link: Data leak or breach
- Data literacy:
- The ability to understand, interpret, and communicate data effectively. This includes critical thinking, statistical reasoning, and the ability to identify biases. Link: Data literacy
- Data mining:
- The process of extracting valuable information and patterns from large datasets. Link: Data mining
- Data scraping:
- The automated process of extracting data from websites or other sources and saving it into a structured format. Link: Data scraping
- Data transparency:
- Being open about how the data was handled, what assumptions were made, and what might be missing.
- Data visualisation:
- Representing data visually through charts, graphs, maps, and other graphical formats. Link: Data visualisation
- Dataset:
- Also spelled data-set; a collection of related data, like a spreadsheet or table, and often the starting point for a data story. Link: Dataset
- Deduplication:
- Removing repeated entries in a dataset to avoid counting the same thing twice. Link: Data deduplication
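A simple deduplication pass in Python; the filename and the choice of name plus postcode as the identifying key are hypothetical and should be adapted to the dataset at hand.

```python
import csv

seen = set()
unique_rows = []

with open("donors.csv", newline="", encoding="utf-8") as f:  # hypothetical file
    for row in csv.DictReader(f):
        # Treat a normalised name plus postcode as a record's identity.
        key = (row["name"].strip().lower(), row["postcode"].strip())
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

print(f"{len(unique_rows)} unique records kept")
```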
- Descriptive statistics:
- Simple summaries of data, such as averages, medians, and percentages, that help explain your findings. Link: Descriptive statistics
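Python's statistics module covers the basics. The numbers below are invented, and show why the median is often the safer figure to report when one value is extreme.

```python
from statistics import mean, median

# Invented response times (in days) for a batch of records requests.
days = [4, 7, 8, 12, 15, 15, 90]

print(f"mean:   {mean(days):.1f}")  # pulled upward by the 90-day outlier
print(f"median: {median(days)}")    # often the fairer summary to report
```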
- FOIA (Freedom of Information Act) Request:
- A formal request for public records from a government agency, often used to obtain datasets not posted online. Link: Freedom of Information Act (by country)
- Geospatial data:
- Data that includes location information, which is essential for making maps or analysing patterns by area. Link: Geospatial data
- Heat map:
- A graphic that uses colour intensity to show concentrations of activity or numbers. Link: Heat map
- Interactive graphics:
- Visuals that let readers explore data, such as zoomable maps or filters for comparing regions.
- Interactive visualisation:
- Visualisations that allow users to explore and interact with the data. Link: Interactive visualisation
- JSON (JavaScript Object Notation):
- A format often used by websites and APIs to structure data. Journalists may need to convert this into tables. Link: JSON
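A short sketch of turning a JSON response into a table; the payload and its keys are invented for illustration.

```python
import csv
import json

# Hypothetical API response: a list of objects sharing the same keys.
payload = '[{"county": "Kent", "cases": 120}, {"county": "Essex", "cases": 95}]'
records = json.loads(payload)

# Flatten into a CSV that opens directly in a spreadsheet.
with open("cases.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)
```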
- Machine learning:
- Computer systems that analyse data to find patterns; used in investigative journalism for tasks such as identifying fake accounts. Link: Machine learning
- Margin of error:
- A measure of how much uncertainty there is in survey results. This is particularly important when reporting on political opinion polls. Link: Margin of error
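The standard formula for a proportion at 95% confidence is z × √(p(1−p)/n), with z ≈ 1.96. A small sketch, using an invented poll:

```python
from math import sqrt

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a proportion p from a simple random sample of size n."""
    return z * sqrt(p * (1 - p) / n)

# Invented poll: 52% support among 1,000 respondents.
moe = margin_of_error(0.52, 1000)
print(f"52% ± {moe * 100:.1f} points")  # about ±3.1, so a 52-48 split may be a statistical tie
```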
- Natural Language Processing (NLP):
- A way to automatically analyse large amounts of text such as searching through thousands of documents for themes. Link: NLP
- Normalisation:
- Adjusting numbers to make fair comparisons, such as calculating rates per 100,000 people instead of using raw counts. Link: Normalisation
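The arithmetic is simple but easy to get wrong in a hurry; the areas and figures below are invented.

```python
# Raw counts mislead when populations differ; rates per 100,000 allow fair comparison.
areas = {
    "Smallville": {"incidents": 50, "population": 20_000},
    "Metropolis": {"incidents": 400, "population": 2_000_000},
}

for name, a in areas.items():
    rate = a["incidents"] / a["population"] * 100_000
    print(f"{name}: {rate:.0f} per 100,000")
# Smallville comes out at 250 per 100,000 against Metropolis at 20,
# despite having far fewer raw incidents.
```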
- Open data:
- Data published by governments, organisations, or researchers that’s free for anyone to use in their reporting. Link: Open data
- Outlier:
- A data point that sticks out because it’s much higher or lower than the rest. Sometimes these lead to important news stories. Link: Outlier
- Parsing:
- Breaking down complex information (such as addresses or dates) into standardised parts for easier analysis. Link: Parsing
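Dates are the classic parsing headache: the same date arrives in several formats and must be standardised before sorting or counting. A sketch with Python's datetime module; the formats listed are assumptions about the incoming data.

```python
from datetime import datetime

raw_dates = ["03/01/2024", "2024-01-03", "3 January 2024"]
formats = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]  # assumed input formats

for raw in raw_dates:
    for fmt in formats:
        try:
            print(datetime.strptime(raw, fmt).date())  # 2024-01-03 each time
            break
        except ValueError:
            continue  # try the next format
```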
- Regression analysis:
- A more advanced statistical method to explore relationships between variables. This is sometimes used in deep journalistic investigations. Link: Regression analysis
- Sampling bias:
- Bias that arises when the group surveyed or studied doesn't represent the larger population, distorting results and conclusions. Link: Sampling bias
- SQL (Structured Query Language):
- A query language for searching and summarising large databases; helpful for investigative journalism projects. Link: SQL
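Python ships with SQLite, so a SQL query can be tried without installing anything; the contracts table and its figures are invented.

```python
import sqlite3

# An in-memory database with a hypothetical contracts table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contracts (supplier TEXT, amount REAL)")
con.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("Acme Ltd", 120000.0), ("Acme Ltd", 80000.0), ("Beta Co", 15000.0)],
)

# A typical newsroom question: which suppliers received the most money?
query = """
    SELECT supplier, SUM(amount) AS total
    FROM contracts
    GROUP BY supplier
    ORDER BY total DESC
"""
for supplier, total in con.execute(query):
    print(supplier, total)
```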
- Spreadsheet:
- A basic tool such as Excel or Google Sheets that most journalists use to store, sort, and analyse data. Link: Spreadsheet
- Statistical analysis:
- Using statistical methods to analyse data, such as calculating the mean, median, mode, and standard deviation. Link: Statistical and data analysis
- Structured data:
- Data organised in rows and columns (such as Excel spreadsheets) that’s easy to sort and analyse. Link: Structured data analysis
- Time series data:
- Data collected over time. This is useful for spotting trends, such as changes in crime rates or housing prices. Link: Time series database
- Tooltip:
- A small pop-up box in a graphic that appears when readers hover over a data point to reveal details. Link: Tooltip
- Unstructured data:
- Data that doesn’t come in neat tables, such as PDFs, social media posts, or interview transcripts. Link: Unstructured data
- Web scraping:
- The process of automatically extracting data from websites. Link: Web scraping
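A minimal scraper using only the standard library; the URL is a placeholder, and in practice many journalists reach for a dedicated parsing library instead. Always check a site's terms of service before scraping.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collects the href of every link on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

# Placeholder URL; substitute a page you have permission to scrape.
with urlopen("https://example.com/documents") as response:
    html = response.read().decode("utf-8")

parser = LinkCollector()
parser.feed(html)
print(parser.links)
```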