Public Resource
An Introduction to the Data Biography
Heather Krause, We All Count

Building a data biography is an essential step along the path to equity in data science. A data biography is a comprehensive background of the conception, birth and life of a dataset. It is not ethical to use a dataset without first taking the time to understand what the data means. This resource argues that this is not just an ethics issue but also an equity issue. For example, datasets used to calculate population-level rates sometimes do not include large portions of that population, and the people most often excluded are the most vulnerable or marginalized.

We All Count has developed two versions of the data biography: a short version and a comprehensive version. The short version covers the basics. It consists of five core questions:

WHO: Who collected the data? Who owns the data?
HOW: What methods were behind the data collection design and process?
WHERE: In what locations was the data collected? Where is the data stored?
WHY: For what purpose was the data collected?
WHEN: When was the data collected?

We All Count is also developing a much more detailed data biography tool.
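
For readers who keep their datasets in code, the five core questions of the short version can be stored next to the data as a small structured record. The Python sketch below is one possible way to do that and is not the We All Count tool. The class name, the field names, and the example answers are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ShortDataBiography:
    """Answers to the five core questions for one dataset.

    Field names are illustrative assumptions, not an official schema.
    """

    who_collected: str = ""    # WHO: who collected the data?
    who_owns: str = ""         # WHO: who owns the data?
    how_collected: str = ""    # HOW: collection design and process
    where_collected: str = ""  # WHERE: locations of collection
    where_stored: str = ""     # WHERE: where the data is stored
    why_collected: str = ""    # WHY: purpose of collection
    when_collected: str = ""   # WHEN: date or period of collection

    def unanswered(self) -> List[str]:
        """Return the names of the questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical example answers, invented purely for illustration.
bio = ShortDataBiography(
    who_collected="Regional health authority (hypothetical)",
    how_collected="Household phone survey (hypothetical)",
    when_collected="2019 (hypothetical)",
)

# Lists the questions that still need an answer.
print(bio.unanswered())
```

In this sketch, any name returned by `unanswered()` marks a question to investigate before the dataset is used, in keeping with the point that data should not be used without understanding what it means.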