National and state-level datasets of United States forensic DNA databases 2001–2025
Yemko Pryor, Joao Pedro Donadio, Samantha C. Muller, Jenna Wilson, Tina Lasisi
TL;DR
The paper addresses the lack of harmonized, longitudinal data on U.S. forensic DNA databases by constructing three integrated datasets: a national NDIS time series (2001–2025) reconstructed from archived FBI web pages, a state-level SDIS dataset with arrestee counts and policy metadata, and a FOIA-derived dataset of demographic and annual collection data. The methodology has three parts: reconstructing federal statistics from Wayback Machine snapshots, compiling state policies and counts, and digitizing the Murphy & Tong appendices. Validation includes anomaly detection and external calibration. The datasets support longitudinal and cross-jurisdictional analyses of database growth, governance, and reporting practices, including assessments of policy impact, inter-state differences, and the historical evolution of CODIS infrastructure. The data are versioned and publicly available on Zenodo and GitHub, together with reproducible code, to support reuse in research and policy.
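The reconstruction and anomaly-detection steps summarized above can be illustrated with a minimal sketch. It assumes each archived FBI page has already been parsed into a (Wayback timestamp, cumulative profile count) pair. The function names, the rule of keeping the latest snapshot in each calendar month, and the 50% jump threshold are hypothetical choices for illustration, not the authors' pipeline.

```python
from datetime import datetime


def to_monthly(snapshots):
    """Collapse (timestamp, count) snapshots into one value per calendar month.

    Assumption (hypothetical rule): the latest snapshot within a month wins.
    Wayback timestamps are fixed-width "YYYYMMDDhhmmss" strings, so sorting
    them as strings also sorts them chronologically.
    """
    monthly = {}
    for ts, count in sorted(snapshots):
        dt = datetime.strptime(ts, "%Y%m%d%H%M%S")
        monthly[(dt.year, dt.month)] = count  # later snapshots overwrite earlier ones
    return dict(sorted(monthly.items()))


def flag_anomalies(monthly, max_jump=0.5):
    """Flag months where a cumulative count falls, or rises by more than max_jump.

    Cumulative database totals should not normally decrease, so a drop is
    flagged for review; max_jump is an illustrative threshold (0.5 = +50%).
    """
    flags = []
    items = list(monthly.items())
    for (_, prev), (key, cur) in zip(items, items[1:]):
        if cur < prev:
            flags.append((key, "decrease"))
        elif prev > 0 and (cur - prev) / prev > max_jump:
            flags.append((key, "jump"))
    return flags
```

For example, with toy snapshots (made-up values, not real NDIS figures) `[("20050115093000", 100), ("20050128120000", 102), ("20050210080000", 99), ("20050305101500", 180)]`, `to_monthly` keeps 102 for January 2005, and `flag_anomalies` flags February as a decrease and March as a jump. Flagged months would then be checked by hand or against external sources rather than corrected automatically.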
Abstract
Forensic DNA databases in the United States have expanded substantially over the past two decades, yet comprehensive, harmonized data describing their structure and composition remain limited. This dataset series documents forensic DNA infrastructure at the national and state levels from 2001 to 2025. It includes a reconstructed time series of monthly National DNA Index System (NDIS) statistics from FBI archives, capturing counts of offender, arrestee, and forensic profiles, the number of participating laboratories, and investigations aided. A complementary dataset compiles publicly available state-level statistics and policy metadata on arrestee collection laws, familial search practices, and DNA collection statutes across all 50 states. A third dataset provides standardized demographic and annual collection data drawn from previously published public records requests, including racial and gender composition where reported. Together, these resources provide a foundation for studying the historical development of forensic DNA systems in the U.S., enabling longitudinal and cross-sectional analyses of database growth, policy variation, and reporting practices across jurisdictions.
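The standardization of demographic data is not specified here. One plausible approach, sketched below under stated assumptions, maps the differing category labels found across states' records onto a common scheme and converts counts to proportions of the reported total. The label map, the "Other/Unknown" pooling rule, and the function name are hypothetical, not the authors' coding scheme.

```python
# Hypothetical map from raw record labels to a common category scheme.
CATEGORY_MAP = {
    "Black": "Black",
    "African American": "Black",
    "White": "White",
    "Caucasian": "White",
    "Asian": "Asian",
}


def standardize(raw_counts):
    """Harmonize raw category counts and return proportions of the reported total.

    Labels absent from CATEGORY_MAP are pooled into "Other/Unknown".
    Returns None when nothing was reported, so that missing data stays
    distinguishable from a true zero.
    """
    totals = {}
    for label, n in raw_counts.items():
        key = CATEGORY_MAP.get(label.strip(), "Other/Unknown")
        totals[key] = totals.get(key, 0) + n
    grand = sum(totals.values())
    if grand == 0:
        return None
    return {k: v / grand for k, v in totals.items()}
```

For instance, toy input `{"African American": 30, "Caucasian": 50, "Unknown": 20}` yields proportions of 0.3 Black, 0.5 White, and 0.2 Other/Unknown. The same pattern would apply to gender categories with a separate label map.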
