Inferring fine-grained migration patterns across the United States

Gabriel Agostini, Rachel Young, Maria Fitzpatrick, Nikhil Garg, Emma Pierson

TL;DR

This paper addresses the lack of spatial granularity in US migration data. The authors introduce MIGRATE, a data-fusion framework that combines high-resolution proprietary Infutor address histories with coarse Census data via an iterative proportional fitting approach to produce annual migration matrices between Census Block Groups (CBGs) for 2010–2019. MIGRATE is validated against multiple Census benchmarks, reduces error and demographic bias substantially relative to Infutor alone, and reveals both national patterns (homophily, upward mobility, moving distance) and local patterns (wildfire-driven out-migration) that are invisible in coarser data. The method provides a scalable, privacy-protecting resource intended for non-profit migration research, with broad utility for social, environmental, urban, and health science analyses.

Abstract

Fine-grained migration data illuminate demographic, environmental, and health phenomena. However, United States migration data have serious drawbacks: public data lack spatial granularity, and higher-resolution proprietary data suffer from multiple biases. To address this, we develop a method that fuses high-resolution proprietary data with coarse Census data to create MIGRATE: annual migration matrices capturing flows between 47.4 billion US Census Block Group pairs -- approximately four thousand times the spatial resolution of current public data. Our estimates are highly correlated with external ground-truth datasets and improve accuracy relative to raw proprietary data. We use MIGRATE to analyze national and local migration patterns. Nationally, we document demographic and temporal variation in homophily, upward mobility, and moving distance -- for example, rising moves into top-income-quartile block groups and racial disparities in upward mobility. Locally, MIGRATE reveals patterns such as wildfire-driven out-migration that are invisible in coarser previous data. We release MIGRATE as a resource for migration researchers.
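
To make the iterative proportional fitting idea named in the TL;DR concrete, here is a minimal sketch of the classic two-margin version: a biased fine-grained "seed" matrix is alternately rescaled until its row and column totals match trusted coarse totals. This is an illustration only, not the authors' harmonization procedure, which the summary says fuses Infutor data with Census data at several levels. The function name `ipf`, its parameters, and the toy numbers are all assumptions made for the example.

```python
import numpy as np


def ipf(seed, row_targets, col_targets, n_iter=100, tol=1e-9):
    """Classic two-margin iterative proportional fitting (hypothetical helper).

    seed        : non-negative matrix of raw flow counts (rows = origins)
    row_targets : trusted total for each origin
    col_targets : trusted total for each destination
    """
    est = np.asarray(seed, dtype=float).copy()
    row_targets = np.asarray(row_targets, dtype=float)
    col_targets = np.asarray(col_targets, dtype=float)

    for _ in range(n_iter):
        # Rescale each row so that row sums match the origin totals.
        row_sums = est.sum(axis=1, keepdims=True)
        est *= np.divide(row_targets[:, None], row_sums,
                         out=np.ones_like(row_sums), where=row_sums > 0)

        # Rescale each column so that column sums match the destination totals.
        col_sums = est.sum(axis=0, keepdims=True)
        est *= np.divide(col_targets[None, :], col_sums,
                         out=np.ones_like(col_sums), where=col_sums > 0)

        # Columns are exact after the last step; stop once rows also agree.
        if np.allclose(est.sum(axis=1), row_targets, atol=tol):
            break
    return est


# Toy example with made-up numbers: a 2x2 seed and consistent margins.
seed = np.array([[10.0, 2.0],
                 [3.0, 5.0]])
fitted = ipf(seed, row_targets=[15.0, 10.0], col_targets=[12.0, 13.0])
```

The fitted matrix keeps the relative structure of the seed while matching both sets of totals, which is the intuition behind correcting proprietary data with Census aggregates. The targets must sum to the same grand total for both margins to be satisfiable.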

Paper Structure

This paper contains 42 sections, 9 equations, 14 figures, 6 tables.

Figures (14)

  • Figure 1: MIGRATE estimates. We estimate annual migration flows between all pairs of Census block groups (CBGs) from 2010-2019. (a) Average MIGRATE estimates of out-migration rates across the entire United States. (b-c) MIGRATE estimates of out-migration rates within New York City. MIGRATE estimates reveal granular spatial patterns invisible in publicly available county-to-county data (inset plot (b)). Out-migration rates for CBGs with fewer than 100 people are omitted.
  • Figure 1: Experimental validations of our method. Pearson correlation and reduction in RMSE achieved by our harmonization procedure on semi-synthetic data with real-world bias and independent noise. We report average metrics across all years; error bars plot standard deviation across years (from 2010-11 to 2018-19, $n=9$). Circles represent metrics computed on all entries of the flow matrix; squares represent metrics computed on movers only (off-diagonal elements). Columns assess metrics for flows at the CBG level, tract level, county level, and state level. We assess results when varying the level of bias (x-axis) and independent noise (line color).
  • Figure 1: Additional national migration statistics. (a) Flows between the ten types of CBGs discussed in the main text relative to the share of all movers who moved to CBGs in the particular group. Each cell reports the ratio $\frac{\text{share of movers from origin CBG group moving to destination CBG group}}{\text{share of all movers moving to destination CBG group}}$; for example, the top left value of $1.1\times$ reports the ratio of the share of movers from plurality white CBGs who move to plurality white CBGs (90%) to the share of movers from all CBGs who move to plurality white CBGs (80%). (b) Flows between the ten types of CBGs discussed in the main text when restricting to out-of-county movers demonstrate that homophily persists for long-distance moves.
  • Figure 1: Average income (in US dollars) of destination CBGs and counties for New York City out-movers in the 2010-19 period. The large map uses MIGRATE estimates and 5-year ACS CBG data. The inset map shows the corresponding distribution using only publicly available migration data from 5-year ACS county-to-county estimates.
  • Figure 2: Validating the MIGRATE estimates. (a - c): MIGRATE estimates (y-axis) are highly correlated with Census data (x-axis), including (a) Census populations at the tract and block group (CBG) levels, (b) movers between each pair of states and each pair of counties (excluding people who remain within the same state or county), and (c) state and county in-migration rates (i.e., the number of people moving into an area as a fraction of the area's population). For in-migration rates, correlations are weighted by state or county population, and points are sized by population. (d - f) MIGRATE estimates increase agreement with Census datasets relative to raw Infutor data for population counts, movers between states and counties, and in-migration rates, respectively. We compute root mean squared error (RMSE) between (1) MIGRATE estimates and Census data and (2) Infutor data and Census data, and report the reduction in RMSE from using MIGRATE estimates. Bars show the mean reduction in RMSE across all data release years ($n=5$ for population and county-level migration datasets, $n=9$ for state-level migration datasets); error bars plot standard deviation across years. For the in-migration rates, RMSE is weighted by area population.
  • ...and 9 more figures
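
The ratio defined in the "Additional national migration statistics" caption can be worked through on a small example. The sketch below assumes a matrix of mover counts between CBG groups (rows are origin groups, columns are destination groups); the function name `destination_share_ratio` and the counts are hypothetical, chosen only so that the top-left cell reproduces the caption's 90% versus 80% example.

```python
import numpy as np


def destination_share_ratio(flows):
    """Ratio of each origin group's destination shares to the overall shares.

    flows[i, j] = number of movers from origin group i to destination group j.
    Cell (i, j) of the result is
        (share of movers from group i who move to group j)
        / (share of all movers who move to group j).
    """
    flows = np.asarray(flows, dtype=float)
    # Share of each origin group's movers going to each destination group.
    origin_shares = flows / flows.sum(axis=1, keepdims=True)
    # Share of all movers going to each destination group.
    overall_shares = flows.sum(axis=0) / flows.sum()
    return origin_shares / overall_shares


# Made-up counts: group 0 sends 90% of its movers to group 0,
# while 80% of all movers end up in group 0.
flows = np.array([[90.0, 10.0],
                  [70.0, 30.0]])
ratio = destination_share_ratio(flows)
# ratio[0, 0] is 0.9 / 0.8 = 1.125, which rounds to the caption's 1.1x.
```

Values above 1 on the diagonal mean that a group's movers choose destinations of their own type more often than movers overall do, which is how the caption reads homophily from the matrix.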