The Human Flourishing Geographic Index: A County-Level Dataset for the United States, 2013–2023

Stefano M. Iacus, Devika Jain, Andrea Nasuto, Giuseppe Porro, Marcello Carammia, Andrea Vezzulli

TL;DR

The paper introduces the Human Flourishing Geographic Index (HFGI), a high-resolution, county- and state-level dataset of flourishing-related expressions derived from 2.6 billion geolocated U.S. tweets from 2013–2023. It uses fine-tuned LLMs (Llama 3.2 3B) to map tweets to 46–48 flourishing indicators aligned with the Global Flourishing Study, producing monthly and yearly indicators with salience measures. The authors validate HFGI against external data (TSGI, CDC mental health metrics, CPI) and explore relationships with climate risk, revealing meaningful spatial patterns, rural–urban differences, and an interpretation framework distinguishing expression propensity from prevalence. They provide comprehensive usage notes, data availability, and a codebook, enabling cross-disciplinary analyses of well-being, inequality, and social dynamics at an unprecedented scale and resolution.

Abstract

Quantifying human flourishing, a multidimensional construct including happiness, health, purpose, virtue, relationships, and financial stability, is critical for understanding societal well-being beyond economic indicators. Existing measures often lack fine spatial and temporal resolution. Here we introduce the Human Flourishing Geographic Index (HFGI), derived from analyzing approximately 2.6 billion geolocated U.S. tweets (2013-2023) using fine-tuned large language models to classify expressions across 48 indicators aligned with Harvard's Global Flourishing Study framework plus attitudes towards migration and perception of corruption. The dataset offers monthly and yearly county- and state-level indicators of flourishing-related discourse, validated to confirm that the measures accurately represent the underlying constructs and show expected correlations with established indicators. This resource enables multidisciplinary analyses of well-being, inequality, and social change at unprecedented resolution, offering insights into the dynamics of human flourishing as reflected in social media discourse across the United States over the past decade.

Paper Structure

This paper contains 49 sections, 13 equations, 9 figures, 11 tables.

Figures (9)

  • Figure 1: Tweet volume over time in the 2.6B dataset for the United States.
  • Figure 2: Spatial variation in the average salience of selected flourishing dimensions across U.S. counties (2013–2023). Each panel shows the county-level mean salience (the share of tweets expressing a given dimension) computed over the full 2013–2023 period. Dimensions are selected to illustrate both a highly prevalent domain (happiness) and less frequent but thematically distinct domains (belonging, finworry, corruption). Color scales are independent across panels to emphasize within-dimension spatial contrasts. Geographic patterns reveal coherent regional structure even for low-base topics, indicating that public expressions of well-being, material security, and civic concern vary systematically across the United States. Alaska, Hawaii, and Puerto Rico are shown in a compact, shifted layout.
  • Figure 3: Standardized offline and online religiosity across U.S. counties. Blue areas indicate below-average values, red areas above-average values (z-scores). The left and right panels show total and evangelical religious adherence from the RCMS 2020 (ARDA), while the central panel displays the standardized salience of religious discourse on social media (believegod, 2013–2023).
  • Figure 4: Estimated differences in well-being dimensions between rural and urban U.S. counties (2013–2023). Each bar represents the estimated coefficient $\beta_1$ from separate linear models of the form $y_i = \alpha + \beta_1 \text{Rural}_i + \beta_2 \log(\text{ntweets}_i) + \beta_3 \log(\text{population}_i) + \varepsilon_i$, where $y_i$ is the county-level mean value of each flourishing indicator. Positive coefficients indicate higher levels in rural counties; negative coefficients indicate higher levels in urban counties. Bars are shown only for statistically significant effects (unadjusted $p<0.05$). Results highlight that expressive dimensions of religious belief, purpose, and subjective well-being tend to be stronger in rural areas, whereas civic and moral concern dimensions (e.g., political voice, delayed gratification, charity) are more salient in urban contexts.
  • Figure 5: Validation of the tweet-based perceived corruption indicator against the Transparency International Corruption Perceptions Index (CPI) for the United States, 2012–2024 (truncated to 2023 to match the coverage of the HFGI corruption indicator). Both series are standardized (z-scores) to facilitate comparison. CPI: 0–100 scale where higher values indicate cleaner public sectors. HFGI indicator: conditional mean in $[0,+1]$, where higher values indicate more corruption-related expression.
  • ...and 4 more figures
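
The rural-urban model quoted in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' code. It fits $y_i = \alpha + \beta_1 \text{Rural}_i + \beta_2 \log(\text{ntweets}_i) + \beta_3 \log(\text{population}_i) + \varepsilon_i$ by ordinary least squares on synthetic, noise-free data and returns only $\beta_1$. The dictionary keys `rural`, `ntweets`, and `population`, the helper names `ols` and `rural_gap`, and all numeric values are assumptions made for illustration; `believegod` is borrowed from the Figure 3 caption as an example indicator name. The significance filter (unadjusted $p<0.05$) used in the figure is not reproduced here.

```python
import math
import random


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def rural_gap(counties, indicator):
    """Return beta_1, the coefficient on the rural dummy, for one indicator."""
    # Design matrix columns: intercept, Rural, log(ntweets), log(population).
    X = [[1.0, float(c["rural"]), math.log(c["ntweets"]), math.log(c["population"])]
         for c in counties]
    y = [c[indicator] for c in counties]
    return ols(X, y)[1]


# Synthetic counties (hypothetical values) with a built-in rural effect of +0.05.
rng = random.Random(0)
counties = []
for _ in range(200):
    rural = rng.random() < 0.4
    population = rng.randint(5_000, 2_000_000)
    ntweets = rng.randint(1_000, 500_000)
    y = 0.2 + 0.05 * rural + 0.01 * math.log(ntweets) - 0.02 * math.log(population)
    counties.append({"rural": rural, "ntweets": ntweets,
                     "population": population, "believegod": y})

beta_1 = rural_gap(counties, "believegod")
print(round(beta_1, 4))
```

Because the synthetic outcome is generated without noise, the fit should recover a rural coefficient very close to the built-in 0.05. On real data, a statistics package that also reports standard errors and p-values would be the more natural choice; the hand-rolled solver is used here only to keep the sketch dependency-free.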