Can LLMs Help Allocate Public Health Resources? A Case Study on Childhood Lead Testing

Mohamed Afane; Ying Wang; Juntao Chen

TL;DR

The study addresses how to allocate limited public health resources for childhood lead testing by constructing a Priority Score that combines elevated blood lead prevalence, proportions of untested children, and public health coverage patterns across 136 neighborhoods in Chicago, New York City, and Washington, D.C. It then evaluates whether state-of-the-art LLMs with agentic reasoning and deep research modes can autonomously allocate 1,000 test kits per city. The models show substantial limitations: accuracy averages 0.46 and peaks at 0.66, and common failures include neglecting the highest-risk neighborhoods and over-allocating to less vulnerable areas. A key finding is the strong cross-city association between public health coverage and lead vulnerability, which supports the Priority Score as a practical, data-driven framework for targeted interventions that still requires human validation. Overall, LLMs show promise for assisting public health decision-making, but current capabilities are insufficient for autonomous, policy-level resource allocation without rigorous data integration and oversight.

Abstract

Public health agencies face critical challenges in identifying high-risk neighborhoods for childhood lead exposure with limited resources for outreach and intervention programs. To address this, we develop a Priority Score integrating proportions of untested children, elevated blood lead prevalence, and public health coverage patterns to support optimized resource allocation decisions across 136 neighborhoods in Chicago, New York City, and Washington, D.C. We use these allocation tasks, which require integrating multiple vulnerability indicators and interpreting empirical evidence, to evaluate whether large language models (LLMs) with agentic reasoning and deep research capabilities can effectively allocate public health resources when presented with structured allocation scenarios. LLMs were tasked with distributing 1,000 test kits within each city based on neighborhood vulnerability indicators. Results reveal significant limitations: LLMs frequently overlooked neighborhoods with the highest lead prevalence and the largest proportions of untested children, such as West Englewood in Chicago, while allocating disproportionate resources to lower-priority areas like Hunts Point in New York City. Overall accuracy averaged 0.46, reaching a maximum of 0.66 with ChatGPT 5 Deep Research. Despite their marketed deep research capabilities, LLMs struggled with fundamental limitations in information retrieval and evidence-based reasoning, frequently citing outdated data and allowing non-empirical narratives about neighborhood conditions to override quantitative vulnerability indicators.
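
To make the allocation task concrete, the following is a minimal sketch of how a score built from the three indicators could drive the distribution of 1,000 test kits. It is not the paper's formula: the equal weighting, the min-max normalization, the proportional allocation rule, the function names, and the placeholder neighborhoods "A", "B", and "C" with their values are all assumptions made for illustration.

```python
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; return all zeros if every value is equal."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def priority_scores(untested: List[float],
                    prevalence: List[float],
                    coverage: List[float]) -> List[float]:
    """Assumed scoring rule: equal-weight mean of the normalized indicators."""
    columns = [min_max(untested), min_max(prevalence), min_max(coverage)]
    return [sum(col[i] for col in columns) / len(columns)
            for i in range(len(untested))]


def allocate_kits(scores: List[float], total: int = 1000) -> List[int]:
    """Split `total` kits in proportion to the scores (largest-remainder rounding)."""
    n = len(scores)
    score_sum = sum(scores)
    if score_sum == 0:
        # No signal in the scores: fall back to an even split.
        return [total // n + (1 if i < total % n else 0) for i in range(n)]
    quotas = [total * s / score_sum for s in scores]
    kits = [int(q) for q in quotas]  # floor, since quotas are non-negative
    leftover = total - sum(kits)
    # Hand the remaining kits to the largest fractional parts.
    by_fraction = sorted(range(n), key=lambda i: quotas[i] - kits[i], reverse=True)
    for i in by_fraction[:leftover]:
        kits[i] += 1
    return kits


# Placeholder data for three hypothetical neighborhoods (not from the study).
names = ["A", "B", "C"]
untested = [0.60, 0.30, 0.10]    # share of children never tested
prevalence = [0.05, 0.02, 0.01]  # share of tests with elevated blood lead
coverage = [0.80, 0.50, 0.20]    # share of children on public health coverage

scores = priority_scores(untested, prevalence, coverage)
kits = allocate_kits(scores, total=1000)
for name, score, count in zip(names, scores, kits):
    print(f"{name}: score={score:.3f}, kits={count}")
```

Under min-max normalization the lowest-ranked neighborhood on every indicator scores zero and receives no kits, so a real deployment would likely need a minimum allocation per neighborhood or a different normalization.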
