What Would an LLM Do? Evaluating Large Language Models for Policymaking to Alleviate Homelessness
Pierre Le Coz, Jia An Liu, Debarun Bhattacharjya, Georgina Curto, Serge Stinckwich
TL;DR
This work evaluates whether large language models (LLMs) align with domain experts on homelessness policymaking, using a benchmark grounded in the Capability Approach (CA) that spans four cities and a universal context. It operationalizes the CA in a computational agent-based model (ABM) framework by mapping policy proposals to a SAT matrix with $14$ needs and $11$ actions, so that the impact of a policy on outcomes for people experiencing homelessness (PEH) can be tested prospectively in simulation. The contributions include a novel cross-city benchmark, expert baselines, and an automated LLM-ABM pipeline that translates narrative policy proposals into agent behavior in order to compare their social impacts. Findings show substantial variation in LLM policy preferences across models and contexts. Even so, LLMs can achieve comparable or better aggregate satisfaction of PEH needs in the ABM when guided by CA framing and local calibration, which underscores the need for guardrails and local expertise in deployment. The work advances scalable, dignity-centered methods for testing homelessness policy and informs responsible, context-aware use of LLMs in civic decision-making.
Abstract
Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their potential to encode evolving social contexts and to generate plausible scenarios positions them as promising tools in social policymaking. This article evaluates whether LLMs are aligned with domain experts (and among themselves) on policy recommendations to alleviate homelessness, a challenge affecting over 150 million people worldwide. We develop a novel benchmark composed of decision scenarios across four cities, with policy choices that are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the policies to an agent-based model in one location, and we compare the social impact of the policies recommended by LLMs to that of the policies recommended by experts. Our exploratory analysis reveals variation across LLMs in their policy recommendations compared to local experts, yet suggests potential benefits of using LLMs to provide insights for policymaking if they are paired with responsible guardrails, contextual calibrations, and local domain expertise. Our work operationalizes the Capability Approach in a computational framework and provides new insights on homelessness alleviation policymaking with a focus on human dignity.
