On Large Language Models as Data Sources for Policy Deliberation on Climate Change and Sustainability
Rachel Bina, Kha Luong, Shrey Mehta, Daphne Pang, Mingjun Xie, Christine Chou, Steven O. Kimbrough
TL;DR
This study investigates whether Large Language Models, exemplified by GPT-4, can provide credible evaluation scores for starter multi-criteria decision-making (MCDM) models used in climate and sustainability policy deliberation. It formalizes the data structure as an ACS (alternatives, evaluation criteria, scores) table, also called a $P$ table, and evaluates a set of policy alternatives on nine quality-of-life criteria plus two climate-related criteria (mitigation and adaptation). Scores $s_{i,j}$ are obtained via GPT-4 prompts and organized into a GPT-4 ACS table, which is then analyzed with the TOPSIS method. To assess validity, the authors compare the GPT-4-derived rankings with those produced by an informed human assessment (IA ACS). They find substantial agreement in the policy rankings and thus provisionally validate GPT-4 as a credible starting point for deliberation, provided that vetting is applied. The paper emphasizes a draft-and-revise philosophy: it produces actionable artifacts (ACS tables, TOPSIS results, and policy rankings) that can seed public deliberation and be updated as new information arrives. The work also acknowledges LLM reliability concerns (hallucinations) and advocates cautious, context-grounded use with human-in-the-loop checks. The practical contribution is a scalable, transparent starting point for policy deliberation that lowers the initial scoring burden while enabling iterative refinement and stakeholder engagement.
Abstract
We pose the research question, "Can LLMs provide credible evaluation scores, suitable for constructing starter MCDM models that support commencing deliberation regarding climate and sustainability policies?" In this exploratory study we (i) identify a number of interesting policy alternatives that are actively considered by local governments in the United States (and indeed around the world); (ii) identify a number of quality-of-life indicators as apt evaluation criteria for these policies; (iii) use GPT-4 to obtain evaluation scores for the policies on multiple criteria; (iv) use the TOPSIS MCDM method to rank the policies based on the obtained evaluation scores; and (v) evaluate the quality and validity of the resulting table of scores by comparing the TOPSIS-based policy rankings with those obtained by an informed assessment exercise. We find that GPT-4 is in rough agreement with the policy rankings of our informed assessment exercise. Hence, we conclude (always provisionally and assuming a modest level of vetting) that GPT-4 can be used as a credible input, even as a starting point, for subsequent deliberation processes on climate and sustainability policies.
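To make steps (iv) and (v) concrete, the following is a minimal sketch of a TOPSIS ranking over a small score table, followed by a rank-agreement check between two such tables. It is illustrative only: the equal weights, the vector normalization, the treatment of every criterion as "higher is better", the use of Spearman's rank correlation as the agreement measure, and the score values themselves are all assumptions made for this example, not choices or data taken from the study.

```python
import math

def topsis(scores, weights=None, benefit=None):
    """Return one TOPSIS closeness coefficient per alternative (row of scores)."""
    n_crit = len(scores[0])
    weights = weights or [1.0 / n_crit] * n_crit   # assumption: equal weights
    benefit = benefit or [True] * n_crit           # assumption: higher is better
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in scores]
    # Ideal and anti-ideal points, taken column by column.
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal point
        d_neg = math.dist(row, anti)    # distance to the anti-ideal point
        total = d_pos + d_neg
        closeness.append(d_neg / total if total else 0.0)
    return closeness

def ranks(values):
    """Rank 1 goes to the largest value; ties are not handled specially."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result

def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical score tables: rows are policies, columns are criteria.
llm_scores = [[7, 5, 8], [4, 6, 3], [9, 8, 7]]
human_scores = [[6, 5, 7], [5, 6, 4], [8, 9, 8]]

llm_ranks = ranks(topsis(llm_scores))
human_ranks = ranks(topsis(human_scores))
agreement = spearman(llm_ranks, human_ranks)
print(llm_ranks, human_ranks, agreement)
```

In this toy case the two tables differ in their individual scores yet yield the same ordering of policies, which is the kind of ranking-level agreement the comparison in step (v) is looking for.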
