Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas
Giulia Iadisernia, Carolina Camassa
TL;DR
This study systematically evaluates whether persona-based prompting improves macroeconomic forecasting with LLMs. It constructs 2,368 economics-related personas from the PersonaHub corpus and uses them to prompt GPT-4o to replicate the ECB Survey of Professional Forecasters (SPF) across 50 quarterly rounds (2013Q1–2025Q2). This yields 118,400 AI forecasts for four macro variables (HICP, HICPX, RGDP, UNR) at multiple horizons, which are compared with human SPF medians and with baseline prompts that omit the persona description. A controlled ablation shows that persona descriptions provide no measurable forecasting advantage, while AI forecasts are broadly competitive with human forecasters and exhibit markedly lower dispersion. The out-of-sample evaluation on 2024–2025 data, which were not present in the model's training data, shows competitive performance, suggesting that robust data and modeling choices matter more than elaborate persona engineering for macroeconomic forecasting. In practice, the results imply that central banks and other institutions could deploy synthetic forecasters effectively but should focus on data integration and model improvements rather than extensive prompt customization.
Abstract
We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey of Professional Forecasters across 50 quarterly rounds (2013-2025). We compare the persona-prompted forecasts against the human expert panel across four target variables (HICP, core HICP, GDP growth, unemployment) and four forecast horizons. We also compare the results against 100 baseline forecasts generated without persona descriptions, to isolate the effect of the personas. We report two main findings. First, GPT-4o and human forecasters achieve remarkably similar accuracy levels, with differences that are statistically significant yet practically modest. Our out-of-sample evaluation on 2024-2025 data shows that GPT-4o can maintain competitive forecasting performance on unseen events, though with notable differences compared to the in-sample period. Second, our ablation experiment reveals no measurable forecasting advantage from persona descriptions, suggesting that these prompt components can be omitted to reduce computational costs without sacrificing accuracy. Overall, our results provide evidence that GPT-4o can achieve competitive forecasting accuracy even on out-of-sample macroeconomic events when provided with relevant context data, while also revealing that diverse prompts produce remarkably homogeneous forecasts compared to human panels.
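To make the two quantities discussed above concrete, the following minimal Python sketch computes, for a single target variable and survey round, the absolute error of a panel's median forecast and the cross-sectional dispersion of that panel. It is an illustration only: the function name `panel_summary`, the choice of the median as the consensus, the use of the population standard deviation as the dispersion measure, and all numbers are assumptions made for this example, not the evaluation procedure or data of the study.

```python
from statistics import median, pstdev


def panel_summary(forecasts, realized):
    """Return (absolute error of the panel median, cross-sectional std dev).

    Hypothetical helper: the study's actual accuracy and dispersion
    measures may differ from the ones used here.
    """
    consensus = median(forecasts)
    return abs(consensus - realized), pstdev(forecasts)


# Made-up numbers for illustration only, not data from the study.
realized_hicp = 2.4                        # realized inflation rate, percent
human_panel = [1.8, 2.0, 2.3, 2.6, 2.9]    # toy human point forecasts
ai_panel = [2.2, 2.3, 2.3, 2.3, 2.4]       # toy persona-prompted forecasts

human_err, human_disp = panel_summary(human_panel, realized_hicp)
ai_err, ai_disp = panel_summary(ai_panel, realized_hicp)

print(f"human: median error {human_err:.2f}, dispersion {human_disp:.3f}")
print(f"AI:    median error {ai_err:.2f}, dispersion {ai_disp:.3f}")
```

In this toy setup both panels share the same median, so their consensus errors coincide while the AI panel's forecasts are much more tightly clustered. This is the kind of pattern the abstract describes qualitatively, with similar accuracy and more homogeneous forecasts.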
