Prompting for Policy: Forecasting Macroeconomic Scenarios with Synthetic LLM Personas

Giulia Iadisernia, Carolina Camassa

TL;DR

This study systematically evaluates whether persona-based prompting improves macroeconomic forecasting with LLMs. It constructs 2,368 economics-related personas from the PersonaHub corpus and applies them to GPT-4o to replicate the ECB Survey of Professional Forecasters (SPF) across 50 quarterly rounds (2013Q1–2025Q2). This yields 118,400 AI forecasts (2,368 personas × 50 rounds) for four macro variables (HICP inflation, core HICP inflation or HICPX, real GDP growth or RGDP, and the unemployment rate or UNR) at multiple horizons, which are compared with human SPF medians and with baseline prompts that omit the persona description. A controlled ablation shows that persona descriptions provide no measurable forecasting advantage, while AI forecasts are broadly competitive with human forecasters and exhibit markedly lower dispersion. The out-of-sample evaluation on 2024–2025 data shows competitive performance even though those data were not present in the model's training data, suggesting that data and modeling choices matter more than elaborate persona engineering for macroeconomic forecasting. Practically, the results imply that central banks and other institutions could deploy synthetic forecasters effectively, but should focus on data integration and model improvements rather than extensive prompt customization.

Abstract

We evaluate whether persona-based prompting improves Large Language Model (LLM) performance on macroeconomic forecasting tasks. Using 2,368 economics-related personas from the PersonaHub corpus, we prompt GPT-4o to replicate the ECB Survey of Professional Forecasters across 50 quarterly rounds (2013-2025). We compare the persona-prompted forecasts against the human expert panel across four target variables (HICP, core HICP, GDP growth, unemployment) and four forecast horizons. We also compare the results against 100 baseline forecasts without persona descriptions to isolate the effect of the personas. We report two main findings. First, GPT-4o and human forecasters achieve remarkably similar accuracy levels, with differences that are statistically significant yet practically modest. Our out-of-sample evaluation on 2024-2025 data demonstrates that GPT-4o can maintain competitive forecasting performance on unseen events, though with notable differences compared to the in-sample period. Second, our ablation experiment reveals no measurable forecasting advantage from persona descriptions, suggesting these prompt components can be omitted to reduce computational costs without sacrificing accuracy. Our results provide evidence that GPT-4o can achieve competitive forecasting accuracy even on out-of-sample macroeconomic events, if provided with relevant context data, while revealing that diverse prompts produce remarkably homogeneous forecasts compared to human panels.
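
The comparison described in the abstract (accuracy of each panel against the realized outcome, plus how widely each panel's forecasts are spread) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `panel_summary`, the choice of the panel median as the point forecast, the use of the population standard deviation as the dispersion measure, and the numbers themselves, which are made up and are not data from the paper.

```python
from statistics import median, pstdev


def panel_summary(forecasts, realized):
    """Summarize one panel for one variable, horizon, and survey round.

    Returns (median forecast, absolute error of the median, dispersion),
    where dispersion is the population standard deviation of the panel.
    The paper may use different accuracy and dispersion measures.
    """
    med = median(forecasts)
    return med, abs(med - realized), pstdev(forecasts)


# Made-up illustrative numbers, NOT data from the paper.
realized_hicp = 2.4                       # hypothetical realized inflation, in percent
human_panel = [1.8, 2.0, 2.3, 2.6, 3.0]   # hypothetical human SPF responses
ai_panel = [2.1, 2.1, 2.2, 2.2, 2.3]      # hypothetical persona-prompted responses

human_median, human_error, human_dispersion = panel_summary(human_panel, realized_hicp)
ai_median, ai_error, ai_dispersion = panel_summary(ai_panel, realized_hicp)

print(f"human: median={human_median:.2f} error={human_error:.2f} sd={human_dispersion:.3f}")
print(f"AI:    median={ai_median:.2f} error={ai_error:.2f} sd={ai_dispersion:.3f}")
```

In this toy setup the two medians sit close to each other while the AI panel is much more tightly clustered, which is the qualitative pattern the abstract reports; the toy numbers say nothing about which panel is more accurate in the actual study.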

Paper Structure

This paper contains 38 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: AI and human forecasters achieve remarkably similar accuracy across key macroeconomic variables. Time series comparison of realized outcomes (black), human expert forecasts from the ECB Survey of Professional Forecasters (orange), and AI forecasts using 2,368 synthetic personas (blue) for euro area core inflation (2016-2025). Despite using a variety of persona descriptions, LLM predictions converge to a much narrower forecast distribution compared to the human experts.
  • Figure 2: Comparison of AI persona-based and human forecasts for the current-year horizon across four ECB-SPF variables (2013-2025): (a) HICP inflation, (b) HICP core inflation, (c) real GDP growth, and (d) unemployment rate. Gray shaded regions indicate the out-of-sample evaluation period. AI-generated median forecasts often, but not always, match human forecasts; this holds in both the in-sample and the out-of-sample surveys.
  • Figure 3: Persona prompting yields statistically indistinguishable error distributions. Kernel density estimates of absolute forecast errors for GPT-4o with persona descriptions (blue) versus baseline prompts without personas (orange) across all variable-horizon-round combinations. The near-perfect overlap is consistent with the null hypothesis of no persona effect ($t = -1.02$, $p = 0.31$; Kolmogorov-Smirnov $D = 0.05$, $p = 0.28$).
  • Figure 4: Distribution of forecast errors by variable and horizon. Each panel shows kernel density estimates of errors for AI forecasts (blue) and human SPF forecasts (orange) across selected survey rounds. The top row compares HICP inflation errors at the current-year (t0) and two-year (t2) horizons. The bottom row shows the same comparison for real GDP growth. AI forecasts consistently exhibit lower dispersion and more concentrated error distributions than human forecasters across both variables and both forecast horizons.
  • Figure 5: AI win share for HICP inflation rate forecasts across horizons and survey rounds.
  • ...and 3 more figures
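
The statistics named in the captions of Figures 3 and 5 can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the authors' code: the captions do not say which t-test variant was used (Welch's unequal-variance form is assumed here), and "AI win share" is assumed to mean the fraction of paired cases in which the AI absolute error is strictly smaller than the human one. The function names and all numbers are hypothetical. Only the test statistics are computed; p-values would normally come from a statistics library such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.ks_2samp`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples (assumed variant)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    d = 0.0
    for x in set(a) | set(b):  # the gap can only change at observed values
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def win_share(ai_errors, human_errors):
    """Fraction of paired cases where the AI error is strictly smaller (assumed definition)."""
    wins = sum(a < h for a, h in zip(ai_errors, human_errors))
    return wins / len(ai_errors)


# Made-up absolute forecast errors, NOT data from the paper.
persona_errors = [0.10, 0.20, 0.30, 0.40]    # hypothetical: prompts with personas
baseline_errors = [0.15, 0.25, 0.35, 0.45]   # hypothetical: prompts without personas

t_stat = welch_t(persona_errors, baseline_errors)
d_stat = ks_statistic(persona_errors, baseline_errors)

# Hypothetical paired errors for one variable across four survey rounds.
share = win_share([0.1, 0.3, 0.2, 0.5], [0.2, 0.2, 0.4, 0.5])

print(f"t = {t_stat:.3f}, D = {d_stat:.3f}, AI win share = {share:.2f}")
```

Ties count as non-wins in this sketch; a different tie rule would change the win share at the margin, so the definition used in the paper should be checked before comparing numbers.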