LLM-Generated Counterfactual Stress Scenarios for Portfolio Risk Simulation via Hybrid Prompt-RAG Pipeline
Masoud Soleimani
TL;DR
This work presents a transparent, auditable pipeline that combines retrieval-augmented LLMs with structured macro grounding to generate G7 macro scenarios and translate them into portfolio tail risk via a three-channel PCA-based framework. It demonstrates that prompt design and portfolio composition largely drive tail-risk variation, while retrieval and news provide only modest adjustments; the generated scenarios yield moderate but material tail amplification relative to historical baselines. The study introduces extensive plausibility checks, regime tagging, dispersion diagnostics, and snapshot-based reproducibility to support supervisory use. Overall, LLM-generated macro scenarios can scale and diversify stress narratives in a governance-friendly manner when paired with explicit structure, validation, and human oversight.
Abstract
We develop a transparent and fully auditable LLM-based pipeline for macro-financial stress testing, combining structured prompting with optional retrieval of country fundamentals and news. The system generates machine-readable macroeconomic scenarios for the G7 covering GDP growth, inflation, and policy rates, and translates them into portfolio losses through a factor-based mapping, so that Value-at-Risk and Expected Shortfall can be assessed relative to classical econometric baselines. Across models, countries, and retrieval settings, the LLMs produce coherent and country-specific stress narratives, yielding stable tail-risk amplification with limited sensitivity to retrieval choices. Comprehensive plausibility checks, scenario diagnostics, and ANOVA-based variance decomposition show that risk variation is driven primarily by portfolio composition and prompt design rather than by the retrieval mechanism. The pipeline incorporates snapshotting, deterministic modes, and hash-verified artifacts to ensure reproducibility and auditability. Overall, the results demonstrate that LLM-generated macro scenarios, when paired with transparent structure and rigorous validation, can provide a scalable and interpretable complement to traditional stress-testing frameworks.
