Deriving Strategic Market Insights with Large Language Models: A Benchmark for Forward Counterfactual Generation
Keane Ong, Rui Mao, Deeksha Varshney, Paul Pu Liang, Erik Cambria, Gianmarco Mengaldo
TL;DR
This work introduces FIN-FORCE, a benchmark for forward counterfactual generation in finance that enables scalable exploration of plausible future market developments from current news headlines. It defines two counterfactual types (opportunity and risk), curates 1,368 headlines under a rigorous annotation protocol, and designs two forward-looking metrics, Forward-Compatibility and Directionality. It then evaluates zero- and few-shot prompting, state-of-the-art counterfactual generation methods, and a self-training paradigm (SRLM). Sampling-based counterfactual methods and self-training yield the strongest performance, smaller LLMs emerge as competitive contenders, and existing prompting-based approaches show clear limitations. The authors release the benchmark, supplementary data, and code to spur future research and practical, automated scenario analysis in finance, with potential extensions to multilingual data and explainability-driven applications.
Abstract
Counterfactual reasoning typically involves considering alternatives to actual events. While it is often applied to understand past events, a distinct form, forward counterfactual reasoning, focuses on anticipating plausible future developments. This type of reasoning is invaluable in dynamic financial markets, where anticipating market developments can reveal potential risks and opportunities for stakeholders and guide their decision-making. However, performing it at scale is challenging because of the cognitive demands involved, underscoring the need for automated solutions. LLMs show promise but remain unexplored for this application. To address this gap, we introduce a novel benchmark, FIN-FORCE (FINancial FORward Counterfactual Evaluation). By curating financial news headlines and providing structured evaluation, FIN-FORCE supports LLM-based forward counterfactual generation. This paves the way for scalable, automated solutions for exploring and anticipating future market developments, thereby providing structured insights for decision-making. Through experiments on FIN-FORCE, we evaluate state-of-the-art LLMs and counterfactual generation methods, analyze their limitations, and offer insights for future research. We release the benchmark, supplementary data, and all experimental code at the following link: https://github.com/keanepotato/fin_force
