Wisdom of the Crowds in Forecasting: Forecast Summarization for Supporting Future Event Prediction
Anisha Saha, Adam Jatowt
TL;DR
The paper tackles predicting future events by aggregating crowd forecasts expressed in text (FEP-CW), addressing limitations of traditional numerical forecasting in capturing semantic event information. It provides a comprehensive survey of data sources, extraction methods (time expressions, future markers, morphology, LLMs) and aggregation techniques (frequency, clustering, ranking, LLM-based approaches), and introduces a novel data model for future-related statements to improve aggregation. Key contributions include a synthesis of 36 studies, discussion of dataset preprocessing and obsolescence challenges, and the proposal of a structured forecast representation to support richer, more interpretable predictions. The work highlights practical implications for real-time, crowd-informed forecasting and outlines future directions such as multilingual data, domain-specific datasets, bias mitigation, and retrieval-augmented forecasting to enhance accuracy and reliability.
Abstract
Future Event Prediction (FEP) is an essential activity whose demand and applications span multiple domains. While traditional methods such as simulations, predictive and time-series forecasting have demonstrated promising outcomes, they are not entirely reliable for forecasting complex events because numerical data cannot accurately capture the semantic information related to events. An alternative approach is to gather and aggregate collective opinions about the future, since cumulative perspectives can help estimate the likelihood of upcoming events. In this work, we organize the existing research and frameworks that aim to support future event prediction based on crowd wisdom by aggregating individual forecasts. We discuss the challenges involved, the available datasets, and the scope for improvement and future research directions for this task. We also introduce a novel data model to represent individual forecast statements.
