AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval

Qi Yan, Raihan Seraj, Jiawei He, Lili Meng, Tristan Sylvain

TL;DR

The paper addresses the challenge of forecasting real-world events from unstructured news, where prior machine methods lag behind human forecasters. It introduces AutoCast++, a zero-shot ranking-based context retrieval pipeline with recency weighting, unsupervised summarization, and a FiD-based reader, trained with a human-alignment loss and numerical-output binning. The approach yields substantial gains on TF, MCQ, and numerical predictions, outperforming strong baselines across model sizes and demonstrating the value of task-aligned, zero-shot context selection for event forecasting. For open-domain forecasting, the work shows that using a pre-trained language model for relevance assessment and summarization can close much of the gap to human forecasters. Code is available for reuse and extension.
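As an illustration of the numerical-output binning mentioned above, the following minimal sketch turns a numerical target into a class label and back. It assumes targets are already normalized to [0, 1] and that bins are uniform; the function names `to_bin` and `bin_center` and the default of 10 bins are hypothetical choices for this example, not taken from the paper.

```python
def to_bin(value: float, num_bins: int = 10) -> int:
    """Map a value in [0, 1] to a bin index in [0, num_bins - 1].

    Assumption: uniform bins over a normalized answer range.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("value must lie in [0, 1]")
    # The upper edge 1.0 is folded into the last bin.
    return min(int(value * num_bins), num_bins - 1)


def bin_center(index: int, num_bins: int = 10) -> float:
    """Map a bin index back to the midpoint of its interval."""
    if not 0 <= index < num_bins:
        raise ValueError("index out of range")
    return (index + 0.5) / num_bins
```

With this kind of discretization, a numerical question can be handled as a classification problem over bin indices, and a predicted index can be decoded to the bin midpoint.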

Abstract

Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, Zou et al. (2022) unveil AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system, tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, homing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%. Code is available at https://github.com/BorealisAI/Autocast-plus-plus.
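The re-ranking idea described in the abstract, relevance first and then a preference for recent articles, can be sketched as follows. This is only an illustration under stated assumptions: the `Article` class, the precomputed `relevance` field (standing in for a zero-shot language-model score), the linear `recency_weight` with range [0.5, 1], and the choice to multiply the two scores are all hypothetical and not taken from the released code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    published: date
    relevance: float  # hypothetical zero-shot relevance score in [0, 1]


def recency_weight(published: date, start: date, expiry: date) -> float:
    """Assumed linear weight: 0.5 at the question start, 1.0 at expiry."""
    span = (expiry - start).days
    if span <= 0:
        return 1.0
    frac = (published - start).days / span
    frac = min(1.0, max(0.0, frac))  # clamp articles outside the window
    return 0.5 + 0.5 * frac


def rerank(articles, start: date, expiry: date, k: int = 2):
    """Return the top-k articles by relevance times recency weight."""
    scored = [
        (a.relevance * recency_weight(a.published, start, expiry), a)
        for a in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:k]]


if __name__ == "__main__":
    start, expiry = date(2022, 1, 1), date(2022, 1, 11)
    pool = [
        Article("early but highly relevant", date(2022, 1, 1), 0.9),
        Article("recent and fairly relevant", date(2022, 1, 11), 0.6),
        Article("mid-window, weakly relevant", date(2022, 1, 6), 0.3),
    ]
    for art in rerank(pool, start, expiry):
        print(art.title)
```

In this toy setup the recent article can outrank an older one with a higher raw relevance score, which is the behavior the recency preference is meant to produce. The selected articles would then be summarized before being passed to the reader.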

Paper Structure (34 sections, 7 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Illustration of the AutoCast++ components. Top: For each question, the retriever employs zero-shot relevance re-ranking and recency re-ranking to pinpoint relevant news articles from a large corpus, subsequently using unsupervised text summarization to establish a concise context. Bottom: Our FiD-based reader utilizes the generative decoder for predicting event outcomes. We also introduce an auxiliary alignment loss to synchronize with the responses of human forecasters.
  • Figure 2: Left: Retrieval results of BM25 and our re-ranking retriever. Given a query seeking information about a hurricane until its expiration date, BM25 identifies articles based on lexical similarity. However, these articles, closely aligned with the query start date, lack depth for a retrospective answer. In contrast, our re-ranking retriever focuses on more recent, relevant passages to effectively address the query. Right: Visualization of news recency score $s_t(t)$. As the query expiry date nears, human forecaster accuracy typically increases more rapidly. This score is a statistical measure capturing general patterns across the whole dataset.
  • Figure 3: An example of our proposed re-ranking retrieval with text summarization.
  • Figure 4: Example 1 of question-news relevance assessment.
  • Figure 5: Example 2 of question-news relevance assessment.
  • ...and 3 more figures