PROPHET: An Inferable Future Forecasting Benchmark with Causal Intervened Likelihood Estimation

Zhengwei Tao, Zhi Jin, Bincheng Li, Xiaoying Bai, Haiyan Zhao, Chengfeng Dou, Xiancai Chen, Jia Li, Linyu Li, Chongyang Tao

TL;DR

This paper introduces PROPHET, a benchmark of inferable forecasting questions paired with relevant news for retrieval, and proposes Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference.

Abstract

Predicting future events stands as one of the ultimate aspirations of artificial intelligence. Recent advances in large language model (LLM)-based systems have shown remarkable potential in forecasting future events, thereby garnering significant interest in the research community. Currently, several benchmarks have been established to evaluate forecasting capabilities by formalizing event prediction as a retrieval-augmented generation (RAG) and reasoning task. In these benchmarks, each prediction question is answered with relevant retrieved news articles. However, because these benchmarks do not consider whether a question can be supported by valid or sufficient rationales, some of their questions may be inherently non-inferable. To address this issue, we introduce a new benchmark, PROPHET, which comprises inferable forecasting questions paired with relevant news for retrieval. To ensure the inferability of the benchmark, we propose Causal Intervened Likelihood (CIL), a statistical measure that assesses inferability through causal inference. In constructing this benchmark, we first collected recent trend forecasting questions and then filtered the data using CIL, resulting in an inferable benchmark for event prediction. Through extensive experiments, we first demonstrate the validity of CIL and conduct in-depth investigations into event prediction with its aid. Subsequently, we evaluate several representative prediction systems on PROPHET, drawing valuable insights for future directions.

Paper Structure

This paper contains 32 sections, 11 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: The upper half illustrates the task of future forecasting. The lower half shows both inferable and non-inferable scenarios.
  • Figure 2: Illustration of assumptions. Nodes represent news variables that are in chronological order corresponding to their ${\mathcal{T}}$.
  • Figure 3: Retrieval evaluation.
  • Figure 4: Temporal analysis. The horizontal axis represents the entire prediction process.
  • Figure 5: In-depth analysis. The horizontal axis represents the entire prediction process.
  • ...and 1 more figure

Theorems & Definitions (1)

  • proof