Can Base ChatGPT be Used for Forecasting without Additional Optimization?

Van Pham; Scott Cunningham

TL;DR

The study probes whether base ChatGPT models can forecast future events without extra optimization by exploiting a training-data cutoff around September 2021. It contrasts direct forecasting with future narrative prompting, finding that narrative prompts dramatically enhance GPT-4’s predictive accuracy for 2022 Academy Award outcomes and select macroeconomic variables, though reliability varies by domain and prompt. A falsification design confirms the models’ predictions do not simply leak post-cutoff information, and a May 2024 re-run with updated models shows further improvements when the training window includes the events being predicted. The work highlights a novel use of narrative prompting to access predictive capabilities in LLMs, while raising ethical and safety considerations for high-stakes applications and calling for careful framing of such tasks in research and practice.

Abstract

This study investigates whether OpenAI's ChatGPT-3.5 and ChatGPT-4 can forecast future events. To evaluate the accuracy of the predictions, we take advantage of the fact that the models' training data at the time of our experiments (mid-2023) stopped at September 2021, and we ask about events that happened in 2022. We employed two prompting strategies: direct prediction and what we call future narratives, which ask ChatGPT to tell fictional stories set in the future in which characters recount events that lie in their past but occurred after ChatGPT's training data had been collected. We prompted ChatGPT to engage in storytelling, particularly within economic contexts. After analyzing 100 trials, we find that future narrative prompts significantly enhanced ChatGPT-4's forecasting accuracy. This was especially evident in its predictions of major Academy Award winners and of economic trends, the latter inferred from scenarios in which the model impersonated public figures such as the Federal Reserve Chair, Jerome Powell. As a falsification exercise, we repeated our experiments in May 2024, by which time the models included more recent training data. ChatGPT-4's accuracy improved significantly when the training window included the events being prompted for, reaching 100% accuracy in many instances. The poorer accuracy for events outside the training window suggests that, in the 2023 prediction experiments, ChatGPT-4 was forming predictions based solely on its training data. Narrative prompting also consistently outperformed direct prompting. These findings indicate that narrative prompts leverage the models' capacity for hallucinatory narrative construction, facilitating more effective data synthesis and extrapolation than straightforward predictions. Our research reveals new aspects of LLMs' predictive capabilities and suggests potential future applications in analytical contexts.
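
To make the contrast between the two prompting strategies concrete, the following is a minimal sketch in Python. It is not the authors' code: the prompt wording, the function names (direct_prompt, narrative_prompt, accuracy), and the substring-based scoring rule are all illustrative assumptions, and the call to the model itself is deliberately left out.

```python
def direct_prompt(question: str) -> str:
    """Direct prediction: ask for the outcome outright (hypothetical wording)."""
    return f"Predict the answer to the following question: {question}"


def narrative_prompt(question: str, speaker: str, year: int) -> str:
    """Future narrative: set a story after the event and have a character
    recount the outcome in the past tense (hypothetical wording)."""
    return (
        f"Write a short scene set in {year}. {speaker} looks back "
        f"and tells the audience what happened. The scene must answer: {question}"
    )


def accuracy(responses: list[str], truth: str) -> float:
    """Share of trials whose text mentions the true outcome.

    Assumption: a case-insensitive substring match stands in for
    whatever response coding a real study would use.
    """
    if not responses:
        raise ValueError("need at least one response")
    hits = sum(truth.lower() in r.lower() for r in responses)
    return hits / len(responses)
```

In use, each prompt would be sent to the model repeatedly (the abstract reports 100 trials), the responses collected in a list, and accuracy computed once per strategy so that the two figures can be compared.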
