Can ChatGPT Compute Trustworthy Sentiment Scores from Bloomberg Market Wraps?

Baptiste Lefort; Eric Benhamou; Jean-Jacques Ohana; David Saltiel; Beatrice Guez; Damien Challet

TL;DR

This paper investigates whether ChatGPT can generate trustworthy sentiment scores from Bloomberg Market Wraps and link them to stock-market movements. It introduces a two-step prompting pipeline that first extracts headlines and then classifies their sentiment. The classifications form a daily sentiment score, $S = \frac{\sum p(h_i) - \sum n(h_i)}{\sum p(h_i) + \sum n(h_i)}$, where the sums run over the day's headlines $h_i$, and a cumulated score $S_d$ that aggregates over a horizon of $d$ days. Across major equity markets and horizons, the study finds a robust, positive relationship between cumulative sentiment and forward returns at short-to-mid horizons, which turns into a negative correlation at longer horizons. Significance is controlled via the False Discovery Rate and a mitigated correlation matrix. The results demonstrate cross-market robustness, quantify an optimal horizon $d_{\text{opt}}$ for predictive power, and highlight the practical potential of LLM-driven sentiment indicators for systematic trading insights and risk management. Overall, the work contributes a novel, interpretable sentiment metric built on LLM prompts and validates its predictive efficacy for global equity movements.
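The score definition above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this summary: $p(h_i)$ and $n(h_i)$ are treated as 0/1 indicators for a positive or negative headline, the label strings are hypothetical, and $S_d$ is taken to be a rolling sum of the last $d$ daily scores. The paper's exact aggregation may differ.

```python
from typing import List, Optional


def daily_score(labels: List[str]) -> Optional[float]:
    """S = (#positive - #negative) / (#positive + #negative) for one day.

    Assumption: p(h_i) and n(h_i) are 0/1 indicators, so the sums are counts.
    The label strings "positive" / "negative" are hypothetical.
    """
    pos = sum(1 for lab in labels if lab == "positive")
    neg = sum(1 for lab in labels if lab == "negative")
    if pos + neg == 0:
        return None  # undefined when no headline is positive or negative
    return (pos - neg) / (pos + neg)


def cumulated_score(scores: List[float], d: int) -> List[float]:
    """Rolling sum of the last d daily scores (assumed form of S_d).

    Early entries use a shorter window until d observations are available.
    """
    out = []
    for t in range(len(scores)):
        window = scores[max(0, t - d + 1): t + 1]
        out.append(sum(window))
    return out
```

By construction the daily score lies in $[-1, 1]$: it equals $1$ when every classified headline is positive and $-1$ when every one is negative. A larger $d$ smooths the noisy daily signal at the cost of slower reaction to new information.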

Abstract

We used a dataset of daily Bloomberg Financial Market Summaries from 2010 to 2023, reposted on large financial media, to determine how global news headlines may affect stock market movements using ChatGPT and a two-stage prompt approach. We document a statistically significant positive correlation between the sentiment score and future equity market returns over the short to medium term, which reverts to a negative correlation over longer horizons. Validating this correlation pattern across multiple equity markets indicates that it is robust across regions and resilient to non-linearity, as evidenced by a comparison of Pearson and Spearman correlations. Finally, we provide an estimate of the optimal horizon that strikes a balance between reactivity to new information and correlation.
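The Pearson versus Spearman comparison mentioned in the abstract can be sketched as follows. This is an illustrative, dependency-free version and not the authors' code: the helper names are hypothetical, and the forward return is assumed to be the simple return over the next $h$ days. Pearson measures linear association, while Spearman applies the same formula to ranks and so captures any monotonic relation.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(v: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def forward_returns(prices: List[float], h: int) -> List[float]:
    """Simple return over the next h days (assumed definition)."""
    return [prices[t + h] / prices[t] - 1 for t in range(len(prices) - h)]
```

To compare the two measures, one would align the cumulated score at day $t$ with the forward return starting at day $t$ and compute both coefficients. Similar values across the two suggest that the relationship is not driven by outliers or strong non-linearity.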

Paper Structure (34 sections, 1 theorem, 10 equations, 48 figures, 6 tables)

Introduction
Related works
Prompt engineering
Data collection
Two-step approach
Global Equities Sentiment Indicator
Evaluation of the Sentiment Score's Validity
Descriptive statistics
The Equity Data and Variable Computation
Correlation Results
T-test on the correlation
False Discovery Rate
T-Test Adaptation
The Mitigated Matrix
The Short Term Correlation
...and 19 more sections

Key Result

Proposition 1

The sentiment score $S$ satisfies some properties:

Figures (48)

Figure 1: Raw signal exhibiting significant noise
Figure 2: Cumulated sentiment score with d=20
Figure 3: Pearson correlation matrix of the cumulative score and the NASDAQ
Figure 4: Adjusted p-value for the Pearson correlation between the US Tech market and the cumulative sentiment score
Figure 5: Mitigated correlation between the US Tech market and the cumulative sentiment score
...and 43 more figures

Theorems & Definitions (3)

Definition 4.1
Proposition 1
Definition 4.2
