Distilled ChatGPT Topic & Sentiment Modeling with Applications in Finance

Olivier Gandouet; Mouloud Belbahri; Armelle Jezequel; Yuriy Bodjov

TL;DR

This work addresses the challenge of extracting interpretable signals from vast earnings call transcripts by marrying large-language-model guidance with knowledge distillation to produce compact topic and sentiment classifiers. The authors build a lightweight topic model using MPNet with an MLP head and a sentiment model via a two-stage distillation that leverages a FinBERT teacher and ChatGPT-derived data, achieving competitive performance (approximately $78\%$ F1 on expert data and up to $83\%$ on benchmarks). They validate the approach on S&P 1500 data (2010–2023), showing topic propensity and net sentiment can correlate with sector-neutral returns, though effects are topic-dependent and require careful topic differentiation. The framework enables efficient, deployable analysis suitable for quantitative investing, and the authors propose extensions to handle multi-topic sentences, sentence proximity, and interactive teacher–student refinement to adapt to changing market conditions.

Abstract

In this study, ChatGPT is utilized to create streamlined models that generate easily interpretable features. These features are then used to evaluate financial outcomes from earnings calls. We detail a training approach that merges knowledge distillation and transfer learning, resulting in lightweight topic and sentiment classification models without significant loss in accuracy. These models are assessed through a dataset annotated by experts. The paper also delves into two practical case studies, highlighting how the generated features can be effectively utilized in quantitative investing scenarios.
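As a rough illustration of the distillation idea described in the abstract, the sketch below fits a small MLP head on frozen sentence embeddings so that it reproduces a teacher's label distribution. Everything in it is an assumption made for illustration: the embeddings are synthetic stand-ins for the output of a pretrained encoder (MPNet in the paper), the smoothed teacher probabilities stand in for ChatGPT-derived labels, and the layer sizes, learning rate, and function names are hypothetical rather than the authors' settings.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, emb):
    """One-hidden-layer MLP head: ReLU hidden layer, softmax output."""
    W1, b1, W2, b2 = params
    h = np.maximum(emb @ W1 + b1, 0.0)
    return h, softmax(h @ W2 + b2)


def train_student_head(emb, teacher_probs, hidden=32, lr=0.5, epochs=500, seed=0):
    """Full-batch gradient descent on cross-entropy against soft teacher labels.

    The encoder is treated as frozen: only the head's weights are updated.
    """
    rng = np.random.default_rng(seed)
    n, d = emb.shape
    k = teacher_probs.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, k))
    b2 = np.zeros(k)
    for _ in range(epochs):
        h, p = forward((W1, b1, W2, b2), emb)
        g_logits = (p - teacher_probs) / n   # d(mean cross-entropy)/d(logits)
        g_W2 = h.T @ g_logits
        g_b2 = g_logits.sum(axis=0)
        g_h = g_logits @ W2.T
        g_h[h <= 0.0] = 0.0                  # backprop through ReLU
        g_W1 = emb.T @ g_h
        g_b1 = g_h.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2
    return W1, b1, W2, b2


# Synthetic stand-in data (assumption): 3 "topics", 16-dimensional embeddings.
rng = np.random.default_rng(0)
n_samples, dim, n_topics = 300, 16, 3
centers = rng.normal(0.0, 3.0, (n_topics, dim))
teacher_labels = rng.integers(0, n_topics, n_samples)
emb = centers[teacher_labels] + rng.normal(0.0, 1.0, (n_samples, dim))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-norm, like many sentence encoders

# Soft teacher targets: 0.9 on the teacher's label, the rest spread evenly.
teacher_probs = np.full((n_samples, n_topics), 0.1 / (n_topics - 1))
teacher_probs[np.arange(n_samples), teacher_labels] = 0.9

params = train_student_head(emb, teacher_probs)
_, student_probs = forward(params, emb)
agreement = float((student_probs.argmax(axis=1) == teacher_labels).mean())
print(f"student/teacher agreement on training data: {agreement:.2f}")
```

Training only a small head on fixed embeddings is what keeps a student of this kind cheap to fit and to deploy. A real pipeline would also measure agreement on held-out, expert-annotated sentences rather than on the training data.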

Paper Structure (13 sections, 3 equations, 8 figures, 4 tables)

Introduction
Knowledge Distillation Pipeline
Earning Calls Data
Identifying Financial Topics
Creating a Labeled Dataset of Sentences
Training a Topic Classification Model
Training a Sentiment Model for "Free"
Benchmark Datasets
Computational Details
Applications in Finance
Correlation between Topic Propensity, Sentiment Score and Sector Neutral Returns
Sentiment Gap between Sales and Earnings
Conclusion

Figures (8)

Figure 1: Earning Calls Topic Classification Pipeline.
Figure 2: Examples of sentences labeled by ChatGPT.
Figure 3: Identified topics distribution and average sentiment per topic on the labeled sentences dataset.
Figure 4: Topic Classification Student Model Architecture.
Figure 5: Sentiment Classification Model Pipeline.
...and 3 more figures
