Numerical Claim Detection in Finance: A New Financial Dataset, Weak-Supervision Model, and Market Analysis
Agam Shah, Arnav Hiray, Pratvi Shah, Arkaprabha Banerjee, Anushka Singh, Dheeraj Eidnani, Sahasra Chava, Bhaskar Chaudhury, Sudheer Chava
TL;DR
This work tackles numerical claim detection in finance by building a labeled English dataset of in-claim and out-of-claim sentences drawn from analyst reports and earnings calls. It introduces a weak-supervision model that labels large volumes of text with labeling functions and combines their votes through an aggregation function informed by subject matter experts (SMEs); the model outperforms baselines while keeping latency low. A novel optimism measure derived from in-claim sentences is validated through regressions that link it to earnings surprises and post-earnings abnormal returns, and it supports a simple trading-signal approach. The dataset, models, and code are publicly released to support further research and practical finance applications in claim extraction and market prediction.
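To make the weak-supervision idea concrete, the sketch below shows labeling functions that each vote IN_CLAIM, OUT_OF_CLAIM, or ABSTAIN on a sentence, followed by a weighted vote in which hand-set weights stand in for SME knowledge. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue lists, the two labeling functions (`lf_forward_looking`, `lf_past_fact`), the `aggregate` rule, and the `SME_WEIGHTS` values are all hypothetical. It also assumes that in-claim sentences are forward-looking statements containing numbers, while out-of-claim sentences report numerical facts.

```python
import re
from typing import Callable, Dict, List, Sequence

# Label space assumed for this sketch.
IN_CLAIM, OUT_OF_CLAIM, ABSTAIN = 1, 0, -1

# Hypothetical cue lists for illustration only; not taken from the paper.
FORWARD_LOOKING = ("expect", "anticipate", "forecast", "project",
                   "guidance", "estimate", "target")
PAST_MARKERS = {"reported", "was", "were", "grew", "declined",
                "increased", "decreased"}
HAS_DIGIT = re.compile(r"\d")

LabelingFunction = Callable[[str], int]


def lf_forward_looking(sentence: str) -> int:
    """Hypothetical LF: votes IN_CLAIM when a number co-occurs with a forward-looking cue."""
    text = sentence.lower()
    if HAS_DIGIT.search(text) and any(cue in text for cue in FORWARD_LOOKING):
        return IN_CLAIM
    return ABSTAIN


def lf_past_fact(sentence: str) -> int:
    """Hypothetical LF: votes OUT_OF_CLAIM when a number co-occurs with a past-tense reporting word."""
    text = sentence.lower()
    words = set(re.findall(r"[a-z]+", text))
    if HAS_DIGIT.search(text) and words & PAST_MARKERS:
        return OUT_OF_CLAIM
    return ABSTAIN


def aggregate(sentence: str, lfs: Sequence[LabelingFunction],
              weights: Sequence[float]) -> int:
    """Weighted vote: each non-abstaining LF adds its (assumed SME-set) weight to its label."""
    scores: Dict[int, float] = {IN_CLAIM: 0.0, OUT_OF_CLAIM: 0.0}
    for lf, weight in zip(lfs, weights):
        vote = lf(sentence)
        if vote != ABSTAIN:
            scores[vote] += weight
    if scores[IN_CLAIM] == scores[OUT_OF_CLAIM]:
        return ABSTAIN  # every LF abstained, or the weighted votes tie
    return IN_CLAIM if scores[IN_CLAIM] > scores[OUT_OF_CLAIM] else OUT_OF_CLAIM


LFS: List[LabelingFunction] = [lf_forward_looking, lf_past_fact]
SME_WEIGHTS = [2.0, 1.0]  # hypothetical: an expert trusts forward-looking cues more
```

For a conflicting sentence such as "Sales increased 5% and we expect 8% growth in 2024.", both functions fire and the SME weights decide the label; swapping the weights flips the outcome. A learned label model (as in Snorkel-style pipelines) is a common alternative to fixed weights; fixed weights are used here only because they show most directly where expert knowledge enters the aggregation.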
Abstract
In this paper, we investigate the influence of claims in analyst reports and earnings calls on financial market returns, treating these reports and calls as significant quarterly events for publicly traded companies. To enable a comprehensive analysis, we construct a new dataset for the claim detection task in the financial domain. We benchmark various language models on this dataset and propose a novel weak-supervision model that incorporates the knowledge of subject matter experts (SMEs) into the aggregation function and outperforms existing approaches. We also demonstrate the practical utility of our proposed model by constructing a novel measure of optimism, and we observe that earnings surprise and returns depend on this optimism measure. Our dataset, models, and code are publicly available on GitHub under the CC BY 4.0 license.
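The abstract does not define the optimism measure, so the sketch below assumes one plausible form purely for illustration: the net share of positive over negative sentiment among a document's in-claim sentences, (#positive − #negative) / #in-claim, which lies in [−1, 1]. The function name `optimism_score`, the `classify` callable, and the formula itself are hypothetical and are not presented as the paper's definition; any sentence-level sentiment classifier returning "positive", "negative", or "neutral" could be plugged in.

```python
from typing import Callable, Optional, Sequence


def optimism_score(in_claim_sentences: Sequence[str],
                   classify: Callable[[str], str]) -> Optional[float]:
    """Assumed net-tone optimism over in-claim sentences, in [-1, 1].

    classify is a hypothetical sentiment classifier that returns
    "positive", "negative", or "neutral" for a single sentence.
    """
    if not in_claim_sentences:
        return None  # no in-claim sentences, so the measure is undefined
    labels = [classify(s) for s in in_claim_sentences]
    positive = sum(1 for label in labels if label == "positive")
    negative = sum(1 for label in labels if label == "negative")
    return (positive - negative) / len(labels)
```

In a pipeline of this shape, one would first keep only the sentences the claim detector labels in-claim, score them per firm-quarter, and then use the resulting series as a regressor for earnings surprise or post-earnings returns. Returning `None` for documents with no claims avoids silently treating them as neutral.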
