Extracting the Structure of Press Releases for Predicting Earnings Announcement Returns
Yuntao Wu, Ege Mert Akin, Charles Martineau, Vincent Grégoire, Andreas Veneris
TL;DR
The paper examines how soft information in earnings press releases contributes to price formation on earnings announcement days, and compares it with the traditional hard information in earnings surprises. Using a large dataset (over 138k press releases from 2005–2023) and a mix of vectorization techniques (BKMX/LDA, online LDA, and BERT variants including FinBERT), the authors build real-time return forecasts via a rolling-window Lasso and assess explanatory power with cross-sectional regressions and SHAP values. They find that soft information is as informative as earnings surprises, with FinBERT delivering the strongest predictive power and SHAP-based importance pointing to meaningful, interpretable signals. Combining soft and hard signals increases explanatory power, and topic analysis reveals self-serving biases in managerial narratives. The results imply that prices fully reflect soft information by market open, which supports market efficiency, although information leaked before a release can yield predictive gains. This motivates future work that integrates conference-call content and cybersecurity considerations to better understand information flow and price formation.
Abstract
We examine how textual features in earnings press releases predict stock returns on earnings announcement days. Using over 138,000 press releases from 2005 to 2023, we compare traditional bag-of-words and BERT-based embeddings. We find that press release content (soft information) is as informative as earnings surprise (hard information), with FinBERT yielding the highest predictive power. Combining models enhances explanatory strength and the interpretability of press release content. Stock prices fully reflect the content of press releases at market open. If press releases are leaked before publication, they offer a predictive advantage. Topic analysis reveals self-serving bias in managerial narratives. Our framework supports real-time return prediction through the integration of online learning, provides interpretability, and reveals the nuanced role of language in price formation.
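As an illustration of the rolling-window Lasso forecast mentioned in the TL;DR, the following is a minimal sketch, not the authors' implementation. It assumes synthetic data in which random feature vectors stand in for press-release embeddings and a simulated series stands in for announcement returns. The function names (`lasso_cd`, `rolling_lasso_forecast`), the window length, and the penalty `alpha` are hypothetical choices made for this example. The Lasso is fitted with a small hand-written coordinate-descent routine to keep the sketch dependency-light; a library implementation such as scikit-learn's `Lasso` would be the usual choice in practice.

```python
import numpy as np

def soft_threshold(z, t):
    # Shrink z toward zero by t (the proximal operator of the L1 penalty).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=50):
    """Minimise (1/(2n))*||y - X b||^2 + alpha*||b||_1 by cyclic
    coordinate descent. No intercept: centre X and y beforehand."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta while beta is zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta

def rolling_lasso_forecast(X, y, window, alpha):
    """For each t >= window, fit on the `window` observations strictly
    before t and predict y[t], so no future data enters the fit."""
    preds = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        X_tr, y_tr = X[t - window:t], y[t - window:t]
        mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
        beta = lasso_cd(X_tr - mu_x, y_tr - mu_y, alpha)
        preds[t] = mu_y + (X[t] - mu_x) @ beta
    return preds

# Synthetic stand-ins (assumption): 20 "embedding" features, 3 informative.
rng = np.random.default_rng(0)
n_obs, n_feat = 260, 20
X = rng.normal(size=(n_obs, n_feat))
true_beta = np.zeros(n_feat)
true_beta[:3] = [0.5, -0.3, 0.2]
y = X @ true_beta + 0.1 * rng.normal(size=n_obs)

preds = rolling_lasso_forecast(X, y, window=200, alpha=0.01)
oos = ~np.isnan(preds)  # out-of-sample positions only
r2 = 1 - ((y[oos] - preds[oos]) ** 2).sum() / ((y[oos] - y[oos].mean()) ** 2).sum()
print(f"out-of-sample R^2 on synthetic data: {r2:.3f}")
```

Each forecast uses only observations that precede the target date, which is what makes the setup usable in real time. With real data, the rows of `X` would be text representations ordered by announcement time and `y` the corresponding announcement returns.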
