SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection
Huopu Zhang, Yanguang Liu, Miao Zhang, Zirui He, Mengnan Du
TL;DR
This work tackles the challenge of predicting earnings surprises from long, noisy financial documents by introducing SAE-FiRE, a framework that uses Sparse Autoencoders (SAEs) to extract sparse, interpretable representations from the residual activations of a frozen LLM. It combines two feature-selection strategies, ANOVA F-tests and tree-based importance, to identify the most discriminative SAE dimensions before training a logistic regression classifier, and it performs robustly across three diverse financial datasets. The approach outperforms strong baselines, including zero-shot and few-shot prompting and long-document models, and offers interpretable insights by mapping top SAE features to human-readable concepts. The results suggest that targeted, noise-filtered latent features can improve generalization in financial text analytics, and they point to future extensions into multimodal and cross-lingual tasks.
Abstract
Predicting earnings surprises from financial documents, such as earnings conference calls, regulatory filings, and financial news, has become increasingly important in financial economics. However, these documents are hard to analyze: they typically contain over 5,000 words, with substantial redundancy and industry-specific terminology that create obstacles for language models. In this work, we propose the SAE-FiRE (Sparse Autoencoder for Financial Representation Enhancement) framework to address these limitations by extracting key information while eliminating redundancy. SAE-FiRE employs Sparse Autoencoders (SAEs) to decompose dense neural representations from large language models into interpretable sparse components, then applies statistical feature selection methods, including ANOVA F-tests and tree-based importance scoring, to identify the top-k most discriminative dimensions for classification. By systematically filtering out noise that might otherwise lead to overfitting, we enable more robust and generalizable predictions. Experimental results across three financial datasets demonstrate that SAE-FiRE significantly outperforms baseline approaches.
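To make the ANOVA-based selection step concrete, the following is a minimal sketch, not the authors' implementation. It computes the one-way ANOVA F statistic for each feature column and keeps the top-k indices. The function names `anova_f_scores` and `top_k_features` and the toy matrix are hypothetical; in the framework the columns would be SAE activations and the labels earnings-surprise classes. The tree-based importance scoring and the logistic regression classifier are not shown.

```python
from statistics import mean


def anova_f_scores(X, y):
    """Return the one-way ANOVA F statistic of each column of X, grouped by y."""
    n_samples, n_features = len(X), len(X[0])
    classes = sorted(set(y))
    df_between = len(classes) - 1
    df_within = n_samples - len(classes)
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        grand_mean = mean(column)
        groups = [[v for v, label in zip(column, y) if label == c] for c in classes]
        group_means = [mean(g) for g in groups]
        # Between-group sum of squares: class means around the grand mean.
        ss_between = sum(
            len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
        )
        # Within-group sum of squares: samples around their own class mean.
        ss_within = sum(
            (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
        )
        if ss_within == 0:
            # Degenerate column: constant overall, or constant within each class.
            scores.append(float("inf") if ss_between > 0 else 0.0)
        else:
            scores.append((ss_between / df_between) / (ss_within / df_within))
    return scores


def top_k_features(X, y, k):
    """Return the sorted indices of the k columns with the largest F statistic."""
    scores = anova_f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: column 0 separates the classes, column 1 is noise,
# and column 2 is constant.
X_toy = [
    [0.0, 1.0, 0.5],
    [0.1, 0.0, 0.5],
    [0.2, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [1.1, 1.0, 0.5],
    [1.2, 0.0, 0.5],
]
y_toy = [0, 0, 0, 1, 1, 1]

selected = top_k_features(X_toy, y_toy, k=1)
```

In practice a library routine such as scikit-learn's `f_classif` computes the same statistic far more efficiently. The pure-Python version here only spells out the arithmetic behind the ranking.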
