Sparse Autoencoders are Topic Models
Leander Girrbach, Zeynep Akata
TL;DR
This paper recasts sparse autoencoders (SAEs) as topic models: it extends Latent Dirichlet Allocation (LDA) to embedding spaces and interprets SAE features as topic atoms under the resulting continuous topic model (CTM). The authors derive the SAE objective as a maximum a posteriori (MAP) estimator under this CTM. Building on that view, their framework, SAE-TM, pretrains SAEs to learn reusable topic atoms, interprets those atoms as word distributions on downstream data, and merges them into any desired number of topics without retraining. Across five text datasets and three image datasets, SAE-TM yields more coherent topics than the baselines while maintaining diversity. It also supports downstream analyses such as cross-dataset thematic comparisons and tracing topic changes over time in Japanese woodblock prints. The approach offers a scalable, modality-agnostic toolkit for large-scale thematic analysis; the authors state that code and data will be released upon publication.
Abstract
Sparse autoencoders (SAEs) are used to analyze embeddings, but their role and practical value are debated. We propose a new perspective on SAEs by demonstrating that they can be naturally understood as topic models. We extend Latent Dirichlet Allocation to embedding spaces and derive the SAE objective as a maximum a posteriori estimator under this model. This view implies SAE features are thematic components rather than steerable directions. Based on this, we introduce SAE-TM, a topic modeling framework that: (1) trains an SAE to learn reusable topic atoms, (2) interprets them as word distributions on downstream data, and (3) merges them into any number of topics without retraining. SAE-TM yields more coherent topics than strong baselines on text and image datasets while maintaining diversity. Finally, we analyze thematic structure in image datasets and trace topic changes over time in Japanese woodblock prints. Our work positions SAEs as effective tools for large-scale thematic analysis across modalities. Code and data will be released upon publication.
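The three SAE-TM steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how atoms are turned into word distributions or how they are merged, so the choices here are assumptions made for illustration only. Specifically, the random encoder weights stand in for a pretrained SAE, `feature_word_distributions` uses activation-weighted bag-of-words counts, and `merge_topics` greedily averages the two most cosine-similar distributions. All function names and parameters are hypothetical.

```python
import math
import random

def sae_encode(x, w_enc, b_enc):
    """ReLU encoder of a (hypothetical) pretrained SAE: one activation per feature."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w_enc, b_enc)]

def feature_word_distributions(activations, bow, smoothing=1e-3):
    """Assumed step 2: activation-weighted word counts per feature, normalized to sum to 1."""
    n_features, vocab = len(activations[0]), len(bow[0])
    dists = []
    for f in range(n_features):
        counts = [smoothing] * vocab  # smoothing keeps every distribution strictly positive
        for acts, doc in zip(activations, bow):
            if acts[f] > 0.0:
                for w, c in enumerate(doc):
                    counts[w] += acts[f] * c
        total = sum(counts)
        dists.append([c / total for c in counts])
    return dists

def cosine(p, q):
    """Cosine similarity between two word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

def merge_topics(dists, n_topics):
    """Assumed step 3: average the two most similar distributions until n_topics remain."""
    topics = [(list(d), 1) for d in dists]  # (distribution, number of merged atoms)
    while len(topics) > n_topics:
        pairs = ((a, b) for a in range(len(topics)) for b in range(a + 1, len(topics)))
        i, j = max(pairs, key=lambda ab: cosine(topics[ab[0]][0], topics[ab[1]][0]))
        (p, n), (q, m) = topics[i], topics[j]
        merged = [(n * a + m * b) / (n + m) for a, b in zip(p, q)]
        topics = [t for k, t in enumerate(topics) if k not in (i, j)] + [(merged, n + m)]
    return [d for d, _ in topics]

# Toy data: random embeddings and word counts, purely for illustration.
rng = random.Random(0)
dim, n_features, vocab, n_docs = 8, 6, 10, 20
w_enc = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_features)]
b_enc = [0.0] * n_features
embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_docs)]
bow = [[rng.randint(0, 3) for _ in range(vocab)] for _ in range(n_docs)]

activations = [sae_encode(x, w_enc, b_enc) for x in embeddings]  # step 1 (encoder only)
atoms = feature_word_distributions(activations, bow)             # step 2
topics = merge_topics(atoms, n_topics=3)                         # step 3, no retraining
```

Because merging operates only on the per-atom word distributions, calling `merge_topics` again with a different `n_topics` reuses the same atoms, which is the property the abstract describes as merging "without retraining".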
