Towards Understanding What State Space Models Learn About Code
Jiali Wu, Abhinav Anand, Shweta Verma, Mira Mezini
TL;DR
The paper investigates what State Space Models (SSMs) learn about code by conducting the first systematic comparison with Transformer-based models and introducing a frequency-domain kernel analysis (SSM-Interpret). It finds that CodeSSM captures code syntax and semantics better than its Transformer counterpart before fine-tuning, but can forget key relations when fine-tuned on tasks that emphasize short-range dependencies, such as type inference. The analysis links this forgetting to a spectral shift toward high-frequency information in early layers. Guided by these diagnostics, the authors propose architectural refinements, including a parallel high-frequency CNN path, which yield consistent improvements across NLCodeSearch, Long Context Retrieval, and Type Inference. The work also establishes SSM-Interpret as a general tool for dissecting SSM kernels and demonstrates that interpretability-driven changes can translate directly into better code-understanding models with favorable compute and data efficiency.
Abstract
State Space Models (SSMs) have emerged as an efficient alternative to the Transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models actually learn and perform the first comparative analysis of SSM-based and Transformer-based code models. Our analysis reveals that SSMs outperform Transformers at capturing code syntax and semantics during pretraining but forget certain syntactic and semantic relations during fine-tuning on a downstream task, especially when the task emphasizes short-range dependencies. To diagnose this, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code models, validating that our analysis directly enables better models.
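To make the idea of a frequency-domain view of a kernel concrete, the following is a minimal sketch and not the authors' SSM-Interpret implementation. It assumes an SSM layer can be summarized by a 1D convolution kernel over token positions, and it measures how much of that kernel's spectral energy lies at high frequencies, which correspond to short-range, token-to-token structure. The function name `high_freq_energy_fraction`, the cutoff of 0.25 cycles per token, and both toy kernels are hypothetical choices made for illustration only.

```python
import numpy as np


def high_freq_energy_fraction(kernel, cutoff=0.25):
    """Fraction of one-sided spectral energy at frequencies >= cutoff.

    Frequencies are in cycles per token and range from 0 to 0.5.
    The cutoff value is an illustrative assumption, not from the paper.
    """
    kernel = np.asarray(kernel, dtype=float)
    power = np.abs(np.fft.rfft(kernel)) ** 2   # one-sided power spectrum
    freqs = np.fft.rfftfreq(kernel.size)       # matching frequency bins
    total = power.sum()
    if total == 0.0:
        return 0.0                             # all-zero kernel: no energy
    return float(power[freqs >= cutoff].sum() / total)


positions = np.arange(64)

# Toy kernel that decays slowly: it mixes information from distant tokens.
long_range_kernel = np.exp(-positions / 32.0)

# Toy kernel that differences adjacent tokens: it reacts to local changes.
short_range_kernel = np.zeros(64)
short_range_kernel[0] = 1.0
short_range_kernel[1] = -1.0

low_ratio = high_freq_energy_fraction(long_range_kernel)
high_ratio = high_freq_energy_fraction(short_range_kernel)
print(f"slow-decay kernel high-frequency share: {low_ratio:.3f}")
print(f"local-difference kernel high-frequency share: {high_ratio:.3f}")
```

Under these assumptions, a kernel whose high-frequency share grows between the pretrained and fine-tuned checkpoints would be one simple indicator of the kind of spectral shift described in the abstract. The actual framework may use different statistics.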
