Towards Understanding What State Space Models Learn About Code

Jiali Wu, Abhinav Anand, Shweta Verma, Mira Mezini

TL;DR

The paper investigates what State Space Models (SSMs) learn about code by conducting the first systematic comparison with Transformer-based models and introducing a frequency-domain kernel analysis (SSM-Interpret). It finds that CodeSSM captures code syntax and semantics better than its Transformer counterpart before fine-tuning, but can forget key syntactic and semantic relations when fine-tuned on tasks that emphasize short-range dependencies, such as type inference, because of a spectral shift toward high-frequency information in early layers. Guided by these diagnostics, the authors propose architectural refinements, including a parallel high-frequency CNN path, which yield consistent improvements across NLCodeSearch, Long Context Retrieval, and Type Inference. The work also establishes SSM-Interpret as a general tool for dissecting SSM kernels and demonstrates that interpretability-driven changes can translate directly into better code-understanding models with favorable compute and data efficiency.

Abstract

State Space Models (SSMs) have emerged as an efficient alternative to the Transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models actually learn and perform the first comparative analysis of SSM and Transformer-based code models. Our analysis reveals that SSMs outperform Transformers at capturing code syntax and semantics in pretraining but forget certain syntactic and semantic relations during fine-tuning on a downstream task, especially when the task emphasizes short-range dependencies. To diagnose this, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code models, validating that our analysis directly enables better models.
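
To make the idea of a frequency-domain kernel analysis concrete, the following is a minimal sketch, not the SSM-Interpret procedure itself, whose details are not given on this page. It assumes an SSM layer can be viewed as a 1-D convolution kernel and labels that kernel by comparing spectral energy in the lower and upper parts of the frequency band. The function name `classify_kernel` and the `cutoff` and `margin` parameters are hypothetical choices made for illustration.

```python
import numpy as np

def classify_kernel(kernel, cutoff=0.5, margin=2.0):
    """Label a 1-D convolution kernel by where its spectral energy lies.

    cutoff: fraction of the frequency bins treated as "low" (assumed value).
    margin: how many times larger one band must be to win (assumed value).
    """
    kernel = np.asarray(kernel, dtype=float)
    # Power spectrum over the non-negative frequencies of a real signal.
    power = np.abs(np.fft.rfft(kernel)) ** 2
    split = max(1, int(len(power) * cutoff))
    low, high = power[:split].sum(), power[split:].sum()
    if low > margin * high:
        return "low-pass"   # energy concentrated at slow variations (long range)
    if high > margin * low:
        return "high-pass"  # energy concentrated at fast variations (short range)
    return "mixed"
```

Under these assumptions, a constant averaging kernel such as `np.ones(16) / 16` is labelled low-pass, an alternating-sign kernel is labelled high-pass, and a unit impulse, whose spectrum is flat, is labelled mixed. A shift of early-layer kernels from the first label toward the second after fine-tuning would correspond to the kind of spectral shift the abstract describes.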

Paper Structure

This paper contains 22 sections, 4 equations, 12 figures, and 2 tables.

Figures (12)

  • Figure 1: The CodeSSM layer architecture (left) showing the original routing mechanism (center) and the proposed routing (right).
  • Figure 2: Comparison of the hidden representations of CodeSSM and RoCoder on distance (left), sibling (center) and edge (right) prediction tasks.
  • Figure 3: Results of the hidden representation analysis for CodeSSM and RoCoder, along with their fine-tuned versions, in terms of mean accuracy across task labels. The left panel shows performance on the distance prediction task, the center panel on the sibling prediction task, and the right panel on the edge prediction task. Layers 10 and 11 of RoCoder are missing because the clustering algorithm does not converge.
  • Figure 4: Accuracy of CodeSSM, CodeSSM-typeinf, RoCoder and RoCoder-typeinf on distance prediction tasks for layers 6 and 10.
  • Figure 5: Layer-wise filter classification of the forward (left) and backward (right) kernels of CodeSSM and its variants.
  • ...and 7 more figures