Sparse Autoencoder Neural Operators: Model Recovery in Function Spaces

Bahareh Tolooshams, Ailsa Shen, Anima Anandkumar

TL;DR

The paper extends sparse model recovery to function spaces by introducing Sparse Autoencoder Neural Operators (SAE-NOs) and Lifted SAE-NOs (L-SAE-NOs), enabling concept learning in infinite-dimensional mappings. It formalizes a sparse functional generative model and uses Fourier-parameterized operators (e.g., SAE-FNO) with lifting to derive preconditioning effects and conditions for architectural-inference equivalence, showing that full-mode operators can emulate standard SAEs while truncated modes favor smooth concepts. The results demonstrate that lifting accelerates learning, increases dictionary orthogonality, and yields robust reconstruction across resolutions, with SAE-FNO achieving superior smooth-concept recovery and resolution generalization. The framework provides a principled path toward mechanistic interpretability of large neural operators and scalable concept discovery in scientific domains.

Abstract

We frame the problem of unifying representations in neural models as one of sparse model recovery and introduce a framework that extends sparse autoencoders (SAEs) to lifted spaces and infinite-dimensional function spaces, enabling mechanistic interpretability of large neural operators (NOs). While the Platonic Representation Hypothesis suggests that neural networks converge to similar representations across architectures, the representational properties of neural operators remain underexplored despite their growing importance in scientific computing. We compare the inference and training dynamics of SAEs, lifted SAEs, and SAE neural operators. We highlight how lifting and operator modules introduce beneficial inductive biases, enabling faster recovery, improved recovery of smooth concepts, and robust inference across varying resolutions, a property unique to neural operators.

Paper Structure

This paper contains 9 sections, 13 theorems, 60 equations, 7 figures.

Key Result

Proposition 3.1

The training dynamics of the lifted SAE (L-SAE), ${\bm D}_L^{(k+1)}={\bm D}_L^{(k)} + \eta_L {\bm P}^\top ({\bm x} - {\bm P}{\bm D}_L^{(k)}{\bm z}){\bm z}^\top$, with lifting ${\bm L}$ and projection ${\bm P}$, have an effective update in the original space, expressed as: ${\bm D}^{(k+1)} = \dots$
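
One way to see how an effective update in the original space can arise from the lifted update is sketched here. It is an illustration rather than the paper's exact statement. It assumes the effective dictionary in the original space is ${\bm D} = {\bm P}{\bm D}_L$ (an assumption not stated in the text above). Left-multiplying the lifted update by ${\bm P}$ then gives:

```latex
\begin{align*}
{\bm P}{\bm D}_L^{(k+1)} &= {\bm P}{\bm D}_L^{(k)} + \eta_L\, {\bm P}{\bm P}^\top \big({\bm x} - {\bm P}{\bm D}_L^{(k)}{\bm z}\big){\bm z}^\top \\
{\bm D}^{(k+1)} &= {\bm D}^{(k)} + \eta_L\, {\bm P}{\bm P}^\top \big({\bm x} - {\bm D}^{(k)}{\bm z}\big){\bm z}^\top
\end{align*}
```

Under this assumption, ${\bm P}{\bm P}^\top$ acts as a preconditioner on the gradient step, and when ${\bm P}{\bm P}^\top = {\bm I}$ the step reduces to the unlifted update with learning rate $\eta_L$. The form stated in the paper may differ.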

Figures (7)

  • Figure 1: Model Recovery with SAEs. a) Architectural comparison of SAE, lifted SAE, and SAE Neural Operators. b) Learning in sampled Euclidean spaces vs. function spaces.
  • Figure 2: SAE-CNN vs. SAE-FNO. a) Lifting accelerates learning. b) SAE-FNO's superiority in recovering smooth concepts via truncated Fourier modes. c) Equivalent learning when SAE-FNO uses all Fourier modes and a spatial receptive field matched to that of SAE-CNNs.
  • Figure 3: SAE-FNO Upsampling Robustness Across Resolutions. SAE-FNO successfully infers the underlying sparse representations and reconstructs data across multiple discretization levels. The left panels show inference of 1-sparse code supports across 5 kernels, and the right panels display spatial-domain signal reconstruction (see also \ref{fig:upsampling_app}).
  • Figure 4: Lifting as a preconditioner. Lifting accelerates learning.
  • Figure 5: Lifting. When the lifting operator satisfies the orthogonal condition ${\bm L}^\top {\bm L} = {\bm I}$, the lifted SAE-CNN (L-SAE-CNN) exhibits equivalent learning dynamics to the SAE-CNN.
  • ...and 2 more figures
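
The link between truncated Fourier modes and smooth concepts (Figure 2b) can be illustrated with a small sketch. This is not the paper's SAE-FNO; it only shows, in pure Python, that keeping the lowest Fourier modes preserves a low-frequency signal and removes a high-frequency one. The helper names (`dft`, `idft`, `truncate_modes`, `relative_error`), the grid size, and the test signals are hypothetical choices made for this illustration.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (O(n^2), fine for a small demo)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(coeffs):
    """Inverse DFT, returning the real part of the reconstruction."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def truncate_modes(signal, num_modes):
    """Zero every mode whose frequency |k| is >= num_modes (a low-pass filter)."""
    n = len(signal)
    coeffs = dft(signal)
    # Index k and index n - k represent the frequencies +k and -k.
    kept = [c if min(k, n - k) < num_modes else 0.0 for k, c in enumerate(coeffs)]
    return idft(kept)


def relative_error(reference, estimate):
    """Relative L2 error of estimate with respect to reference."""
    num = math.sqrt(sum((x - y) ** 2 for x, y in zip(reference, estimate)))
    den = math.sqrt(sum(x ** 2 for x in reference))
    return num / den


n = 64
grid = [t / n for t in range(n)]
smooth = [math.sin(2 * math.pi * 2 * s) for s in grid]   # frequency 2 (assumed "smooth")
rough = [math.sin(2 * math.pi * 20 * s) for s in grid]   # frequency 20 (assumed "rough")

smooth_err = relative_error(smooth, truncate_modes(smooth, 8))
rough_err = relative_error(rough, truncate_modes(rough, 8))
print(f"smooth signal error: {smooth_err:.2e}")
print(f"rough signal error:  {rough_err:.2e}")
```

With eight retained modes, the frequency-2 signal passes through essentially unchanged while the frequency-20 signal is filtered out, which is the kind of inductive bias toward smooth content that mode truncation introduces.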

Theorems & Definitions (33)

  • Definition 1.1: Sparse Generative Models
  • Definition 1.2: Sparse Model Recovery
  • Definition 1.3: Sparse Autoencoders for Model Recovery
  • Definition 1.4: Fourier integral operator $\mathcal{K}$ (restated from li2021fourier, kovachki2023neural)
  • Definition 2.1: Sparse Functional Generative Model
  • Definition 2.2: Sparse Functional Model Recovery
  • Definition 2.3: Sparse Autoencoder Neural Operators (SAE-NOs) for Model Recovery
  • Definition 2.4: Lifted SAE-NO for Model Recovery
  • Proposition 3.1: Training Dynamics of Lifting
  • Proposition 3.2: Architectural Inference Equivalence of Lifting in SAE
  • ...and 23 more