Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders

Samuel Stevens, Jacob Beattie, Tanya Berger-Wolf, Yu Su

TL;DR

This work addresses a bottleneck in scientific discovery with foundation models: most analyses confirm pre-specified targets rather than uncover unknown patterns. It proposes sparse autoencoders (SAEs) to extract open-ended, interpretable feature vocabularies from foundation-model activations without labels. By applying SAEs to DINOv3 ViT representations and evaluating on ADE20K and FishVista, the authors demonstrate that SAEs can rediscover semantic concepts and surface fine-grained anatomical structures without supervision, with Matryoshka SAEs offering improved concept coverage. The approach is domain-agnostic by design, although the experiments cover only vision, which suggests applicability to proteins, genomics, weather, and more, and it highlights a shift from confirmation-focused analyses to hypothesis-generating discovery. The findings provide a practical pathway to quantify and interrogate what large scientific foundation models have learned, enabling reproducible, open-ended exploration and potential new scientific hypotheses.

Abstract

Scientific archives now contain hundreds of petabytes of data across genomics, ecology, climate, and molecular biology that could reveal undiscovered patterns if systematically analyzed at scale. Large-scale, weakly-supervised datasets in language and vision have driven the development of foundation models whose internal representations encode structure (patterns, co-occurrences and statistical regularities) beyond their training objectives. Most existing methods extract structure only for pre-specified targets; they excel at confirmation but do not support open-ended discovery of unknown patterns. We ask whether sparse autoencoders (SAEs) can enable open-ended feature discovery from foundation model representations. We evaluate this question in controlled rediscovery studies, where the learned SAE features are tested for alignment with semantic concepts on a standard segmentation benchmark and compared against strong label-free alternatives on concept-alignment metrics. Applied to ecological imagery, the same procedure surfaces fine-grained anatomical structure without access to segmentation or part labels, providing a scientific case study with ground-truth validation. While our experiments focus on vision with an ecology case study, the method is domain-agnostic and applicable to models in other sciences (e.g., proteins, genomics, weather). Our results indicate that sparse decomposition provides a practical instrument for exploring what scientific foundation models have learned, an important prerequisite for moving from confirmation to genuine discovery.

Paper Structure

This paper contains 39 sections, 5 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Validating an instrument for open-ended feature discovery. Left: Typical use of a foundation model in science: an input image is passed through a pretrained encoder and a task-specific head to yield class scores; the dense representation remains an opaque embedding, so unnamed factors are inaccessible. Right: Our procedure composes a foundation model with a sparse autoencoder, producing a library of interpretable semantic concepts with per-example activation maps. We find that these concepts align with and localize anatomical parts (e.g., head, dorsal fin) without seeing part-of-body labels. Fish images are shown as a case study with known anatomy providing a controlled rediscovery test; the explored method is domain-agnostic.
  • Figure 2: We compare $k$-means, PCA, vanilla SAEs and Matryoshka SAEs along the reconstruction--sparsity tradeoff for learning to decompose ViT patch activations. We use final-layer activations from DINOv3 ViT-L/16 on ImageNet-1K; we fit all methods on the training split and measure normalized MSE (see \ref{eq:nmse}) and L$_0$ on the validation split. For $k$-means, we "reconstruct" every patch with its nearest centroid; thus, L$_0$ is always $1$. For PCA, we sweep the number of components. For both SAE variants, we sweep $\lambda$ and show the Pareto frontier of reconstruction--sparsity. Takeaway: The reconstruction--sparsity tradeoff alone does not identify a best method.
  • Figure 3: Top images for the lowest-loss probes for the "person" (left) and "toilet" (right) classes for each method. (a): $k$-means does not recover the "person" class; the best "person" cluster fires on roads. (b): PCA learns a "person" component, but it does not consistently activate on the entire person. Furthermore, PCA does not recover the "toilet" class; the best "toilet" component has no obvious semantic concept. (c) & (d): Vanilla and Matryoshka SAEs both reliably recover "person" and "toilet" and activate on the entire object. Takeaway: While $k$-means and PCA are good by dictionary learning metrics, visual concepts recovered by both vanilla and Matryoshka SAEs are more consistent and salient.
  • Figure 4: Example Matryoshka SAE features (left) and patch-level segmentation masks (right) for (a) "Head", (b) "Dorsal Fin" and (c) "Eye" from DINOv3 ViT-L/16. Features were picked by minimizing cross entropy on binary classification for each body part, as described in \ref{sec:fishvista} and \ref{eq:probe-loss}. Takeaway: Matryoshka SAEs accurately learn scientific concepts (anatomical body parts) without labels.
  • Figure 5: Prevalence vs rediscovery on FishVista. Each point is a body part; x-axis: number of patch-level samples in the validation split, y-axis: AP of the best-matching (lowest probe loss) Matryoshka SAE latent. More common body parts are rediscovered more reliably. Takeaway: Higher concept prevalence improves label-free rediscovery.
  • ...and 5 more figures
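
The Figure 2 caption notes that $k$-means "reconstructs" each patch with its nearest centroid, so L$_0$ is always 1. The following is a minimal NumPy sketch of that bookkeeping on synthetic data, not the authors' code. The function names, the random stand-in activations, and in particular the normalized-MSE definition (squared error divided by the error of predicting the mean activation) are assumptions made for illustration; the paper's own definition is the one given at \ref{eq:nmse}.

```python
import numpy as np


def nearest_centroid_codes(x, centroids):
    """One-hot codes assigning each row of x to its nearest centroid."""
    # Pairwise squared Euclidean distances, shape (n_patches, n_centroids).
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    codes = np.zeros((x.shape[0], centroids.shape[0]))
    codes[np.arange(x.shape[0]), assign] = 1.0
    return codes


def normalized_mse(x, x_hat):
    """ASSUMED definition: error relative to always predicting the mean row."""
    baseline = x.mean(axis=0, keepdims=True)
    return ((x - x_hat) ** 2).sum() / ((x - baseline) ** 2).sum()


def mean_l0(codes):
    """Average number of non-zero code entries per patch."""
    return (codes != 0).sum(axis=1).mean()


# Synthetic stand-in for ViT patch activations (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))

# Stand-in "centroids": a random subset of patches, not a fitted k-means.
centroids = x[rng.choice(x.shape[0], size=8, replace=False)]

codes = nearest_centroid_codes(x, centroids)
x_hat = codes @ centroids  # each patch is replaced by its nearest centroid

l0 = mean_l0(codes)
nmse = normalized_mse(x, x_hat)
print(f"mean L0 = {l0}, normalized MSE = {nmse:.3f}")
```

Because every code row is one-hot, the mean L$_0$ is 1 regardless of the number of centroids, which is why $k$-means occupies a single sparsity level in the tradeoff while PCA and the SAE variants can be swept.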