SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations

Taehan Kim, Sangdae Nam

TL;DR

SAE-RNA addresses the interpretability gap in RNA language models by discovering sparse, interpretable concepts within RiNALMo embeddings using an overcomplete sparse autoencoder (SAE) trained on token-level representations. The method maps these sparse features to RNA structural elements (stems, hairpins) and ncRNA families, revealing a layer-wise progression from diffuse to sparse, type-selective activations. Through bpRNA-90 and RNAcentral evaluations, the work demonstrates that learned concepts align with biologically meaningful motifs and functional groups, offering a bridge between pretrained embeddings and human-interpretable biological features. This approach enables hypothesis generation and potential feature-aware fine-tuning, providing a practical pathway to steer RNA LMs without full model retraining.
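To make the phrase "overcomplete SAE trained on token-level representations" concrete, the following is a minimal sketch of one common sparse autoencoder formulation: a ReLU encoder, a linear decoder, and a reconstruction loss with an L1 sparsity penalty. The paper's exact architecture, loss, and hyperparameters are not given in this summary, so the function names (`init_sae`, `sae_forward`, `sae_loss`), the dimensions, and the `l1_coeff` value are all illustrative assumptions, and random vectors stand in for RiNALMo token embeddings.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create parameters for an overcomplete SAE (d_hidden > d_model).

    Hypothetical helper; the initialization scheme is an assumption.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_enc": rng.normal(0.0, scale, size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, scale, size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode token embeddings x of shape (n_tokens, d_model) and reconstruct them."""
    # ReLU keeps feature activations non-negative, so many of them are exactly zero.
    pre_act = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    z = np.maximum(0.0, pre_act)
    x_hat = z @ params["W_dec"] + params["b_dec"]
    return z, x_hat


def sae_loss(x, z, x_hat, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the activations."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(z), axis=1))
    return recon + l1_coeff * sparsity


# Toy usage: 16 "tokens" with 8-dimensional embeddings (random stand-ins,
# not real RiNALMo outputs) and a 4x overcomplete dictionary of 32 features.
rng = np.random.default_rng(1)
tokens = rng.normal(size=(16, 8))
params = init_sae(d_model=8, d_hidden=32)
codes, recon = sae_forward(params, tokens)
loss = sae_loss(tokens, codes, recon)
```

In an actual pipeline the parameters would be optimized by gradient descent on this loss over embeddings collected from one model layer at a time; the sketch only shows the forward pass and the objective.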

Abstract

Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein advances (e.g., ESM) inspiring emerging RNA language models such as RiNALMo. Yet what these RNA language models internally encode about messenger RNA (mRNA) or non-coding RNA (ncRNA) families, and how they encode it, remains unclear. We present SAE-RNA, an interpretability model that analyzes RiNALMo representations and maps them to known human-level biological features. Our work frames RNA interpretability as concept discovery in pretrained embeddings, without end-to-end retraining, and provides practical tools to probe what RNA LMs may encode about ncRNA families. The model can be extended to close comparisons between RNA groups and to support hypothesis generation about previously unrecognized relationships.

Paper Structure

This paper contains 25 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The Analysis Overview. (1) A sparse autoencoder (SAE) is trained offline on embeddings from the RNAcentral dataset with balanced family groups. (2) The trained SAE is then used in an analysis pipeline to extract interpretable features for each individual RNA sequence.
  • Figure 2: Number of SAE features per layer that were retained after filtering for activations in at least 10 distinct sequences.
  • Figure 3: Activated sequence of bpRNA-RFAM-25894 and bpRNA-RFAM-42383 at token level by feature 2053: (Stem).
  • Figure 4: Activated sequence of bpRNA-CRW-29143 at token level by feature 7783: (Hairpin).
  • Figure 5: Union of per-type top-$k$ features for L1 (top), L18 (middle), and L33 (bottom). Color shows normalized activation; the $y$-axis indexes selected feature channels and the $x$-axis enumerates RNA types. L1 shows the lowest sparsity; from L18 onward, patterns are markedly sparser and more type-selective.
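The selection steps mentioned in the captions of Figures 2 and 5 can be sketched as follows: keep only features that activate in at least 10 distinct sequences, then take the union of the top-$k$ features for each RNA type. This is a hedged illustration and not the authors' code. The function names, the record layout, the activation threshold of zero, and the use of mean activation as the ranking score are all assumptions, and the feature IDs and values below are toy data.

```python
from collections import defaultdict


def retained_features(records, min_sequences=10, threshold=0.0):
    """Return the IDs of features active in at least `min_sequences` distinct sequences.

    `records` is an iterable of (sequence_id, feature_id, activation) tuples,
    one per token-level activation (a hypothetical layout).
    """
    sequences_per_feature = defaultdict(set)
    for seq_id, feat_id, value in records:
        if value > threshold:
            # A set counts each sequence once, however many tokens fire.
            sequences_per_feature[feat_id].add(seq_id)
    return {
        feat_id
        for feat_id, seqs in sequences_per_feature.items()
        if len(seqs) >= min_sequences
    }


def union_top_k(score_by_type, k):
    """Union of the k highest-scoring features for each RNA type.

    `score_by_type` maps rna_type -> {feature_id: score}; the score is assumed
    here to be a mean activation, which the source does not specify.
    """
    selected = set()
    for scores in score_by_type.values():
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        selected.update(top)
    return sorted(selected)


# Toy data: feature 1 fires in 12 sequences (twice in seq0), feature 2 in only 3.
records = [(f"seq{i}", 1, 0.8) for i in range(12)]
records += [("seq0", 1, 0.9)]
records += [(f"seq{i}", 2, 0.5) for i in range(3)]
kept = retained_features(records)

# Toy per-type scores for three features across two RNA types.
by_type = {
    "tRNA": {1: 0.9, 2: 0.1, 3: 0.4},
    "miRNA": {1: 0.2, 2: 0.7, 3: 0.6},
}
rows_k1 = union_top_k(by_type, k=1)
rows_k2 = union_top_k(by_type, k=2)
```

Taking a union rather than a single global ranking ensures that every RNA type contributes its own most characteristic features to the heatmap rows, even when its activations are weaker overall than those of other types.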