Protein Circuit Tracing via Cross-layer Transcoders

Darin Tsui, Kunal Talreja, Daniel Saeedi, Amirali Aghazadeh

TL;DR

ProtoMech introduces cross-layer transcoders to capture the full computational circuitry of protein language models, providing a faithful replacement model that links multi-layer transformations. It demonstrates that compact, sparse circuits can recover most of the original model’s performance and that steering along these circuits can design high-fitness protein variants. The framework also reveals alignment with known motifs (e.g., kinase HRD, Rossmann folds) and supports interactive visualization for biological interpretation. Together, ProtoMech offers a principled, scalable approach to circuit tracing in pLMs with practical implications for protein design and interpretability.

Abstract

Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while retaining up to 79% of model accuracy, revealing correspondence with structural and functional motifs, including binding, signaling, and stability. Steering along these circuits enables high-fitness protein design, surpassing baseline methods in more than 70% of cases. These results establish ProtoMech as a principled framework for protein circuit tracing.
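The core idea in the abstract is a cross-layer transcoder whose sparse latents at one layer can write to every later layer. A toy forward pass makes this concrete. The sketch below is a minimal pure-Python illustration under stated assumptions, not ProtoMech's implementation. The class name `CrossLayerTranscoder`, the ReLU sparsity, the random initialization, the made-up activations, and the boolean latent mask standing in for a "circuit" are all hypothetical. Following Figure 2a, each layer's output is reconstructed from the latents of preceding layers. We assume the layer's own latents are included, as in common CLT formulations.

```python
import random
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]  # row-major: one inner list per output dimension


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class CrossLayerTranscoder:
    """Toy CLT (hypothetical, for illustration only).

    Encoder:  a_l = ReLU(W_enc[l] @ x_l + b_enc[l])       (sparse latents)
    Decoder:  y_hat_l = sum_{s <= l} W_dec[s][l] @ a_s    (cross-layer writes)
    """

    def __init__(self, n_layers: int, d_model: int, n_latents: int, seed: int = 0):
        rng = random.Random(seed)

        def rand_matrix(rows: int, cols: int) -> Matrix:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_layers, self.d_model, self.n_latents = n_layers, d_model, n_latents
        self.w_enc = [rand_matrix(n_latents, d_model) for _ in range(n_layers)]
        self.b_enc = [[0.0] * n_latents for _ in range(n_layers)]
        # w_dec[src][dst] exists only for dst >= src: latents write forward, never back.
        self.w_dec: List[Dict[int, Matrix]] = [
            {dst: rand_matrix(d_model, n_latents) for dst in range(src, n_layers)}
            for src in range(n_layers)
        ]

    def encode(self, residuals: List[Vector]) -> List[Vector]:
        """Map each layer's recorded residual-stream input to ReLU latents."""
        return [
            [max(0.0, z + b) for z, b in zip(matvec(self.w_enc[layer], x), self.b_enc[layer])]
            for layer, x in enumerate(residuals)
        ]

    def decode(self, latents: List[Vector],
               mask: Optional[List[List[bool]]] = None) -> List[Vector]:
        """Reconstruct every layer's output; `mask` zeroes latents outside a circuit."""
        outputs = []
        for dst in range(self.n_layers):
            y = [0.0] * self.d_model
            for src in range(dst + 1):  # only layers at or before dst contribute
                a = latents[src]
                if mask is not None:
                    a = [v if keep else 0.0 for v, keep in zip(a, mask[src])]
                y = [yi + ci for yi, ci in zip(y, matvec(self.w_dec[src][dst], a))]
            outputs.append(y)
        return outputs


# Usage on made-up activations for a single token position.
clt = CrossLayerTranscoder(n_layers=3, d_model=4, n_latents=8)
residuals = [[1.0, -0.5, 0.3, 0.2], [0.4, 0.1, -0.2, 0.9], [-0.3, 0.8, 0.5, -0.1]]
latents = clt.encode(residuals)
full = clt.decode(latents)
# Hypothetical "circuit": keep only latents 0-1 at every layer (2 of 8 per layer).
circuit_mask = [[i < 2 for i in range(8)] for _ in range(3)]
circuit = clt.decode(latents, circuit_mask)
```

Comparing the masked reconstruction with the full one is the basic operation behind asking how much behavior a small set of latents retains. In the paper, that comparison is reported as downstream task accuracy. This toy only shows the mechanics. Because the decoder is linear in the latents, the circuit's contribution and its complement sum to the full reconstruction.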

Paper Structure (35 sections, 15 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: ProtoMech serves as a replacement model for ESM2. a, Schematic of the circuit discovery process. ProtoMech identifies a circuit of interpretable latents (blue) that traces and approximates the behavior of ESM2 on downstream tasks. b, Example of top-activating sequences in Swiss-Prot for a specific latent (L3/1918), which detects the conserved HRD catalytic motif found in protein kinases. c, d, ProtoMech outperforms PLT baselines on the downstream tasks of protein family classification (c) and function prediction (d).
  • Figure 2: Overview of ProtoMech. a, Cross-layer transcoders (CLTs) form a replacement model that captures inter-layer computation by predicting each layer's output from the sparse latent features of all preceding layers. b, Steering along identified circuits enables the design of protein variants with enhanced functional properties. c, Using the ProtoMech visualizer, we expose overlapping biological motifs previously hidden in ESM2.
  • Figure 3: Examples of family circuits discovered using ProtoMech. We use the ProtoMech visualization tool to examine a, protein kinase domain and b, NADP+ binding domain circuits. We find interpretable features related to binding and active sites, secondary structure, and biochemical patterns. We observe that earlier layers detect key amino acids that assemble into complex motifs.
  • Figure 4: Examples of the mutation sensitivity of function circuits discovered using ProtoMech. a, We feed the wild-type sequence of GB1 into ProtoMech and identify latents related to binding affinity and stability. b, A high-fitness variant of GB1 activates an additional latent that corresponds to protein stability. c, Conversely, a low-fitness variant of GB1 deactivates parts of the circuit related to binding affinity and stability. These findings highlight ProtoMech's ability to provide a mechanistic rationale for how sequence changes alter fitness.
  • Figure 5: Performance of all replacement models on a, protein family classification and b, function prediction tasks.
  • ...and 10 more figures
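Figure 2b and the abstract describe steering along circuits to design higher-fitness variants. One common way to steer with transcoder latents is to add a scaled copy of a latent's decoder vector to the residual stream. You then read off the model's amino-acid preferences at the steered positions. The sketch below assumes that form and collapses the model to a single linear unembedding, so it is a toy illustration, not ProtoMech's procedure. The function `steer_and_decode`, the identity unembedding, the toy sequence, the choice of positions, and the greedy argmax mutation rule are all hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def one_hot(index: int, size: int, value: float = 1.0) -> List[float]:
    v = [0.0] * size
    v[index] = value
    return v


def steer_and_decode(
    sequence: str,
    residuals: Sequence[Sequence[float]],
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    alpha: float,
    positions: Sequence[int],
) -> str:
    """Hypothetical greedy steering: add alpha * decoder_direction to the residual
    at each chosen position, then substitute the highest-scoring amino acid."""
    variant = list(sequence)
    for pos in positions:
        steered = [r + alpha * d for r, d in zip(residuals[pos], decoder_direction)]
        scores = [sum(w * x for w, x in zip(row, steered)) for row in unembed]
        best = max(range(len(scores)), key=lambda i: scores[i])
        variant[pos] = AMINO_ACIDS[best]
    return "".join(variant)


# Toy setup (hypothetical): d_model = 20 with an identity unembedding, so each
# residual axis scores exactly one amino acid.
d = len(AMINO_ACIDS)
unembed = [one_hot(i, d) for i in range(d)]
wildtype = "MKTAYIAK"  # made-up sequence, not a real protein
residuals = [one_hot(AMINO_ACIDS.index(aa), d) for aa in wildtype]
direction = one_hot(AMINO_ACIDS.index("W"), d)  # pretend latent decoder vector

variant = steer_and_decode(wildtype, residuals, direction, unembed, alpha=2.0, positions=[2, 5])
print(variant)  # → MKWAYWAK
```

With `alpha=0.0` the wild-type residue keeps the highest score and the sequence is unchanged. Raising `alpha` pushes the steered positions toward the residue that the latent's decoder direction favors. In a real pLM, the edited residual would be propagated through the remaining layers before reading out logits.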