
Physics Steering: Causal Control of Cross-Domain Concepts in a Physics Foundation Model

Rio Alexa Fear, Payel Mukhopadhyay, Michael McCabe, Alberto Bietti, Miles Cranmer

TL;DR

This work investigates whether a physics-focused foundation model learns interpretable, transferable internal representations analogous to those found in LLMs. By computing delta activation directions between contrasting physical regimes and injecting them during inference, the authors demonstrate causal control over simulated physical phenomena, including vorticity, diffusion, and temporal dynamics. They show that these concept directions can be transferred across diverse physical systems, suggesting that the model encodes domain-general abstractions such as rotation or spiralling. The findings support the view that scientific foundation models can encode abstract physical principles, and they offer tools for counterfactual exploration, error correction, and auditing in AI-assisted discovery.

Abstract

Recent advances in mechanistic interpretability have revealed that large language models (LLMs) develop internal representations corresponding not only to concrete entities but also to distinct, human-understandable abstract concepts and behaviours. Moreover, these hidden features can be directly manipulated to steer model behaviour. However, it remains an open question whether this phenomenon is unique to models trained on inherently structured data (i.e., language and images) or whether it is a general property of foundation models. In this work, we investigate the internal representations of a large physics-focused foundation model. Inspired by recent work identifying single directions in activation space for complex behaviours in LLMs, we extract activation vectors from the model during forward passes over simulation datasets for contrasting physical regimes. We then compute "delta" representations between two such regimes. These delta tensors act as concept directions in activation space, each encoding a specific physical feature. By injecting these concept directions back into the model during inference, we can steer its predictions, demonstrating causal control over physical behaviours, such as inducing or removing a particular physical feature from a simulation. These results suggest that scientific foundation models learn generalised representations of physical principles rather than merely relying on superficial correlations and patterns in the simulations. Our findings open new avenues for understanding and controlling scientific foundation models and have implications for AI-enabled scientific discovery.
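The extract-difference-inject procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer toy network, the helper names (`capture`, `concept_direction`, `steered_forward`), the injection site, and the choice to average activations over input segments before taking the difference are all hypothetical. Walrus's actual architecture, injection layer and pooling are not specified here.

```python
import numpy as np

# Hypothetical toy "model": a list of layer functions standing in for the
# physics foundation model. This is NOT Walrus; it is purely illustrative.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))
W2 = rng.normal(size=(3, 8))
layers = [lambda h: np.tanh(W1 @ h), lambda h: W2 @ h]


def capture(layers, x, layer_idx):
    """Return the activation produced by layer `layer_idx` for input x."""
    h = x
    for layer in layers[: layer_idx + 1]:
        h = layer(h)
    return h


def concept_direction(acts_with_f, acts_without_f):
    """Delta_f = mean activation over segments showing feature f minus the mean
    over segments lacking it. Assumes averaging over the segment axis (axis 0);
    any trailing activation shape, e.g. a spatial feature map, is preserved."""
    return np.mean(acts_with_f, axis=0) - np.mean(acts_without_f, axis=0)


def steered_forward(layers, x, layer_idx, delta, alpha):
    """Forward pass that adds alpha * delta to the output of layer `layer_idx`."""
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i == layer_idx:
            h = h + alpha * delta
    return h


# Hypothetical "physical feature": a constant shift of the input along one axis.
shift = np.array([2.0, 0.0, 0.0, 0.0])
with_f = np.stack([capture(layers, rng.normal(size=4) + shift, 0) for _ in range(64)])
without_f = np.stack([capture(layers, rng.normal(size=4), 0) for _ in range(64)])
delta = concept_direction(with_f, without_f)

x = rng.normal(size=4)
base = steered_forward(layers, x, 0, delta, alpha=0.0)     # unsteered prediction
steered = steered_forward(layers, x, 0, delta, alpha=0.5)  # positive injection
```

Positive `alpha` pushes the intermediate activation toward the feature-present regime and negative `alpha` pushes it away. Because the second toy layer is linear, the output change here is exactly `alpha * W2 @ delta`; in a deep nonlinear model the effect of a given `alpha` has to be found empirically, which is what the α sweeps in Figures 2–5 explore.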

Paper Structure

This paper contains 33 sections, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Schematic illustration of the methodology. Activations are first extracted from the physics foundation model (Walrus) during forward passes over input segments that exhibit physical feature $f$, yielding activations $\boldsymbol{\mu}_{f,i}$, and over segments lacking the feature, ${\neg}f$, yielding $\boldsymbol{\nu}_{f,i}$. The difference between these activations, $\mathbf{\Delta}_f$, is then injected back into Walrus during inference to steer its subsequent predictions.
  • Figure 2: Negative $\mathbf{\Delta}_{\text{vortex}}$ injection into shear flow vortex regime, for $\alpha$ values of 0, 0.3, 0.5 and 1.0. Frame: 64.
  • Figure 3: Positive $\mathbf{\Delta}_{\text{vortex}}$ injection into shear flow laminar regime, for $\alpha$ values of 0, 0.2, 0.3 and 0.4. Frame: 64.
  • Figure 4: Left: tracer fields for $\mathbf{\Delta}_{\text{diffusion}}$ injection into the shear flow vortex regime with $\alpha=0.1$ (top) and $\alpha=-0.1$ (bottom). Right: tracer fields for $\mathbf{\Delta}_{\text{speed}}$ injection into the shear flow vortex regime with $\alpha=0.1$ (top) and $\alpha=-0.1$ (bottom). Frame (left): 30; frame (right): 24.
  • Figure 5: Transfer of $\mathbf{\Delta}_{\text{vortex}}$ concept injection to Rayleigh-Bénard simulations. Pressure and buoyancy fields are shown with averaging over spatial dimensions (top) and without averaging, i.e. retaining the spatial dimensions (bottom); in each row, $\alpha=-0.1$ (left), $\alpha=0.0$ (centre) and $\alpha=0.1$ (right). Frame (top): 40; frame (bottom): 50.
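Figures 2 and 3 sweep the injection strength α. One hedged way to read what α = 1 means in activation space, assuming $\mathbf{\Delta}_f$ is the difference of segment-averaged activations (written below as `mu_bar` and `nu_bar`, our notation, not the paper's): a negative injection of α·Δ into the feature-present mean moves it along a straight line toward the feature-absent mean, reaching it exactly at α = 1. The random vectors and the helper `sweep_toward_absent` are hypothetical, and the sketch says nothing about how Walrus's nonlinear layers turn these shifts into predictions.

```python
import numpy as np


def sweep_toward_absent(mu_bar, nu_bar, alphas):
    """For each alpha, apply a negative injection (mu_bar - alpha * delta) and
    report its distance to the feature-absent and feature-present means."""
    delta = mu_bar - nu_bar
    rows = []
    for alpha in alphas:
        steered = mu_bar - alpha * delta
        rows.append((
            alpha,
            float(np.linalg.norm(steered - nu_bar)),  # equals |1 - alpha| * |delta|
            float(np.linalg.norm(steered - mu_bar)),  # equals |alpha| * |delta|
        ))
    return rows


rng = np.random.default_rng(1)
mu_bar = rng.normal(size=16)  # hypothetical mean activation, feature present
nu_bar = rng.normal(size=16)  # hypothetical mean activation, feature absent
rows = sweep_toward_absent(mu_bar, nu_bar, alphas=[0.0, 0.3, 0.5, 1.0])
for alpha, d_absent, d_present in rows:
    print(f"alpha={alpha:.1f}  dist_to_absent={d_absent:.3f}  dist_to_present={d_present:.3f}")
```

The α values mirror those listed for Figure 2. Values of α above 1 extrapolate past the feature-absent mean along the same line.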