Bilinear Convolution Decomposition for Causal RL Interpretability

Narmeen Oozeer, Sinem Erisken, Alice Rigg

TL;DR

This work tackles the challenge of causal interpretability in reinforcement learning by replacing nonlinearities in convolutional networks with bilinear variants, yielding models with analytically tractable representations. It introduces bilinear convolution layers ($BConv$) and demonstrates that they train competitively on ProcGen tasks, while enabling a decomposition into eigenfilters and a separation of channel and spatial information via SVD. A protocol is proposed to causally validate concept-based probes, illustrated through a maze-solving agent tracking a cheese object, linking probe mechanics to decision-making. Overall, the approach provides a path toward more interpretable RL systems by connecting weight-based bilinear structure with mechanistic, probe-driven insights, without sacrificing task performance.

Abstract

Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing, which provide only correlational insights and coarse causal control. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, producing a class of models for which these limitations can be addressed. We show that bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side-by-side comparison on ProcGen environments. The analytic structure of bilinear layers enables weight-based decomposition. Previous work has shown that bilinearity enables quantifying functional importance through eigendecomposition, which identifies interpretable low-rank structure. We show how to adapt the decomposition to convolution layers by applying singular value decomposition to vectors of interest, separating the channel and spatial dimensions. Finally, we propose a methodology for causally validating concept-based probes, and illustrate its utility by studying a maze-solving agent's ability to track a cheese object.
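To make the decomposition described in the abstract more concrete, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes a bilinear layer of the form $h(x) = (Wx) \odot (Vx)$ acting on a single flattened convolution patch, builds the symmetric interaction matrix for an output direction `u`, eigendecomposes it, and then reshapes the leading eigenvector into a (channel, spatial) matrix for SVD. The sizes `C`, `K`, `H`, the random weights, and the helper name `interaction_matrix` are all hypothetical and chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: C input channels, K x K kernel, H output channels.
C, K, H = 3, 3, 4
d = C * K * K  # length of one flattened input patch

# Assumed bilinear layer on a flattened patch x: h(x) = (W x) * (V x).
W = rng.standard_normal((H, d))
V = rng.standard_normal((H, d))

def bilinear(x):
    """Elementwise product of two linear maps (no ReLU)."""
    return (W @ x) * (V @ x)

def interaction_matrix(u):
    """Symmetric matrix B such that u . h(x) == x^T B x for every x."""
    B = np.einsum("h,hi,hj->ij", u, W, V)
    return 0.5 * (B + B.T)  # symmetrizing leaves the quadratic form unchanged

u = rng.standard_normal(H)  # output direction of interest, e.g. a probe
B = interaction_matrix(u)

# The layer output along u equals a quadratic form in the input patch.
x = rng.standard_normal(d)
lhs = u @ bilinear(x)
rhs = x @ B @ x

# Eigendecomposition: x^T B x = sum_i lam_i * (q_i . x)^2.
lam, Q = np.linalg.eigh(B)
order = np.argsort(-np.abs(lam))  # sort by magnitude of eigenvalue
lam, Q = lam[order], Q[:, order]
recon = np.sum(lam * (Q.T @ x) ** 2)

# Reshape the leading eigenvector to (channels, spatial) and apply SVD
# to separate channel mixing from spatial pattern.
top = Q[:, 0].reshape(C, K * K)
chan, s, spat = np.linalg.svd(top, full_matrices=False)
leading_spatial_filter = spat[0].reshape(K, K)
```

Here `chan[:, r]` weights the input channels and `spat[r]` is the matching spatial pattern for the `r`-th singular component. Sorting eigenvalues by magnitude is one possible convention; the source does not specify an ordering here.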

Paper Structure

This paper contains 20 sections, 25 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Bimpala: we modified a simplified IMPALA architecture (black) by replacing the operation ReLU(Conv2D) with BConv2D, which gates two Conv2D blocks (equation label eq:bconv2d). We also swap ReLU(FC) with FCBilinear (equation label eq:fcbilinear). Modifications are shown in red.
  • Figure 2: Visualization of the quadratic form derivation for gated bilinear convolutions. The diagram illustrates the transformation from spatial convolution operations to a bilinear matrix form in three stages: (1) The upper path shows the computation of spatial convolutions $U^{(i)}$ with input $X_j$, producing terms $a_j$. (2) The lower path similarly computes convolutions $V^{(i)}$ with input $X_k$, producing terms $b_k$. (3) The right side demonstrates how these operations can be reformulated as a product of three block matrices, where the outer product of channel responses $(U^\top V)$ forms a symmetric bilinear matrix. The diagram emphasizes how local spatial convolutions (shown in the cubes) are transformed into a bilinear form $B$.
  • Figure 3: Activations for the top positive (left) and negative (right) eigenfilters in the second BConv layer, for the cheese probe's top singular channel. Activations are shown for a maze with cheese (top) and without cheese (bottom). Middle plots show the difference between the activations with and without cheese. While the positive filter activates on non-cheese patterns, the negative filter downweights non-cheese patterns without erasing the cheese activation.
  • Figure 4: Performance comparison between ReLU and Bilinear architectures across four reinforcement learning environments. Each row represents a different environment (Maze, Heist, Plunder, Dodgeball), while columns show different evaluation metrics (expected return, average entropy, and unexplained variance). The Bilinear architecture (shown in darker colors) generally learns faster and reaches higher final expected return, while maintaining lower entropy. Unexplained variance is higher in bilinear models than in the ReLU baseline (shown in lighter colors).
  • Figure 5: Left: Probe singular values. Right: Fraction of variance explained.
  • ...and 5 more figures
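
The Figure 1 caption describes BConv2D as gating two Conv2D blocks in place of ReLU(Conv2D). A minimal NumPy sketch of that idea follows. The paper's exact equation is not reproduced in this excerpt, so the elementwise product of two independent convolutions, the 'valid' padding, the unit stride, and the function names `conv2d`, `relu_conv2d`, and `bconv2d` are all assumptions made for illustration.

```python
import numpy as np

def conv2d(x, w, b):
    """Naive 'valid' cross-correlation with stride 1.

    x: (C_in, H, W), w: (C_out, C_in, K, K), b: (C_out,).
    """
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            # Contract channel and both kernel axes against the patch.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def relu_conv2d(x, w, b):
    """Baseline block: ReLU applied to a single convolution."""
    return np.maximum(conv2d(x, w, b), 0.0)

def bconv2d(x, w1, b1, w2, b2):
    """Assumed bilinear block: one convolution gates the other elementwise."""
    return conv2d(x, w1, b1) * conv2d(x, w2, b2)

# Hypothetical shapes: 3 input channels, 4 output channels, 3x3 kernels.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w1 = rng.standard_normal((4, 3, 3, 3))
w2 = rng.standard_normal((4, 3, 3, 3))
zeros = np.zeros(4)

y = bconv2d(x, w1, zeros, w2, zeros)
y_scaled = bconv2d(2.0 * x, w1, zeros, w2, zeros)
```

With zero biases, the block is quadratic in its input, so doubling `x` multiplies the output by four. This degree-two structure is what makes a weight-based quadratic-form analysis possible, in contrast to the piecewise-linear ReLU baseline.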