
Dopamine: Brain Modes, Not Brains

Shervin Ghasemlou

TL;DR

The paper addresses interpretability and efficiency in parameter-efficient fine-tuning by moving adaptation from weight-space deltas to activation-space gating. It introduces TauGate, which freezes base weights and learns per-neuron thresholds and gains that gate neuron participation, enabling explicit conditional computation. In an MNIST mode-specialization setup (0° vs 45°), TauGate improves rotated-mode accuracy over the frozen baseline with a small parameter budget, shows partial sparsity in activated units, and provides interpretable neuron-level attributions. The paper also positions TauGate relative to bias tuning and IA^3, and contrasts it with LoRA on parameter efficiency. Limitations include reduced expressivity when the frozen base lacks needed features and challenges in scaling to large transformers; future work targets context-conditioned thresholds and practical speedups.

Abstract

Parameter-efficient fine-tuning (PEFT) methods such as LoRA adapt large pretrained models by adding small weight-space updates. While effective, weight deltas are hard to interpret mechanistically, and they do not directly expose \emph{which} internal computations are reused versus bypassed for a new task. We explore an alternative view inspired by neuromodulation: adaptation as a change in \emph{mode} -- selecting and rescaling existing computations -- rather than rewriting the underlying weights. We propose TauGate, a simple activation-space PEFT technique that freezes base weights and learns per-neuron \emph{thresholds} and \emph{gains}. During training, a smooth gate decides whether a neuron's activation participates; at inference the gate can be hardened to yield explicit conditional computation and neuron-level attributions. As a proof of concept, we study ``mode specialization'' on MNIST (0$^\circ$) versus rotated MNIST (45$^\circ$). We pretrain a small MLP on a 50/50 mixture (foundation), freeze its weights, and then specialize to the rotated mode using TauGate. Across seeds, TauGate improves rotated accuracy over the frozen baseline while using only a few hundred trainable parameters per layer, and exhibits partial activation sparsity (a minority of units strongly active). Compared to LoRA, TauGate trades some accuracy for substantially fewer trainable parameters and a more interpretable ``which-neurons-fire'' mechanism. We discuss limitations, including reduced expressivity when the frozen base lacks features needed for the target mode.
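
The gating idea described in the abstract can be illustrated with a minimal sketch. The exact gate used in the paper is not given in this excerpt, so the form below is an assumption: a sigmoid of (activation minus threshold) divided by a temperature for the smooth training-time gate, and a plain comparison for the hardened inference-time gate. The names `gated_layer`, `gate_param_count`, `lora_param_count`, and the `temperature` parameter are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_layer(activations, thresholds, gains, temperature=0.1, hard=False):
    """Apply a per-neuron threshold and gain to frozen-layer activations.

    Assumed form (not taken from the paper):
      soft gate: sigmoid((a - tau) / temperature)
      hard gate: 1 if a > tau else 0
    The output for each neuron is gain * gate * activation.
    """
    out = []
    for a, tau, g in zip(activations, thresholds, gains):
        if hard:
            gate = 1.0 if a > tau else 0.0
        else:
            gate = sigmoid((a - tau) / temperature)
        out.append(g * gate * a)
    return out


def gate_param_count(width):
    """One threshold and one gain per neuron."""
    return 2 * width


def lora_param_count(d_in, d_out, rank):
    """Standard LoRA count for one weight matrix: rank * (d_in + d_out)."""
    return rank * (d_in + d_out)


# Hypothetical activations of three neurons in a frozen layer.
acts = [0.9, 0.05, 0.4]
taus = [0.5, 0.5, 0.5]
gains = [1.0, 1.0, 2.0]

soft = gated_layer(acts, taus, gains)             # differentiable, for training
hard = gated_layer(acts, taus, gains, hard=True)  # explicit on/off at inference
active = [i for i, v in enumerate(hard) if v != 0.0]  # neuron-level attribution
```

With a hard gate, the list of indices whose output is nonzero serves as a direct "which neurons fire" readout. The parameter-count helpers show why per-neuron gating is cheap: the cost grows with layer width only, whereas a LoRA update grows with rank times the sum of input and output dimensions.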

Paper Structure

This paper contains 36 sections, 5 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Test accuracy on MNIST (0$^\circ$) and rotated MNIST (45$^\circ$) after specializing to the rotated mode. TauGate improves over the frozen foundation baseline with a small parameter budget and exhibits partial activation sparsity ("High-act frac" in Table \ref{tab:mnist-rotation}).