Dopamine: Brain Modes, Not Brains
Shervin Ghasemlou
TL;DR
The paper addresses interpretability and efficiency in parameter-efficient fine-tuning (PEFT) by moving adaptation from weight-space deltas to activation-space gating. It introduces TauGate, which freezes the base weights and learns per-neuron thresholds and gains that gate each neuron's participation, enabling explicit conditional computation. In an MNIST mode-specialization setup (0° vs. 45° rotation), TauGate improves rotated-mode accuracy over the frozen baseline with a small parameter budget, shows partial sparsity in the activated units, and yields interpretable neuron-level attributions. The paper also positions TauGate relative to bias tuning and IA^3, and contrasts it with LoRA, against which it gives up some accuracy in exchange for substantially fewer trainable parameters. Limitations include reduced expressivity when the frozen base lacks the features the target mode needs, and open challenges in scaling to large transformers; future work targets context-conditioned thresholds and practical speedups.
Abstract
Parameter-efficient fine-tuning (PEFT) methods such as \lora{} adapt large pretrained models by adding small weight-space updates. While effective, weight deltas are hard to interpret mechanistically, and they do not directly expose \emph{which} internal computations are reused versus bypassed for a new task. We explore an alternative view inspired by neuromodulation: adaptation as a change in \emph{mode} -- selecting and rescaling existing computations -- rather than rewriting the underlying weights. We propose \methodname{}, a simple activation-space PEFT technique that freezes base weights and learns per-neuron \emph{thresholds} and \emph{gains}. During training, a smooth gate decides whether a neuron's activation participates; at inference the gate can be hardened to yield explicit conditional computation and neuron-level attributions. As a proof of concept, we study ``mode specialization'' on MNIST (0$^\circ$) versus rotated MNIST (45$^\circ$). We pretrain a small MLP on a 50/50 mixture (foundation), freeze its weights, and then specialize to the rotated mode using \methodname{}. Across seeds, \methodname{} improves rotated accuracy over the frozen baseline while using only a few hundred trainable parameters per layer, and exhibits partial activation sparsity (a minority of units strongly active). Compared to \lora{}, \methodname{} trades some accuracy for substantially fewer trainable parameters and a more interpretable ``which-neurons-fire'' mechanism. We discuss limitations, including reduced expressivity when the frozen base lacks features needed for the target mode.
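To make the threshold-and-gain mechanism concrete, the following is a minimal NumPy sketch of one gated layer's forward pass. It is not the paper's implementation: the abstract does not give the gate's functional form, so the sigmoid gate, the `temperature` parameter, the function name `gated_layer`, the hidden width of 256, and the initial values of `tau` and `gain` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_layer(x, W, b, tau, gain, temperature=0.1, hard=False):
    """Frozen affine layer + ReLU, followed by a per-neuron threshold/gain gate.

    W and b are the frozen base weights; only tau and gain would be trained.
    The sigmoid form of the soft gate is an assumption, not taken from the paper.
    """
    a = np.maximum(x @ W + b, 0.0)                   # frozen base computation
    if hard:
        gate = (a > tau).astype(a.dtype)             # inference: neuron is in or out
    else:
        gate = sigmoid((a - tau) / temperature)      # training: smooth gate
    return gain * gate * a, gate

rng = np.random.default_rng(0)
hidden = 256                                         # hypothetical layer width
W = rng.normal(scale=0.05, size=(784, hidden))       # stand-in for pretrained weights
b = np.zeros(hidden)
tau = np.full(hidden, 0.5)                           # per-neuron thresholds (assumed init)
gain = np.ones(hidden)                               # per-neuron gains (assumed init)

x = rng.random((4, 784))                             # four fake flattened 28x28 inputs
soft_out, soft_gate = gated_layer(x, W, b, tau, gain)
hard_out, hard_gate = gated_layer(x, W, b, tau, gain, hard=True)

trainable = tau.size + gain.size                     # 2 * hidden parameters per layer
active_fraction = hard_gate.mean()                   # share of units passing the threshold
```

Under these assumptions a layer contributes `2 * hidden` trainable parameters regardless of its input width, which is consistent with the abstract's "a few hundred trainable parameters per layer" for small MLPs. The hardened gate is a 0/1 mask, so `hard_gate` can be read directly as a record of which neurons participate for a given input.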
