Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Jiashun Liu, Zihao Wu, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Ling Pan

TL;DR

This paper addresses neuronal activity degradation in deep reinforcement learning and shows that activation-based dormancy metrics lose power in modern architectures. It proposes GraMa, a gradient-magnitude-based metric that quantifies neuron-level learning capacity, and ReGraMa, a gradient-guided reset mechanism that consistently restores activity across residual, diffusion-based, and MLP policies. In experiments on BRO-net, DACER diffusion policies, and SAC variants across MuJoCo and DeepMind Control Suite tasks, GraMa-based resets yield improved learning stability and performance, particularly as model scale increases. The approach offers a lightweight, architecture-agnostic tool for maintaining continual learning ability in deep RL, with potential to enhance generalization and adaptation in complex agents.

Abstract

Deep reinforcement learning (RL) agents frequently suffer from neuronal activity loss, which impairs their ability to adapt to new data and learn continually. A common method to quantify and address this issue is the tau-dormant neuron ratio, which uses activation statistics to measure the expressive ability of neurons. While effective for simple MLP-based agents, this approach loses statistical power in more complex architectures. To address this, we argue that in advanced RL agents, maintaining a neuron's learning capacity, its ability to adapt via gradient updates, is more critical than preserving its expressive ability. Based on this insight, we shift the statistical objective from activations to gradients, and introduce GraMa (Gradient Magnitude Neural Activity Metric), a lightweight, architecture-agnostic metric for quantifying neuron-level learning capacity. We show that GraMa effectively reveals persistent neuron inactivity across diverse architectures, including residual networks, diffusion models, and agents with varied activation functions. Moreover, resetting neurons guided by GraMa (ReGraMa) consistently improves learning performance across multiple deep RL algorithms and benchmarks, such as MuJoCo and the DeepMind Control Suite.
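To make the contrast between activation statistics and gradient statistics concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that a GraMa-style score normalizes each neuron's mean gradient magnitude by the layer average, mirroring the normalization of the tau-dormant ratio, and that a neuron counts as inactive when its score is at most tau. The reset rule (re-initialize incoming weights, zero outgoing weights) is borrowed from ReDo-style resets as a further assumption. All function names, the example magnitudes, and the `scale` parameter are hypothetical.

```python
import random
from typing import List


def normalized_scores(magnitudes: List[float], eps: float = 1e-9) -> List[float]:
    """Divide each neuron's mean magnitude by the layer-wide average.

    `magnitudes` holds one non-negative number per neuron: mean |activation|
    for an activation-based score, or mean |gradient| for a gradient-based one.
    """
    layer_mean = sum(magnitudes) / len(magnitudes)
    return [m / (layer_mean + eps) for m in magnitudes]


def inactive_mask(magnitudes: List[float], tau: float) -> List[bool]:
    """Flag neurons whose normalized score is at most tau (assumed rule)."""
    return [score <= tau for score in normalized_scores(magnitudes)]


def reset_neurons(in_weights: List[List[float]],
                  out_weights: List[List[float]],
                  mask: List[bool],
                  rng: random.Random,
                  scale: float = 0.1) -> None:
    """Reset flagged neurons in place (assumed ReDo-style rule).

    in_weights[i] is the incoming weight vector of neuron i;
    out_weights[j][i] is the weight from neuron i to next-layer unit j.
    """
    for i, flagged in enumerate(mask):
        if flagged:
            # Fresh small random incoming weights for the flagged neuron.
            in_weights[i] = [rng.uniform(-scale, scale) for _ in in_weights[i]]
            # Zero outgoing weights so the reset does not perturb the output.
            for row in out_weights:
                row[i] = 0.0


# Hypothetical layer of four neurons: activations look healthy everywhere,
# but neuron 1 receives no gradient signal.
activation_magnitudes = [1.0, 0.9, 1.1, 1.0]
gradient_magnitudes = [0.5, 0.0, 0.4, 0.3]
tau = 0.1

activation_flags = inactive_mask(activation_magnitudes, tau)
gradient_flags = inactive_mask(gradient_magnitudes, tau)

in_w = [[1.0, 1.0] for _ in range(4)]
out_w = [[1.0] * 4 for _ in range(2)]
reset_neurons(in_w, out_w, gradient_flags, random.Random(0))
```

In this toy layer the activation-based check flags no neuron, while the gradient-based check flags neuron 1, which is the kind of mismatch the abstract attributes to more complex architectures. How per-neuron gradient magnitudes are aggregated over a batch or over a neuron's parameters is left open here, since the text above does not specify it.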

Paper Structure

This paper contains 31 sections, 1 equation, 7 figures, 9 tables.

Figures (7)

  • Figure 1: The dormant neuron metric struggles in the advanced vision RL BRO-net agent [nauman2024bigger]. Neuron resetting based on the dormant neuron index (ReDo) cannot restore the learning capacity of the agent, which limits its effectiveness. The curves record the normalized score across 3 image-input tasks (15 runs per method), i.e., Dog Stand, Dog Walk, Dog Run.
  • Figure 2: Performance and neuron inactivity with the default BRO-net size. (Top row) Episode return across four environments (Humanoid Stand, Dog Walk, Dog Stand, Humanoid Walk). (Bottom row) Corresponding proportion of inactive neurons. ReGraMa consistently achieves higher returns while maintaining fewer inactive neurons compared to ReDo and the vanilla baseline, demonstrating its effectiveness in stabilizing learning dynamics. Results are averaged over four seeds.
  • Figure 3: ReGraMa is more robust under network scaling. Results averaged over four DMC tasks with 12 seeds per method.
  • Figure 4: Normalized scores for Ant and Walker2d. Boxes summarize 4 seeds; whiskers indicate the min/max, and the midline denotes the median.
  • Figure 5: Proportion of inactive neurons during training across two MuJoCo tasks (Ant and Walker2d). ReGraMa maintains a consistently lower ratio of inactive neurons.
  • ...and 2 more figures