Parseval Regularization for Continual Reinforcement Learning

Wesley Chung, Lynn Cherif, David Meger, Doina Precup

TL;DR

This paper tackles continual reinforcement learning challenges, such as plasticity loss and primacy bias, by applying Parseval regularization to keep weight matrices near-orthogonal and gradient flow stable across task sequences. The authors add a Parseval term, which pushes each weight matrix toward $W W^{\top} \approx s I$ for a scale $s$, to the objectives of both the policy and value networks, at a per-layer computational cost of $O(d^3)$ for layers of width $d$ and a modest runtime overhead. Through extensive experiments on gridworld, CARL, and MetaWorld benchmarks, they show that Parseval regularization substantially improves continual RL performance, supported by ablations and analyses of network properties such as stable rank, weight diversity, and the input-output Jacobian. They also explore relaxations (diagonal layers, input scaling, and subgroup Parseval) that balance network capacity against these optimization benefits, highlighting orthogonality-based regularization as a practical route to rapid adaptation in nonstationary RL.
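The Parseval term can be illustrated with a minimal pure-Python sketch. It assumes the penalty has the form $\lambda \lVert W W^{\top} - s I \rVert_F^2$ for a weight matrix $W$ stored row by row (one row per output neuron); the function names and the values of lam ($\lambda$, the strength) and s (the scale) are hypothetical choices for illustration, not the authors' code. Forming $W W^{\top}$ for a $d \times d$ matrix is what gives the $O(d^3)$ per-layer cost.

```python
# Minimal sketch (assumption): Parseval penalty lam * ||W W^T - s I||_F^2 for a
# single weight matrix W, stored as a list of rows (one row per output neuron).
# lam, s and all function names are hypothetical; this is not the authors' code.
from typing import List

Matrix = List[List[float]]


def gram(w: Matrix) -> Matrix:
    """W W^T: dot products between every pair of rows (O(m^2 n) for an m x n W)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in w] for ri in w]


def parseval_residual(w: Matrix, s: float) -> Matrix:
    """R = W W^T - s I."""
    g = gram(w)
    m = len(g)
    return [[g[i][j] - (s if i == j else 0.0) for j in range(m)] for i in range(m)]


def parseval_penalty(w: Matrix, s: float = 1.0, lam: float = 1e-3) -> float:
    """lam * ||W W^T - s I||_F^2 (squared Frobenius norm of R)."""
    r = parseval_residual(w, s)
    return lam * sum(x * x for row in r for x in row)


def parseval_grad(w: Matrix, s: float = 1.0, lam: float = 1e-3) -> Matrix:
    """Gradient of the penalty w.r.t. W: 4 * lam * R W (R is symmetric)."""
    r = parseval_residual(w, s)
    m, n = len(w), len(w[0])
    return [[4.0 * lam * sum(r[i][k] * w[k][j] for k in range(m)) for j in range(n)]
            for i in range(m)]


def descend(w: Matrix, steps: int = 100, lr: float = 0.01,
            s: float = 1.0, lam: float = 1.0) -> Matrix:
    """Gradient descent on the penalty alone (no RL loss), for illustration."""
    for _ in range(steps):
        g = parseval_grad(w, s, lam)
        w = [[wij - lr * gij for wij, gij in zip(wr, gr)] for wr, gr in zip(w, g)]
    return w
```

For example, with s = 1 and lam = 1, the identity matrix has zero penalty, while [[2.0, 0.0], [0.0, 1.0]] has penalty 9.0. In training, the gradient from parseval_grad would be added to the gradient of the RL loss for each regularized layer; running descend on the penalty alone simply shows the rows of W being pulled toward an orthogonal set with squared norm s.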

Abstract

Loss of plasticity, trainability loss, and primacy bias have been identified as issues that arise when training deep neural networks on sequences of tasks; all three refer to the increased difficulty of training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate its effect on metrics associated with network trainability, including weight matrix rank, weight norms and policy entropy.
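Weight matrix rank, one of the trainability metrics mentioned in the abstract, is commonly tracked through the stable rank. The sketch below assumes the standard definition $\lVert W \rVert_F^2 / \lVert W \rVert_2^2$ (the paper may use a variant) and estimates the largest singular value by power iteration; the function names are hypothetical. If $W W^{\top} = s I$ holds exactly for an $m \times n$ matrix with $m \le n$, all $m$ singular values equal $\sqrt{s}$, so the stable rank reaches its maximum value $m$, which suggests why near-orthogonal weights keep it high.

```python
# Minimal sketch (assumption): stable rank = ||W||_F^2 / ||W||_2^2, with the
# largest singular value estimated by power iteration on W^T W.
# Names are hypothetical; this is not the paper's implementation.
import math
import random
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_singular_value_sq(w: Matrix, iters: int = 200, seed: int = 0) -> float:
    """Estimate sigma_max(W)^2 by power iteration on W^T W."""
    rng = random.Random(seed)
    wt = transpose(w)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(w[0]))]
    for _ in range(iters):
        v_new = matvec(wt, matvec(w, v))  # W^T W v
        norm = math.sqrt(sum(x * x for x in v_new))
        if norm == 0.0:  # W is the zero matrix (or v fell into its null space)
            return 0.0
        v = [x / norm for x in v_new]
    wv = matvec(w, v)  # v is a unit vector, so ||W v||^2 approximates sigma_max^2
    return sum(x * x for x in wv)


def stable_rank(w: Matrix) -> float:
    """Squared Frobenius norm divided by squared spectral norm."""
    fro_sq = sum(x * x for row in w for x in row)
    top_sq = top_singular_value_sq(w)
    return fro_sq / top_sq if top_sq > 0.0 else 0.0
```

A 3 x 3 identity matrix gives a stable rank of about 3, while a rank-one matrix such as [[1.0, 2.0], [2.0, 4.0]] gives about 1. Unlike the exact rank, this quantity changes smoothly as singular values shrink, which makes it a useful signal during training.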

Paper Structure

This paper contains 28 sections, 15 figures, 3 tables.

Figures (15)

  • Figure 1: Performance of algorithms on MetaWorld tasks. The tasks change every 1 million steps, matching the dips in success rate in the learning curves (right). The left panel shows performance profiles, i.e., the distribution of average success rates across tasks. Higher is better for both. Parseval regularization significantly improves on the baseline and outperforms the other alternatives.
  • Figure 2: Performance profiles of diagonal layers and learnable input scales on MetaWorld sequences. Each addition helps when combined with Parseval regularization.
  • Figure 3: The left plot shows performance profiles of Parseval regularization on MetaWorld sequences when the neurons in a layer are divided into multiple groups. Splitting into groups brings no significant improvement, and a single group is the best choice, although Parseval regularization with any number of groups still improves on the baseline. The right plot shows the stable rank of the weight matrix in the actor's second layer: as the orthogonality constraint on the weights is relaxed, the stable rank decreases. Other layers and the critic show similar trends.
  • Figure 4: Performance of algorithms on gridworld and CARL environments. Parseval regularization yields the largest improvements, although other approaches also help.
  • Figure 5: Performance profiles for different architecture choices. (Left and center) Varying the activation function: all choices benefit from Parseval regularization. (Right) Varying the network width: Parseval regularization benefits all three widths, whereas increasing the width alone does not help.
  • ...and 10 more figures
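The grouped variant in Figure 3 can be sketched as a block-diagonal version of the same penalty. This sketch assumes the neurons (rows of W) are split into contiguous, equal-sized groups and that orthogonality is enforced only between rows of the same group, so rows in different groups are free to align; that freedom is consistent with the lower stable rank reported there. The grouping scheme and the function name are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumption): "subgroup" Parseval regularization, where the
# rows (neurons) of W are split into contiguous, equal-sized groups and the
# penalty ||W_g W_g^T - s I||_F^2 is summed over groups only. Cross-group dot
# products are left unconstrained. The function name is hypothetical.
from typing import List

Matrix = List[List[float]]


def group_parseval_penalty(w: Matrix, num_groups: int,
                           s: float = 1.0, lam: float = 1.0) -> float:
    n = len(w)
    if num_groups <= 0 or n % num_groups != 0:
        raise ValueError("number of rows must be divisible by num_groups")
    size = n // num_groups
    total = 0.0
    for g in range(num_groups):
        rows = w[g * size:(g + 1) * size]  # W_g: the neurons in group g
        for i, ri in enumerate(rows):
            for j, rj in enumerate(rows):
                dot = sum(a * b for a, b in zip(ri, rj))
                target = s if i == j else 0.0  # entry of s I
                total += (dot - target) ** 2
    return lam * total
```

With num_groups = 1 this reduces to the full Parseval penalty. For two identical unit-norm rows, [[1.0, 0.0], [1.0, 0.0]], the penalty is 2.0 with one group but 0.0 with two groups, since each row then only has to be orthogonal to rows in its own group.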