Parseval Regularization for Continual Reinforcement Learning
Wesley Chung, Lynn Cherif, David Meger, Doina Precup
TL;DR
This paper tackles continual reinforcement learning challenges, such as plasticity loss and primacy bias, by applying Parseval regularization to keep weight matrices near-orthogonal and gradient flow stable across task sequences. The authors formalize an objective that adds a Parseval term, enforcing $W W^{\top} \approx s I$ for a scale $s$, to both the policy and value networks, at a per-layer computational cost of $O(d^3)$ for a layer of width $d$ and with modest runtime overhead. In experiments on gridworld, CARL, and MetaWorld benchmarks, they show that Parseval regularization yields substantial improvements in continual RL performance, supported by ablations and analyses of network properties such as stable rank, weight diversity, and the input-output Jacobian. They also explore relaxations (diagonal layers, input scaling, and subgroup Parseval) that trade off network capacity against optimization benefits, highlighting the practical viability of orthogonality-based regularization for rapid adaptation in nonstationary RL.
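To make the constraint $W W^{\top} \approx s I$ concrete, the following is a minimal, dependency-free sketch of one common form of the penalty, the squared Frobenius norm $\|W W^{\top} - s I\|_F^2$, summed over layers and added to a task loss. The function names, the coefficient `lam`, and the list-of-rows matrix representation are illustrative assumptions, not the authors' implementation.

```python
def parseval_penalty(W, s=1.0):
    """Squared Frobenius norm of (W W^T - s I).

    W is a matrix given as a list of rows (hypothetical representation);
    s is the target scale of the Gram matrix.
    """
    n = len(W)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of W W^T is the dot product of rows i and j.
            gram_ij = sum(a * b for a, b in zip(W[i], W[j]))
            target = s if i == j else 0.0
            total += (gram_ij - target) ** 2
    return total


def regularized_loss(task_loss, weights, lam=1e-3, s=1.0):
    """Task loss plus lam times the summed per-layer penalty.

    `lam` is an assumed regularization coefficient, not a value from the paper.
    """
    return task_loss + lam * sum(parseval_penalty(W, s) for W in weights)
```

For a matrix with $n$ rows and $m$ columns, the double loop over row pairs costs on the order of $n^2 m$ multiplications, which matches the $O(d^3)$ per-layer cost for a square $d \times d$ layer. The penalty is zero exactly when the rows are mutually orthogonal with squared norm $s$, for example for the identity matrix with `s=1.0`.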
Abstract
Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks, all of which refer to the increased difficulty of training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate the effect of certain metrics associated with network trainability, including weight matrix rank, weight norms and policy entropy.
