Maximum Total Correlation Reinforcement Learning
Bang You, Puze Liu, Huaping Liu, Jan Peters, Oleg Arenz
TL;DR
This work addresses brittleness in reinforcement learning by introducing a trajectory-level inductive bias: maximizing the total correlation within induced state-action sequences. The authors formulate Maximum Total Correlation RL (MTC-RL), derive a variational lower bound to enable practical optimization, and implement it atop Soft Actor-Critic with an adaptive information-weighting coefficient. Empirically, MTC-RL yields more compressible and predictable trajectories, improving robustness to observation noise, action noise, and dynamics changes, while maintaining or improving task performance across locomotion, manipulation, and image-based control benchmarks. The results suggest trajectory-level regularization as a principled, broadly applicable approach to enhancing robustness and generalization in continuous-control agents.
Abstract
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used to obtain simpler policies, data augmentation to obtain simpler representations, and sparse reward functions to obtain simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementing these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including the policy and the state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance on the original tasks.
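To make the central quantity concrete: the total correlation of random variables X_1, ..., X_n is the sum of their marginal entropies minus their joint entropy, so it is zero for independent variables and grows as they become more mutually predictable. The sketch below evaluates it in closed form for a multivariate Gaussian, which is an illustrative assumption only; it is not the variational lower bound used by the method, and the function name `gaussian_total_correlation` is hypothetical.

```python
import numpy as np


def gaussian_total_correlation(cov):
    """Total correlation (in nats) of a zero-mean Gaussian with covariance `cov`.

    For a Gaussian, sum_i H(X_i) - H(X_1, ..., X_n) reduces to
    0.5 * (sum_i log cov[i, i] - log det cov).
    """
    cov = np.asarray(cov, dtype=float)
    # Sum of log marginal variances (from the marginal entropies).
    marginal_term = np.sum(np.log(np.diag(cov)))
    # Log-determinant of the joint covariance (from the joint entropy).
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (marginal_term - logdet)


# Independent variables: total correlation is zero.
tc_independent = gaussian_total_correlation(np.eye(3))

# Two variables with correlation rho: total correlation is -0.5 * log(1 - rho^2).
rho = 0.5
tc_correlated = gaussian_total_correlation([[1.0, rho], [rho, 1.0]])
```

In this toy setting, the more strongly the variables are correlated, the larger the value, which matches the intuition that trajectories with high total correlation are more predictable and compressible.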
