On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks
Nicholas H. Barbara, Ruigang Wang, Ian R. Manchester
TL;DR
The paper studies the robustness of policy networks in deep reinforcement learning when a Lipschitz bound is enforced on the policy $\kappa$, i.e., $\mathrm{Lip}(\kappa) \le \gamma$, to limit its sensitivity to perturbations. It compares several Lipschitz parameterizations, including spectral normalization (SN), almost orthogonal Lipschitz (AOL) layers, Cayley transforms, and the more expressive Sandwich layer, on pendulum swing-up and Atari Pong. The results show that a smaller $\gamma$ improves robustness to disturbances and adversarial attacks, and that the Sandwich layer achieves strong robustness without sacrificing nominal performance, outperforming SN and AOL at similar bounds. The work highlights the importance of layer design in robust RL and suggests two future directions: combining Lipschitz-bounded architectures with adversarial training, and deployment on real robotic systems.
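For reference, the bound $\mathrm{Lip}(\kappa) \le \gamma$ can be written out as below. This is the standard definition of a Lipschitz bound, not a formula quoted from the paper; the Euclidean norm and the symbols $x_1$, $x_2$, $\delta$, $f$, $g$ are assumptions made for illustration.

```latex
% Lipschitz bound on the policy: outputs move at most gamma times as far as inputs.
\[
  \|\kappa(x_1) - \kappa(x_2)\|_2 \;\le\; \gamma \,\|x_1 - x_2\|_2
  \qquad \text{for all } x_1, x_2 .
\]
% Consequence for a perturbed observation x + \delta:
\[
  \|\kappa(x + \delta) - \kappa(x)\|_2 \;\le\; \gamma \,\|\delta\|_2 .
\]
% Composition rule used when bounding a network layer by layer:
\[
  \mathrm{Lip}(f \circ g) \;\le\; \mathrm{Lip}(f)\,\mathrm{Lip}(g) .
\]
```

The second inequality is why a smaller $\gamma$ limits how far a disturbance or attack of a given size can move the action.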
Abstract
This paper presents a study of robust policy networks in deep reinforcement learning. We investigate the benefits of policy parameterizations that naturally satisfy constraints on their Lipschitz bound, analyzing their empirical performance and robustness on two representative problems: pendulum swing-up and Atari Pong. We illustrate that policy networks with smaller Lipschitz bounds are more robust to disturbances, random noise, and targeted adversarial attacks than unconstrained policies composed of vanilla multi-layer perceptrons or convolutional neural networks. However, the structure of the Lipschitz layer is important. We find that the widely used method of spectral normalization is too conservative and severely impacts clean performance, whereas more expressive Lipschitz layers such as the recently proposed Sandwich layer can achieve improved robustness without sacrificing clean performance.
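To make the idea of a Lipschitz-bounded policy concrete, here is a minimal NumPy sketch of spectral normalization applied to a small MLP. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the `tanh` activation, the even split of $\gamma$ across layers, and the helper names `init_sn_policy`, `policy`, and `worst_observed_gain` are all hypothetical. It does not implement the Sandwich, AOL, or Cayley layers.

```python
import numpy as np

def init_sn_policy(sizes, gamma, rng):
    """Random MLP weights, rescaled so the whole network is gamma-Lipschitz."""
    n_layers = len(sizes) - 1
    # Assumption: split the bound evenly, so the per-layer norms multiply to gamma.
    per_layer = gamma ** (1.0 / n_layers)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((n_out, n_in))
        sigma = np.linalg.norm(W, ord=2)  # largest singular value of W
        params.append((per_layer * W / sigma, np.zeros(n_out)))
    return params

def policy(params, x):
    """Forward pass; tanh is 1-Lipschitz, so it does not increase the bound."""
    for W, b in params[:-1]:
        x = np.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b

def worst_observed_gain(params, dim, rng, trials=1000, eps=1e-2):
    """Largest ratio ||policy(x + d) - policy(x)|| / ||d|| over random samples."""
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(dim)
        d = eps * rng.standard_normal(dim)
        diff = policy(params, x + d) - policy(params, x)
        worst = max(worst, np.linalg.norm(diff) / np.linalg.norm(d))
    return worst

rng = np.random.default_rng(0)
gamma = 5.0
params = init_sn_policy([3, 32, 32, 1], gamma, rng)  # hypothetical sizes
gain = worst_observed_gain(params, dim=3, rng=rng)
print(f"bound gamma = {gamma}, worst observed gain = {gain:.3f}")
```

The product of per-layer spectral norms is only an upper bound on the network's true Lipschitz constant, so the observed gain can sit well below $\gamma$. Random sampling like this gives a lower estimate of sensitivity, not a certificate; the guarantee comes from the weight rescaling itself.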
