Interpretability by Design for Efficient Multi-Objective Reinforcement Learning
Qiyue Xia, Tianwei Wang, J. Michael Herrmann
TL;DR
This work tackles multi-objective reinforcement learning by proposing LLE-MORL, a method that achieves interpretability by design through a locally linear Parameter-Performance Relationship (PPR) between policy parameters and objective outcomes. The method first performs directional retraining to identify local tangent directions and then extends along those directions without retraining, which traces the Pareto front densely and efficiently while maintaining an interpretable mapping from parameter changes to trade-offs. The authors provide theoretical guarantees (Lipschitz continuity and local manifold lift) and demonstrate empirically that LLE-MORL yields wider, higher-quality Pareto fronts with strong sample efficiency across 2D and higher-dimensional objective spaces, outperforming several MORL baselines. The framework is modular and its extension step requires no training, so it can be integrated as a plug-in into existing PPO-based MORL pipelines and supports rapid policy adaptation under changing preferences.
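The extension step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes policies are represented as flat parameter vectors, that directional retraining has already produced a second parameter vector near the base policy, and it replaces policy rollouts with a hypothetical linear evaluator (`toy_evaluate`) so that the locally linear relationship holds exactly. The names `extension_candidates`, `toy_evaluate`, and `scales` are illustrative only.

```python
import numpy as np

def extension_candidates(theta_base, theta_retrained, scales):
    """Candidates theta_base + a * (theta_retrained - theta_base), one per scale a."""
    theta_base = np.asarray(theta_base, dtype=float)
    # Direction assumed to come from directional retraining of the base policy.
    delta = np.asarray(theta_retrained, dtype=float) - theta_base
    # No training happens here: each candidate is a plain parameter update.
    return [theta_base + a * delta for a in scales]

def toy_evaluate(theta):
    """Hypothetical stand-in for policy rollouts: a linear two-objective map."""
    M = np.array([[1.0, 0.5],
                  [-0.5, 1.0]])
    return M @ theta + np.array([10.0, 10.0])

theta_base = np.array([0.0, 0.0])        # assumed base policy parameters
theta_retrained = np.array([1.0, -1.0])  # assumed result of directional retraining
scales = [0.0, 1.0, 2.0, 3.0]            # step sizes along the direction

candidates = extension_candidates(theta_base, theta_retrained, scales)
returns = np.array([toy_evaluate(t) for t in candidates])
print(returns)  # one row of (objective 1, objective 2) per candidate
```

Under this toy evaluator, each step along the direction raises the first objective and lowers the second by a fixed amount, which is the kind of interpretable parameter-to-trade-off mapping the summary refers to. With a real policy the relationship would hold only locally, so candidates would need to be evaluated and filtered.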
Abstract
Multi-objective reinforcement learning (MORL) aims to optimise several, often conflicting, goals in order to improve the flexibility and reliability of RL in practical tasks. This is typically achieved by finding a set of diverse, non-dominated policies that form a Pareto front in the performance space. We introduce LLE-MORL, an approach that achieves interpretability by design through a training scheme based on the local relationship between the parameter space and the performance space. By exploiting a locally linear map between these spaces, our method provides an interpretation of policy parameters in terms of the objectives. This structured representation enables an efficient search within contiguous solution domains, allowing for the rapid generation of high-quality solutions without extensive retraining. Experiments across diverse continuous control domains demonstrate that LLE-MORL consistently achieves higher Pareto front quality and efficiency than state-of-the-art approaches.
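To make the notion of a non-dominated set concrete, the following minimal sketch filters a list of return vectors down to those on the Pareto front. It assumes all objectives are maximised; the function name `non_dominated` and the sample points are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np

def non_dominated(points):
    """Return indices of points not dominated by any other point (maximisation)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = False
        for j, q in enumerate(pts):
            # q dominates p if it is at least as good everywhere and better somewhere.
            if j != i and np.all(q >= p) and np.any(q > p):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical (objective 1, objective 2) returns for five policies.
points = [[1, 5], [2, 4], [1, 4], [3, 1], [2, 2]]
front = non_dominated(points)
print(front)  # indices of the policies on the Pareto front
```

This pairwise check is quadratic in the number of policies, which is adequate for small candidate sets; larger sets would call for a sort-based method.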
