Preference-Conditioned Gradient Variations for Multi-Objective Quality-Diversity
Hannah Janmohamed, Maxence Faldor, Thomas Pierrot, Antoine Cully
TL;DR
The paper tackles the challenge of generating diverse, high-performing solutions for multi-objective tasks by combining preference-conditioned policy-gradient mutations with crowding mechanisms in a Map-Elites-style archive. The proposed method, MOME-P2C (Multi-Objective Map-Elites with Preference-Conditioned Policy-Gradient and Crowding Mechanisms), uses a single preference-conditioned actor-critic to produce policy-gradient updates aimed at targeted objective trade-offs, while crowding promotes a uniform spread of solutions along the non-dominated front. Empirical results on six robotics locomotion tasks, including two newly proposed tri-objective tasks, show that MOME-P2C outperforms or matches state-of-the-art Multi-Objective Quality-Diversity (MOQD) baselines and achieves smoother trade-offs, with ablations highlighting the importance of crowding, actor injection, and gradient mutations. The approach offers improved data efficiency and scales to settings with more than two objectives, with practical implications for robust, adaptable control policies in robotics and related domains.
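To make the idea of a "targeted objective trade-off" concrete, the following minimal sketch shows one common way to express a preference: a vector of non-negative weights that sums to 1 and scalarizes a vector of objective values. This is an illustration under stated assumptions, not the paper's implementation. The function names `sample_preference` and `scalarize` are hypothetical, and the weighted-sum form is assumed here for simplicity.

```python
import random


def sample_preference(num_objectives, rng):
    """Sample weights uniformly from the probability simplex.

    Normalizing independent Exponential(1) draws yields a
    Dirichlet(1, ..., 1) sample, i.e. a uniform point on the simplex.
    (Hypothetical helper; the sampling scheme is an assumption.)
    """
    raw = [rng.expovariate(1.0) for _ in range(num_objectives)]
    total = sum(raw)
    return [r / total for r in raw]


def scalarize(objectives, preference):
    """Weighted sum of objective values under a preference vector.

    (Assumed scalarization; other forms are possible.)
    """
    return sum(w * f for w, f in zip(preference, objectives))


# A preference of (0.25, 0.75) favours the second objective.
value = scalarize([2.0, 4.0], [0.25, 0.75])  # 3.5
```

Conditioning a single actor-critic on such a vector lets one network serve many trade-offs; the sketch covers only the scalarization step, not the policy-gradient update itself.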
Abstract
In a variety of domains, from robotics to finance, Quality-Diversity algorithms have been used to generate collections of both diverse and high-performing solutions. Multi-Objective Quality-Diversity algorithms have emerged as a promising approach for applying these methods to complex, multi-objective problems. However, existing methods are limited by their search capabilities. For example, Multi-Objective Map-Elites depends on random genetic variations, which struggle in high-dimensional search spaces. Despite efforts to enhance search efficiency with gradient-based mutation operators, existing approaches update solutions to improve each objective separately rather than to achieve desired trade-offs. In this work, we address this limitation by introducing Multi-Objective Map-Elites with Preference-Conditioned Policy-Gradient and Crowding Mechanisms (MOME-P2C): a new Multi-Objective Quality-Diversity algorithm that uses preference-conditioned policy-gradient mutations to efficiently discover promising regions of the objective space and crowding mechanisms to promote a uniform distribution of solutions on the non-dominated front. We evaluate our approach on six robotics locomotion tasks and show that our method outperforms or matches all state-of-the-art Multi-Objective Quality-Diversity methods in all six, including two newly proposed tri-objective tasks. Importantly, our method also achieves a smoother set of trade-offs, as measured by newly proposed sparsity-based metrics.
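As an illustration of how a crowding mechanism can promote a uniform distribution on a non-dominated front, the sketch below computes the NSGA-II crowding distance and picks the most crowded solution as a removal candidate. The choice of this particular measure is an assumption made for illustration, and the abstract does not specify the exact mechanism used. The function names `crowding_distances` and `most_crowded_index` are hypothetical.

```python
def crowding_distances(front):
    """NSGA-II crowding distance for a list of objective tuples.

    Boundary solutions on each objective receive infinite distance so
    they are always kept; interior solutions accumulate the normalized
    gap between their two neighbours on every objective.
    """
    n = len(front)
    if n == 0:
        return []
    num_objectives = len(front[0])
    dist = [0.0] * n
    for m in range(num_objectives):
        # Indices of solutions sorted by their value on objective m.
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = front[order[-1]][m] - front[order[0]][m]
        if span == 0:
            continue  # all values equal: this objective adds nothing
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / span
    return dist


def most_crowded_index(front):
    """Index of the solution with the smallest crowding distance."""
    dist = crowding_distances(front)
    return min(range(len(front)), key=lambda i: dist[i])


# Two solutions sit close together near (1, 3); one is a removal candidate.
front = [(0.0, 4.0), (1.0, 3.0), (1.5, 2.5), (4.0, 0.0)]
distances = crowding_distances(front)  # [inf, 0.75, 1.5, inf]
victim = most_crowded_index(front)     # 1
```

Removing the most crowded solution when a front is full, rather than a random one, keeps the remaining solutions more evenly spaced, which is the behaviour that sparsity-style metrics reward.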
