EBGAN-MDN: An Energy-Based Adversarial Framework for Multi-Modal Behavior Cloning
Yixiao Li, Julia Barth, Thomas Kiefer, Ahmad Fraij
TL;DR
This work tackles multi-modal behavior cloning by combining an energy-based model with a Mixture Density Network (MDN) generator, trained with a modified InfoNCE loss and an energy-enforced MDN loss. The MDN generator explicitly models one-to-$k$ input-output mappings, while the energy model shapes the energy landscape to discourage mode collapse and mode averaging. A dynamic scaling factor, $\alpha_t$, stabilizes training by progressively reducing the generator's influence as its outputs become more plausible. Empirical results on 2D geometric benchmarks and robotic tasks show better mode coverage and sample quality than explicit BC, cGAN, IBC, and MDN baselines, suggesting the framework is robust and scales to multi-modal planning and imitation in real-world settings.
Abstract
Multi-modal behavior cloning suffers from mode averaging and mode collapse, in which traditional models fail to capture diverse input-output mappings. This problem is critical in applications such as robotics, where modeling multiple valid actions matters for both performance and safety. We propose EBGAN-MDN, a framework that integrates energy-based models, Mixture Density Networks (MDNs), and adversarial training. EBGAN-MDN addresses both failure modes through a modified InfoNCE loss and an energy-enforced MDN loss. Experiments on synthetic and robotic benchmarks demonstrate superior performance, establishing EBGAN-MDN as an effective and efficient solution for multi-modal learning tasks.
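To make the interplay between the two components concrete, the following is a minimal sketch, not the authors' implementation. It assumes a one-dimensional action space, uses the standard (unmodified) InfoNCE form over negated energies, and draws negatives from a fixed Gaussian mixture standing in for the MDN generator's output. The names `sample_mdn`, `info_nce_loss`, and `toy_energy` are hypothetical, and the paper's modified loss and the $\alpha_t$ scaling are not reproduced here.

```python
import numpy as np

def sample_mdn(weights, means, stds, n_samples, rng):
    """Draw samples from a 1-D Gaussian mixture (stand-in for an MDN head's output)."""
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    return rng.normal(means[comps], stds[comps])

def info_nce_loss(energy_fn, x, y_pos, y_neg):
    """Standard InfoNCE over energies: negative log-softmax of -E at the positive.

    Assumption: this is the textbook form, not the paper's modified variant.
    """
    candidates = np.concatenate(([y_pos], y_neg))
    logits = -energy_fn(x, candidates)       # low energy -> high logit
    logits = logits - logits.max()           # stabilize the log-sum-exp
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                     # the positive sits at index 0

def toy_energy(x, y):
    """Hypothetical bimodal energy with minima at y = +x and y = -x."""
    return np.minimum((y - x) ** 2, (y + x) ** 2)

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.5])               # mixture weights (sum to 1)
means = np.array([-1.0, 1.0])                # one component per mode
stds = np.array([0.1, 0.1])

# Generator samples act as negatives (counter-examples) for the energy model.
negatives = sample_mdn(weights, means, stds, 8, rng)
loss = info_nce_loss(toy_energy, 1.0, 1.0, negatives)
```

In this sketch, the loss is close to zero when the negatives fall in high-energy regions and grows as they approach the low-energy modes. That matches the intuition that a more plausible generator gives the energy model a harder contrastive task.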
