Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization
Xu Jia
TL;DR
This work tackles robust multimodal object detection for autonomous driving by combining Group Relative Policy Optimization (GRPO) with curriculum-guided data scheduling and difficulty-aware filtering, which together address sparse and noisy reward signals. The approach optimizes a multi-component objective (IoU, format, and KL terms) and normalizes rewards within each sampled group to stabilize learning, while the curriculum progressively introduces harder samples. On CODA and BDD-100K, the method yields substantial IoU gains for critical categories: 9.4 percentage points for pedestrians/non-motorized vehicles on BDD-100K and 7.1 percentage points for traffic participants on CODA, indicating improved localization and robustness across domains. The findings suggest reinforcement-driven optimization with structured data curricula as a scalable path toward robust, interpretable vision-language perception in real-world autonomous systems.
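The reward components and intra-group normalization mentioned above can be illustrated with a minimal Python sketch. It assumes the model emits one bounding box per completion in a hypothetical `[x1, y1, x2, y2]` text format; the regex, the weights `w_iou` and `w_fmt`, and all function names are illustrative assumptions, not the paper's implementation. The KL term, which in GRPO regularizes the policy toward a reference model, is omitted because it depends on token log-probabilities rather than on the parsed output.

```python
import re
import statistics
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

# Hypothetical output format: four numbers inside square brackets.
_NUM = r"(-?\d+(?:\.\d+)?)"
BOX_PATTERN = re.compile(r"\[\s*" + r"\s*,\s*".join([_NUM] * 4) + r"\s*\]")


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def parse_box(text: str) -> Optional[Box]:
    """Return the first box in the completion, or None if the format is violated."""
    m = BOX_PATTERN.search(text)
    if m is None:
        return None
    x1, y1, x2, y2 = (float(g) for g in m.groups())
    return (x1, y1, x2, y2)


def reward(completion: str, gt: Box, w_iou: float = 1.0, w_fmt: float = 0.5) -> float:
    """Assumed composite reward: weighted IoU term plus a binary format term."""
    box = parse_box(completion)
    fmt = 1.0 if box is not None else 0.0
    loc = iou(box, gt) if box is not None else 0.0
    return w_iou * loc + w_fmt * fmt


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style intra-group normalization: (r - mean) / (std + eps).
    Uses the population standard deviation of the group's rewards."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gt = (10.0, 10.0, 50.0, 50.0)
    group = ["[10, 10, 50, 50]", "[20, 20, 60, 60]", "no box here"]
    rs = [reward(c, gt) for c in group]
    print(rs, group_advantages(rs))
```

With this normalization, a group in which every completion receives the same reward yields all-zero advantages and so contributes no policy-gradient signal, which is one reason sparse rewards are difficult for GRPO.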
Abstract
Multimodal Large Language Models (MLLMs) excel at vision-language reasoning but often struggle with structured perception tasks that require precise localization and robustness. We propose a reinforcement learning framework that augments Group Relative Policy Optimization (GRPO) with curriculum-based data scheduling and difficulty-aware filtering. This design stabilizes optimization under sparse, noisy rewards and lets the model adapt progressively to harder samples. Evaluations on the CODA and BDD-100K autonomous driving benchmarks show substantial improvements in detection accuracy and robustness. Ablation studies confirm that reward design, KL regularization, and curriculum pacing are important for convergence stability and generalization. Our findings highlight reinforcement-driven optimization with structured data curricula as a scalable path toward robust and interpretable multimodal detection.
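Curriculum pacing and difficulty-aware filtering can likewise be sketched in Python. This version assumes each training sample carries a precomputed difficulty score in [0, 1] (for example, one minus a reference model's mean IoU); it drops samples outside a hypothetical band `[low, high]`, sorts the rest from easy to hard, and uses a linear pacing function to unlock a growing share of that pool over training. The score, thresholds, linear schedule, and all names are assumptions for illustration, not the authors' settings.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    sample_id: str
    difficulty: float  # hypothetical precomputed score in [0, 1]; higher means harder


def build_pool(samples: Sequence[Sample], low: float = 0.05, high: float = 0.95) -> List[Sample]:
    """Assumed difficulty-aware filter: drop near-trivial and near-impossible
    samples, then sort the remainder from easiest to hardest."""
    kept = [s for s in samples if low <= s.difficulty <= high]
    return sorted(kept, key=lambda s: s.difficulty)


def linear_pacing(step: int, total_steps: int, start_frac: float = 0.3) -> float:
    """Fraction of the sorted pool unlocked at `step` (assumed linear schedule)."""
    if total_steps <= 0:
        return 1.0
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_frac + (1.0 - start_frac) * progress


def curriculum_batch(pool: Sequence[Sample], step: int, total_steps: int,
                     batch_size: int, rng: random.Random) -> List[Sample]:
    """Draw a batch (with replacement) from the easiest unlocked portion of the pool."""
    if not pool:
        return []
    frac = linear_pacing(step, total_steps)
    cutoff = max(1, round(frac * len(pool)))
    unlocked = pool[:cutoff]
    return [rng.choice(unlocked) for _ in range(batch_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [Sample(f"s{i}", i / 10) for i in range(11)]
    pool = build_pool(data)
    for step in (0, 50, 100):
        batch = curriculum_batch(pool, step, 100, batch_size=8, rng=rng)
        print(step, max(s.difficulty for s in batch))
```

Sampling uniformly within the unlocked prefix keeps easy samples in rotation as harder ones are added; other pacing functions (step-wise or root-shaped) could be swapped into `linear_pacing` without changing the rest.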
