YOLO Meets Mixture-of-Experts: Adaptive Expert Routing for Robust Object Detection
Ori Meiraz, Sharon Shalev, Avishai Weizman
TL;DR
This work addresses robustness in object detection by integrating a Mixture-of-Experts (MoE) framework into YOLOv9-T, enabling adaptive routing among specialized detectors. The proposed architecture uses I=3 multi-scale feature maps and E=2 experts. A router at each scale performs a Hadamard-based fusion to produce normalized routing weights α_i for each expert. To prevent expert collapse, a load-balancing loss L_{lb} is added to the detection loss: L = L_{det} + λ_{lb} L_{lb}. The MoE routing allows dynamic feature-level specialization, which improves mean Average Precision (mAP) and Average Recall (AR) on the COCO and VisDrone datasets, including multi-dataset and combined training scenarios (e.g., COCO+Vis with mAP up to 37.5 and AR up to 50.0). The work demonstrates the practicality of MoE for object detection and suggests future extensions to larger YOLO variants, more efficient routing, and temporal or multi-modal video applications.
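The routing and loss described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not specify the router's internals, so the pooling step, the linear projection (w, b), the exact form of the Hadamard fusion, the particular load-balancing penalty, and all function names here are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route_scale(expert_feats, w, b):
    """Fuse expert feature maps at one scale (hypothetical router).

    expert_feats: (E, B, C, H, W) feature maps from E experts.
    w, b: assumed linear projection of shape (C, E) and (E,).
    Returns the fused map (B, C, H, W) and routing weights alpha (B, E).
    """
    pooled = expert_feats.mean(axis=(3, 4))   # (E, B, C) global average pool
    joint = np.prod(pooled, axis=0)           # (B, C) Hadamard product across experts
    alpha = softmax(joint @ w + b, axis=-1)   # (B, E), each row sums to 1
    weights = alpha.T[:, :, None, None, None] # (E, B, 1, 1, 1) for broadcasting
    fused = (weights * expert_feats).sum(axis=0)
    return fused, alpha

def load_balance_loss(alpha):
    """Assumed penalty: E * sum_e (mean usage of expert e)^2.

    Equals 1 when experts are used uniformly and E when one expert
    receives all the weight, so minimizing it discourages collapse.
    """
    mean_usage = alpha.mean(axis=0)           # (E,)
    return alpha.shape[1] * np.sum(mean_usage ** 2)

def total_loss(l_det, alphas, lambda_lb=0.01):
    """L = L_det + lambda_lb * L_lb, with L_lb averaged over the scales."""
    l_lb = sum(load_balance_loss(a) for a in alphas) / len(alphas)
    return l_det + lambda_lb * l_lb

# Usage with random stand-in features: E=2 experts, I=3 scales.
rng = np.random.default_rng(0)
alphas = []
for size in (40, 20, 10):                     # illustrative spatial sizes
    feats = rng.normal(size=(2, 4, 8, size, size))
    w, b = rng.normal(size=(8, 2)), np.zeros(2)
    fused, alpha = route_scale(feats, w, b)
    alphas.append(alpha)
loss = total_loss(l_det=1.0, alphas=alphas, lambda_lb=0.01)
```

The value of lambda_lb and the stand-in detection loss are placeholders; in practice L_{det} would come from the YOLO detection head and λ_{lb} would be tuned.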
Abstract
This paper presents a novel Mixture-of-Experts (MoE) framework for object detection that routes adaptively among multiple YOLOv9-T experts, enabling dynamic feature specialization and achieving higher mean Average Precision (mAP) and Average Recall (AR) than a single YOLOv9-T model.
