Wholly-WOOD: Wholly Leveraging Diversified-quality Labels for Weakly-supervised Oriented Object Detection
Yi Yu, Xue Yang, Yansheng Li, Zhenjun Han, Feipeng Da, Junchi Yan
TL;DR
Wholly-WOOD tackles the high cost of rotated bounding box annotations by unifying Point, HBox, and RBox supervision into a single weakly-supervised framework for oriented object detection. It combines symmetry-aware learning to infer orientation with a Point-to-RBox knowledge-aggregation module, producing accurate RBoxes from diverse labels. Key contributions include a theory of symmetry-based angle estimation, H2RBox-v2 and Point2RBox extensions, and the integrated Wholly-WOOD system with a P2R subnet, achieving near-parity with fully supervised baselines while reducing labeling effort. The approach demonstrates strong results on remote-sensing datasets and shows potential for broader applicability, with open-source PyTorch/Jittor implementations provided.
Abstract
Accurately estimating the orientation of visual objects with compact rotated bounding boxes (RBoxes) has become a prominent demand, which challenges existing object detection paradigms that only use horizontal bounding boxes (HBoxes). To equip detectors with orientation awareness, supervised regression/classification modules have been introduced, at the high cost of rotation annotation. Meanwhile, some existing datasets with oriented objects are already annotated with horizontal boxes or even single points. Effectively utilizing these weaker single-point and horizontal annotations to train an oriented object detector (OOD) is therefore attractive, yet remains an open problem. We develop Wholly-WOOD, a weakly-supervised OOD framework capable of wholly leveraging various labeling forms (Points, HBoxes, RBoxes, and their combinations) in a unified fashion. Using only HBoxes for training, Wholly-WOOD achieves performance very close to that of its RBox-trained counterpart in remote sensing and other areas, significantly reducing the labor-intensive annotation effort for oriented objects. The source code is available at https://github.com/VisionXLab/whollywood (PyTorch-based) and https://github.com/VisionXLab/whollywood-jittor (Jittor-based).
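To make the relationship between the three labeling forms concrete, the following minimal sketch shows how an RBox can be reduced to an HBox and then to a Point, and why the weaker forms lose orientation information. It is an illustration only, not code from the Wholly-WOOD repositories: the function names are hypothetical, and the (cx, cy, w, h, theta) parameterization with theta in radians is an assumed convention.

```python
import math

def rbox_to_hbox(cx, cy, w, h, theta):
    """Circumscribed horizontal box (x_min, y_min, x_max, y_max) of a rotated box.

    Assumed convention: (cx, cy) is the center, w and h are the side lengths,
    and theta is the rotation angle in radians.
    """
    half_w = 0.5 * (w * abs(math.cos(theta)) + h * abs(math.sin(theta)))
    half_h = 0.5 * (w * abs(math.sin(theta)) + h * abs(math.cos(theta)))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def hbox_to_point(x_min, y_min, x_max, y_max):
    """Center point of a horizontal box; size information is discarded."""
    return (0.5 * (x_min + x_max), 0.5 * (y_min + y_max))

# Two RBoxes with opposite angles share the same circumscribed HBox,
# so an HBox label alone cannot determine the orientation.
hbox_a = rbox_to_hbox(10.0, 20.0, 8.0, 2.0, math.radians(30))
hbox_b = rbox_to_hbox(10.0, 20.0, 8.0, 2.0, math.radians(-30))
point = hbox_to_point(*hbox_a)
```

Here `hbox_a` and `hbox_b` coincide even though the underlying objects are oriented differently. This ambiguity is what a detector trained only on HBox or Point labels has to resolve from the image content itself.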
