Data Distillation: Towards Omni-Supervised Learning
Ilija Radosavovic, Piotr Dollár, Ross Girshick, Georgia Gkioxari, Kaiming He
TL;DR
The paper addresses how to use large amounts of unlabeled data alongside labeled datasets to improve visual recognition. It introduces omni-supervised learning and a simple, scalable data distillation pipeline. Data distillation generates hard training labels for unlabeled data by ensembling the predictions of a single model over multiple geometric transformations of each input, then retrains a student model on the union of the labeled and automatically labeled data, without changing the model architecture or loss functions. Experiments on COCO keypoint detection and object detection show consistent gains over strong fully supervised baselines in both small-scale and large-scale settings, including cases where the labeled and unlabeled data come from different distributions. The results indicate that self-training with multi-transform inference can make effective use of internet-scale unlabeled data and outperform the same models trained on labeled data alone.
Abstract
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lower-bounded by performance on existing labeled datasets, offering the potential to surpass state-of-the-art fully supervised methods. To exploit the omni-supervised setting, we propose data distillation, a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations. We argue that visual recognition models have recently become accurate enough that it is now possible to apply classic ideas about self-training to challenging real-world data. Our experimental results show that in the cases of human keypoint detection and general object detection, state-of-the-art models trained with data distillation surpass the performance of using labeled data from the COCO dataset alone.
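
The label-generation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_model`, the flip-only transform set, the `score_threshold` value, and the single-peak "keypoint" output are all assumptions chosen to keep the example small. The toy model has a deliberate left-to-right bias so that averaging the original and flipped predictions has a visible effect.

```python
import numpy as np

def toy_model(image):
    """Hypothetical stand-in for a trained model: returns a per-pixel score map.
    A small column-dependent bias mimics a model that is not flip-equivariant."""
    return image + 0.01 * np.arange(image.shape[1])

# Each entry pairs a transform with the inverse that maps the prediction
# back into the original image frame (assumed set: identity and horizontal flip).
TRANSFORMS = [
    (lambda x: x, lambda y: y),
    (np.fliplr, np.fliplr),
]

def multi_transform_inference(model, image, transforms=TRANSFORMS):
    """Run one model on several transformed copies and average the aligned outputs."""
    preds = [inverse(model(forward(image))) for forward, inverse in transforms]
    return np.mean(preds, axis=0)

def distill_labels(model, unlabeled_images, score_threshold=0.5):
    """Turn ensembled predictions into hard labels: the peak location is kept
    if its score passes the (assumed) threshold, otherwise the image gets None."""
    labels = []
    for image in unlabeled_images:
        ensembled = multi_transform_inference(model, image)
        row, col = np.unravel_index(np.argmax(ensembled), ensembled.shape)
        if ensembled[row, col] >= score_threshold:
            labels.append((int(row), int(col)))
        else:
            labels.append(None)
    return labels

# Usage: one image with a clear peak, one blank image.
peaked = np.zeros((4, 5))
peaked[1, 3] = 1.0
blank = np.zeros((4, 5))
generated = distill_labels(toy_model, [peaked, blank])
```

In this sketch the generated labels would then be merged with the manually labeled set and used to retrain a student model with the unchanged training procedure; images whose ensembled score is too low contribute no label.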
