Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model
Jihun Kim, Dahyun Kim, Hyungrok Jung, Taeil Oh, Jonghyun Choi
TL;DR
This work tackles long-tailed (LT) recognition on resource-constrained binary neural networks by introducing Calibrate and Distill (CANDLE). A pretrained full-precision (FP) teacher is calibrated on the target LT data and then used to distill supervision into a binary student, with an adversarially learned balancing of the distillation terms and an efficient multiresolution learning scheme to generalize across datasets. The approach yields large improvements over prior LT methods across 15 benchmarks, especially in tail-class accuracy, while maintaining the computational efficiency needed for edge deployment. The results indicate that distillation from a fixed FP teacher, combined with dataset-aware balancing and multiresolution calibration, provides a scalable path to accurate LT recognition on binary networks. One limitation is the reliance on teachers pretrained on non-LT data; future work could explore LT-pretrained teachers to further improve performance and fairness across classes.
Abstract
Deploying deep models in real-world scenarios entails a number of challenges, including computational efficiency and real-world (e.g., long-tailed) data distributions. We address the combined challenge of learning long-tailed distributions using highly resource-efficient binary neural networks as backbones. Specifically, we propose a calibrate-and-distill framework that uses off-the-shelf pretrained full-precision models, trained on balanced datasets, as teachers for distillation when learning binary networks on long-tailed datasets. To better generalize to various datasets, we further propose a novel adversarial balancing among the terms in the objective function and an efficient multiresolution learning scheme. We conduct the largest empirical study in the literature using 15 datasets, including newly derived long-tailed datasets from existing balanced datasets, and show that our proposed method outperforms prior art by large margins (>14.33% on average).
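As a rough illustration of what a distillation objective with several balanced terms can look like, the sketch below combines a logit-matching term (KL divergence between teacher and student predictions) with a feature-matching term (mean squared error). This is a generic knowledge-distillation sketch and not necessarily the objective used in this work: the choice of these two terms, the function names, and the fixed scalar `lam` (standing in for the adversarially learned balancing) are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits,
                      teacher_feat, student_feat,
                      lam=0.5, temperature=1.0):
    """Weighted sum of a logit-matching and a feature-matching term.

    `lam` is a hypothetical fixed weight in [0, 1]; it is a stand-in for
    a learned balancing between terms, which is not modeled here.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_term = kl_divergence(p_teacher, p_student)
    feat_term = sum((t - s) ** 2 for t, s in zip(teacher_feat, student_feat))
    feat_term /= len(teacher_feat)
    return lam * logit_term + (1.0 - lam) * feat_term


if __name__ == "__main__":
    # Toy values only: a 3-class prediction and a 2-dimensional feature.
    loss = distillation_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0],
                             [0.3, 0.7], [0.1, 0.9], lam=0.5)
    print(f"distillation loss: {loss:.4f}")
```

With a fixed `lam`, the relative weight of the two terms has to be tuned per dataset; learning the balance instead is one way to avoid that tuning, which is the motivation the abstract gives for its balancing scheme.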
