Asymmetric Duos: Sidekicks Improve Uncertainty
Tim G. Zhou, Evan Shelhamer, Geoff Pleiss
TL;DR
This work tackles the high computational cost of uncertainty-aware inference with large pre-trained models by introducing Asymmetric Duos. A Duo pairs a large base model with a much smaller sidekick and aggregates their predictions as a temperature-weighted sum of logits, $f_{\text{Duo}}(X)=f_{\text{large}}(X)\cdot T_{\text{large}}+f_{\text{small}}(X)\cdot T_{\text{small}}$. With the temperatures tuned on a validation set, Duos improve accuracy, uncertainty quantification, and selective classification metrics across five image classification benchmarks at only $10$–$20\%$ extra FLOPs. The results hold when model soups are used, and the method fits standard transfer-learning workflows. The study includes comprehensive ablations (unweighted vs. weighted aggregation, UQ-only variants), computational-cost analyses based on FLOPs, and extensive comparisons to deep ensembles, showing that Duos can closely match or exceed ensemble performance at far lower compute. The practical impact is substantial: deployable, flexible, uncertainty-aware improvements for large fine-tuned vision models without prohibitive training-time costs.
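The aggregation rule above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the two models' logits are precomputed on a held-out validation set. It also fits the temperatures by a brute-force grid search over validation negative log-likelihood (NLL); the source does not specify the optimizer, and a gradient-based fit would be the more usual choice. The helper `fit_duo_weights`, the grid range, and the synthetic logits are all hypothetical.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the class axis (axis=1).
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def duo_logits(logits_large, logits_small, t_large, t_small):
    # f_Duo(X) = f_large(X) * T_large + f_small(X) * T_small
    return logits_large * t_large + logits_small * t_small


def nll(logits, labels):
    # Mean negative log-likelihood of the true labels.
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(labels)), labels].mean()


def fit_duo_weights(val_large, val_small, val_labels, grid=np.linspace(0.0, 4.0, 81)):
    # Hypothetical helper: exhaustive grid search for (T_large, T_small)
    # minimizing validation NLL. The grid contains T_small = 0, so the result
    # is never worse on validation NLL than the large model alone (with the
    # best T_large from the same grid).
    best_t_large, best_t_small, best_loss = None, None, np.inf
    for t_large in grid:
        for t_small in grid:
            loss = nll(duo_logits(val_large, val_small, t_large, t_small), val_labels)
            if loss < best_loss:
                best_t_large, best_t_small, best_loss = t_large, t_small, loss
    return best_t_large, best_t_small, best_loss


# Synthetic stand-in data (hypothetical, not from the paper): the "large"
# model's logits carry a stronger class signal than the sidekick's, and both
# have independent Gaussian noise.
rng = np.random.default_rng(0)
n, c = 500, 10
labels = rng.integers(0, c, size=n)
onehot = np.eye(c)[labels]
val_large = 3.0 * onehot + rng.normal(size=(n, c))
val_small = 1.0 * onehot + rng.normal(size=(n, c))

t_large, t_small, val_loss = fit_duo_weights(val_large, val_small, labels)
```

The search space includes $T_{\text{small}}=0$. As a result, the fitted weights can only match or beat the large model alone on the validation NLL they were tuned on. This echoes the observation that the sidekick rarely hurts, though it does not by itself guarantee the same on test data.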
Abstract
The go-to strategy for applying deep networks in settings where uncertainty informs decisions, ensembling multiple training runs with random initializations, is ill-suited to today's extremely large-scale models and practical fine-tuning workflows. We introduce a new cost-effective strategy for improving the uncertainty quantification and downstream decisions of a large model (e.g., a fine-tuned ViT-B): coupling it with a less accurate but much smaller "sidekick" (e.g., a fine-tuned ResNet-34) that costs a fraction of the computation. We propose aggregating the predictions of this Asymmetric Duo by simple learned weighted averaging. Surprisingly, despite the inherent asymmetry of the pair, the sidekick almost never harms the performance of the larger model. In fact, across five image classification benchmarks and a variety of model architectures and training schemes (including soups), Asymmetric Duos significantly improve accuracy, uncertainty quantification, and selective classification metrics with only ${\sim}10$–$20\%$ more computation.
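Selective classification, one of the downstream decisions mentioned above, ranks inputs by confidence and evaluates only the most confident ones. The sketch below illustrates it under stated assumptions. It uses the maximum softmax probability of the Duo as the confidence score, which the abstract does not specify. It also uses one common discrete definition of the area under the risk-coverage curve (AURC). The function names and the toy probabilities are illustrative and are not taken from the paper.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the class axis (axis=1).
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def duo_probs(logits_large, logits_small, t_large, t_small):
    # Duo prediction: softmax of the temperature-weighted logit sum.
    return softmax(logits_large * t_large + logits_small * t_small)


def selective_accuracy(probs, labels, coverage):
    # Keep the `coverage` fraction of inputs with the highest max-softmax
    # confidence and report accuracy on the kept inputs only.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    n_keep = max(1, int(round(coverage * len(labels))))
    keep = np.argsort(-conf, kind="stable")[:n_keep]
    return float((pred[keep] == labels[keep]).mean())


def area_under_risk_coverage(probs, labels):
    # Error rate of the k most confident predictions, averaged over k = 1..n
    # (one common discrete form of AURC; lower is better).
    order = np.argsort(-probs.max(axis=1), kind="stable")
    correct = (probs.argmax(axis=1) == labels)[order]
    risks = np.cumsum(~correct) / np.arange(1, len(labels) + 1)
    return float(risks.mean())


# Toy example with hand-picked probabilities (not results from the paper).
probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]])
labels = np.array([0, 1, 1, 0])
print(selective_accuracy(probs, labels, 0.5))   # 1.0
print(selective_accuracy(probs, labels, 1.0))   # 0.75
print(area_under_risk_coverage(probs, labels))  # about 0.1458 (7/48)
```

In practice, `probs` would come from `duo_probs` applied to test-set logits with the validation-tuned temperatures. Better uncertainty estimates show up as higher accuracy at a fixed coverage and lower AURC.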
