Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu, Venkatesh Saligrama
TL;DR
This work tackles generalization in supervised learning by addressing the variability that SGD introduces during training. It introduces Deep Companion Learning (DCL), which uses a deep-companion network $\omega$ to forecast logits on new inputs from historical deployments $\theta_t$ of the primary model, and enforces predictive consistency through a data-dependent regularizer that aligns the current predictions with this forecast. Empirically, DCL achieves state-of-the-art results on CIFAR-100, Tiny-ImageNet, and ImageNet-1K across diverse backbones, often matching or exceeding pre-trained models while training from scratch and reducing computational cost. The approach is versatile: it extends to fine-tuning, semi-supervised learning, self-supervised pretraining, and knowledge distillation. Ablations validate the choice of $\alpha$ and the use of the MSE distance, and show that smaller companions and reduced data remain feasible.
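The regularized objective described above can be sketched as a supervised cross-entropy term plus an MSE consistency term between the primary model's logits and the companion's forecast. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes the weighting has the form $\mathrm{CE} + \alpha \cdot \mathrm{MSE}$, treats the companion logits as fixed targets, and averages the squared difference over all logit entries. The function names `softmax_cross_entropy` and `dcl_style_loss` are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    # Numerically stable log-softmax, then mean negative log-likelihood.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def dcl_style_loss(primary_logits, companion_logits, labels, alpha):
    """Hypothetical DCL-style loss: CE(primary, y) + alpha * MSE(primary, companion).

    Assumptions of this sketch (not confirmed by the source):
    - the regularizer is weighted by a scalar alpha and added to the CE loss;
    - companion_logits act as fixed targets (no gradient into the companion);
    - the MSE is averaged over every logit entry in the batch.
    """
    ce = softmax_cross_entropy(primary_logits, labels)
    consistency = np.mean((primary_logits - companion_logits) ** 2)
    return ce + alpha * consistency


# Usage: 4 samples, 10 classes.
primary = np.zeros((4, 10))
companion = np.ones((4, 10))
labels = np.array([0, 1, 2, 3])
loss = dcl_style_loss(primary, companion, labels, alpha=0.5)
```

With all-zero primary logits the cross-entropy is $\log 10$, and each logit differs from the companion by 1, so the example loss is $\log 10 + 0.5$. Setting `alpha=0` recovers plain supervised training.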
Abstract
We propose Deep Companion Learning (DCL), a novel training method for Deep Neural Networks (DNNs) that enhances generalization by penalizing model predictions that are inconsistent with the model's own history. To achieve this, we train a deep-companion model (DCM) that uses previous versions of the model to provide forecasts on new inputs. This companion model uncovers a meaningful latent semantic structure in the data, providing targeted supervision that pushes the primary model to address the scenarios it finds most challenging. We validate our approach through theoretical analysis and extensive experiments, including ablation studies, on several benchmark datasets (CIFAR-100, Tiny-ImageNet, ImageNet-1K) with diverse architectures (ShuffleNetV2, ResNet, Vision Transformer, etc.), demonstrating state-of-the-art performance.
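One way to picture "using previous versions of the model to provide forecasts on new inputs" is to fit the companion by regression onto logits produced by an earlier snapshot of the primary model. The sketch below assumes a linear companion $g_\omega(x) = xW$ and a plain MSE gradient step; both are illustrative simplifications rather than the paper's architecture or update rule. `companion_step` and `snapshot_logits` are hypothetical names.

```python
import numpy as np


def companion_step(W, x, snapshot_logits, lr):
    """One gradient step fitting a hypothetical linear companion g(x) = x @ W
    to logits produced by an earlier snapshot of the primary model.

    Returns the updated weights and the MSE measured before the update.
    """
    pred = x @ W
    residual = pred - snapshot_logits
    # Gradient of mean((x @ W - T) ** 2) over all entries with respect to W.
    grad = 2.0 * x.T @ residual / residual.size
    return W - lr * grad, np.mean(residual ** 2)


# Usage: stand-in snapshot logits come from a fixed "historical" linear map.
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 5))          # batch of 32 new inputs, 5 features
W_snapshot = rng.standard_normal((5, 3))  # assumed earlier-model parameters
snapshot_logits = x @ W_snapshot          # forecasts the companion should match

W = np.zeros((5, 3))
losses = []
for _ in range(200):
    W, mse = companion_step(W, x, snapshot_logits, lr=0.5)
    losses.append(mse)
```

In this toy setting the companion's regression error shrinks toward zero as it learns to reproduce the snapshot's forecasts. The resulting companion logits could then serve as the consistency targets for the primary model.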
