Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate

Can Jin; Tong Che; Hongwu Peng; Yiyuan Li; Dimitris N. Metaxas; Marco Pavone

TL;DR

Learning from Teaching (LoT) is a regularization technique in which a main model trains auxiliary student learners and uses their feedback to favor correlations that are easy to imitate. Experiments suggest that LoT identifies generalizable information at the right scales while discarding spurious data correlations.

Abstract

Generalization remains a central challenge in machine learning. In this work, we propose Learning from Teaching (LoT), a novel regularization technique for deep neural networks to enhance generalization. Inspired by the human ability to capture concise and abstract patterns, we hypothesize that generalizable correlations are easier to imitate than spurious ones. LoT operationalizes this idea to improve the generalization of the main model with auxiliary student learners. The student learners are trained by the main model and, in turn, provide feedback that helps the main model capture more generalizable and imitable correlations. Our experimental results across Computer Vision, Natural Language Processing, and Reinforcement Learning demonstrate that LoT brings significant benefits compared to training models on the original dataset. The results suggest that LoT is effective and efficient at identifying generalizable information at the right scales while discarding spurious data correlations, making it a valuable addition to current machine learning. Code is available at https://github.com/jincan333/LoT.
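The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss, so the sketch assumes that the main (teacher) model minimizes its task loss plus a regularization coefficient `alpha` times a KL-divergence term measuring how far the student is from imitating it, and that the student minimizes a KL-divergence to the teacher. The function names, the direction of each KL term, and the use of cross-entropy as the task loss are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of the correct class."""
    return -math.log(probs[label] + eps)


def student_objective(teacher_logits, student_logits):
    """Assumed student loss: imitate the teacher's output distribution."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


def teacher_objective(teacher_logits, student_logits, label, alpha=1.0):
    """Assumed teacher loss: task loss plus an imitability penalty.

    The penalty is small when the student already matches the teacher,
    so the teacher is nudged toward predictions that are easy to imitate.
    The KL direction used here is an assumption, not taken from the source.
    """
    t = softmax(teacher_logits)
    s = softmax(student_logits)
    return cross_entropy(t, label) + alpha * kl_divergence(s, t)


# Hypothetical logits for a 3-class problem.
teacher = [2.0, 0.5, -1.0]
good_student = [2.0, 0.5, -1.0]  # imitates the teacher perfectly
poor_student = [0.0, 0.0, 0.0]   # uniform guess, far from the teacher

loss_easy = teacher_objective(teacher, good_student, label=0)
loss_hard = teacher_objective(teacher, poor_student, label=0)
```

Under these assumptions, `loss_easy` reduces to the plain task loss because the imitation gap is zero, while `loss_hard` is larger. Setting `alpha=0` recovers training without the regularizer.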

Paper Structure (43 sections, 4 equations, 5 figures, 13 tables, 2 algorithms)

Introduction
Methodology
Generalizable and Spurious Correlations
Hypothesis:
Learning from Teaching Regularization
Discussion
Experiments
Generalizable Correlations are Easier to Imitate than Spurious Correlations.
Atari Games
Language Modeling
Unsupervised Language Pretraining
Supervised Fine-tuning
Image Classification
Analysis of Computational Cost and Efficiency
Additional Investigation
...and 28 more sections

Figures (5)

Figure 1: Training and test KL-divergence losses of student models in LoT using ViT-B/16 and ViT-L/16 on CIFAR-100 with different teacher models. The sophisticated students achieve lower losses than the deceptive students given the same computational budget.
Figure 2: Episodic return of the teacher agent in LoT and in Teacher-only on four Atari games (averaged over ten runs). LoT demonstrates return gains over Teacher-only on all games.
Figure 3: Test accuracy of teacher models in LoT and Teacher-only using ViT-B/16 and ViT-L/16 on CIFAR-100. LoT achieves higher test accuracy with fewer training steps.
Figure 4: Effects of the regularization coefficient $\alpha$ (left) and the student steps ratio $N$ (right). $\alpha=1$ achieves the lowest test perplexity of the teacher model, and moderate student steps ratios $N$, such as 4 and 5, benefit the teacher model the most.
Figure 5: Training and test KL-divergence losses of student models in LoT using ResNet-18 and ResNet-50 on CIFAR-100 with different teacher models.
