Interpolation Consistency Training for Semi-Supervised Learning

Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

TL;DR

ICT addresses semi-supervised learning by enforcing prediction consistency at interpolations between unlabeled samples, combining a mean-teacher framework with mixup-style interpolation. The method achieves state-of-the-art or competitive results on CIFAR-10, SVHN, and CIFAR-100 with computationally efficient training and modest hyperparameter tuning. The authors provide a theoretical account showing that ICT acts as a regularizer on higher-order derivatives and that high-confidence predictions on unlabeled points help suppress overfitting to labeled points; empirical ablations support both claims. Overall, ICT offers a practical, scalable SSL method with strong empirical gains and a principled derivative-regularization interpretation, and the authors point to interpolation in hidden representations as future work.

Abstract

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets. Our theoretical analysis shows that ICT corresponds to a certain type of data-adaptive regularization with unlabeled points that reduces overfitting to labeled points under high confidence values.
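
The consistency idea in the abstract can be illustrated with a minimal sketch. The helper names `mix` and `consistency_loss`, the use of mean squared error as the discrepancy, and the toy `linear` and `square` predictors are assumptions made for illustration, not the authors' implementation. The mixing coefficient `lam` is passed in explicitly here; mixup-style methods commonly draw it at random from a Beta distribution.

```python
def mix(a, b, lam):
    """Mix_lambda(a, b) = lam * a + (1 - lam) * b, applied elementwise."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def consistency_loss(student, teacher, u_j, u_k, lam):
    """Squared-error gap between the student's prediction at the mixed input
    and the mix of the teacher's predictions (MSE is an assumed choice)."""
    pred = student(mix(u_j, u_k, lam))
    target = mix(teacher(u_j), teacher(u_k), lam)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical toy predictors, each mapping an input vector to a 1-d output.
def linear(x):
    return [2.0 * x[0] + x[1]]


def square(x):
    return [x[0] ** 2]


u_j, u_k, lam = [1.0, 2.0], [3.0, 0.0], 0.25

# A linear predictor commutes with interpolation, so the penalty vanishes.
loss_linear = consistency_loss(linear, linear, u_j, u_k, lam)

# A curved predictor does not, so it is penalized between the two points.
loss_square = consistency_loss(square, square, u_j, u_k, lam)
```

In this toy setting the penalty is zero exactly when the predictor behaves linearly between the two unlabeled points, which gives some intuition for why the loss discourages sharp changes in prediction between nearby unlabeled samples.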

Paper Structure

This paper contains 18 sections, 52 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Interpolation Consistency Training (ICT) applied to the "two moons" dataset, when three labels per class (large dots) and a large amount of unlabeled data (small dots) are available. Compared to supervised learning (red), ICT encourages a decision boundary that traverses a low-density region and better reflects the structure of the unlabeled data. Both methods employ a multilayer perceptron with three hidden ReLU layers of twenty neurons each. Best viewed in color in the printed version.
  • Figure 2: Interpolation Consistency Training (ICT) learns a student network $f_{\theta}$ in a semi-supervised manner. To this end, ICT uses a mean-teacher $f_{\theta^{\prime}}$, where the teacher parameters $\theta^{\prime}$ are an exponential moving average of the student parameters $\theta$. During training, the student parameters $\theta$ are updated to encourage consistent predictions $f_{\theta}\left(\operatorname{Mix}_{\lambda}\left(u_{j}, u_{k}\right)\right) \approx \operatorname{Mix}_{\lambda}\left(f_{\theta^{\prime}}\left(u_{j}\right), f_{\theta^{\prime}}\left(u_{k}\right)\right)$, and correct predictions for labeled examples $x_{i}$.
  • Figure 3: Numerical validation of the theoretical prediction that ICT performs well when the confidence value $\frac{1}{n} \sum_{i=1}^{n}\left|\frac{1}{2}-f_{\theta}\left(u_{i}\right)\right|$ is high, because that is when ICT acts as a regularizer on directional derivatives of all orders. Each line in each subplot shows the decision boundary of the predictor $f_{\theta}$ (i.e., $\left\{u: f_{\theta}(u)=\frac{1}{2}\right\}$) after 1, 10, 100, and 1000 updates. Best viewed in color in the printed version.
  • Figure 4: Decision boundaries for ICT with the KL divergence and the softplus activation.
  • Figure 5: Decision boundaries for ICT with the KL divergence and the ReLU activation.
  • ...and 2 more figures
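
Two quantities named in the captions above lend themselves to a short sketch: the exponential moving average that defines the teacher parameters (Figure 2) and the confidence value $\frac{1}{n} \sum_{i=1}^{n}\left|\frac{1}{2}-f_{\theta}\left(u_{i}\right)\right|$ (Figure 3). The function names, the flat-list parameter representation, and the `decay` default below are assumptions for illustration; the captions do not specify them.

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """Move each teacher parameter toward the matching student parameter.

    theta' <- decay * theta' + (1 - decay) * theta
    The default decay is a hypothetical value, not taken from the paper.
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]


def confidence_value(predictions):
    """Average distance of binary-class predictions f(u_i) in [0, 1] from 1/2."""
    return sum(abs(0.5 - p) for p in predictions) / len(predictions)


# Toy usage with made-up numbers.
teacher = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.5)

# Two confident predictions (0 and 1) and two maximally uncertain ones (0.5).
conf = confidence_value([0.0, 1.0, 0.5, 0.5])
```

The confidence value ranges from 0 (every prediction sits on the decision boundary) to 0.5 (every prediction is exactly 0 or 1), so "high confidence" in the Figure 3 caption corresponds to values near 0.5.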