Mixup Augmentation with Multiple Interpolations

Lifeng Shen; Jincheng Yu; Hansi Yang; James T. Kwok

TL;DR

This work extends mixup data augmentation with multi-mix, which generates $K>1$ interpolations per sample pair using mixing coefficients $\lambda_k$ drawn from $\mathrm{Beta}(\alpha,\alpha)$. The authors provide a variance-reduction analysis showing that increasing $K$ reduces the stochastic gradient variance, and they derive an empirical loss that averages over all interpolations along the mixup path. They extend several mixup variants (input mixup, manifold mixup, cutmix, puzzle-mix) to the multi-interpolation setting. Experiments on synthetic data, CIFAR-100, Tiny-Imagenet, ImageNet-1K, weakly supervised object localization (WSOL), corruption robustness, transfer learning to CUB, and speech tasks show that multi-mix improves generalization, robustness, and calibration over standard mixup and non-mixup baselines. The reported gains come with only modest computational overhead, which makes multi-mix a practical addition to the mixup family.

Abstract

Mixup and its variants form a popular class of data augmentation techniques. Using a random sample pair, mixup generates a new sample by linear interpolation of the inputs and labels. However, generating only a single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair. With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup. Moreover, theoretically, this can also reduce the stochastic gradient variance. Extensive experiments on a number of synthetic and large-scale data sets demonstrate that multi-mix outperforms various mixup variants and non-mixup-based baselines in terms of generalization, robustness, and calibration.
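To make the idea concrete, the following is a minimal NumPy sketch of the input-mixup case, not the authors' implementation. It assumes that each of the $K$ mixing coefficients is drawn independently from $\mathrm{Beta}(\alpha,\alpha)$ and that the training loss is a plain average of the cross-entropy over the $K$ interpolations. The function names `multi_mix_pair` and `averaged_cross_entropy` and the parameter `num_interp` (standing in for $K$) are hypothetical.

```python
import numpy as np


def multi_mix_pair(x_i, y_i, x_j, y_j, num_interp=4, alpha=1.0, rng=None):
    """Generate K = num_interp input-mixup interpolations of one sample pair.

    x_i, x_j: arrays of identical shape (the two inputs).
    y_i, y_j: one-hot (or soft) label vectors of identical length.
    Assumption: the K coefficients are independent Beta(alpha, alpha) draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing coefficient per interpolation.
    lam = rng.beta(alpha, alpha, size=num_interp)
    # Reshape so each coefficient broadcasts over every input dimension.
    lam_x = lam.reshape((num_interp,) + (1,) * np.ndim(x_i))
    x_mix = lam_x * x_i + (1.0 - lam_x) * x_j
    y_mix = lam[:, None] * y_i + (1.0 - lam[:, None]) * y_j
    return x_mix, y_mix, lam


def averaged_cross_entropy(probs, y_mix, eps=1e-12):
    """Cross-entropy against the mixed labels, averaged over the K interpolations."""
    per_interp = -(y_mix * np.log(probs + eps)).sum(axis=1)
    return float(per_interp.mean())
```

With `num_interp=1` this reduces to ordinary input mixup on a single pair; in practice `probs` would be the model's predicted class probabilities for the `num_interp` mixed inputs.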

Paper Structure (28 sections, 2 theorems, 30 equations, 8 figures, 8 tables)

Introduction
Related Work: Mixup and its Variants
Multi-Mix: Mixup with Multiple Interpolations
Generating Multiple Interpolations
Extending Input Mixup and Manifold Mixup
Extending Cutmix
Extending Puzzle-mix
Variance Reduction with Multi-Mix
Discussion
Large-batch Training
Large-batch Mixup
Batch Augmentation
Experiments
Synthetic Data Classification
Classification on CIFAR-100 and TinyImagenet
...and 13 more sections

Key Result

Proposition 3.1

$\mathrm{Var}[\tilde{{\mathbf{g}}}]$ decreases with $K$.
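The effect can be illustrated numerically with a toy Monte Carlo experiment. This is a sketch under assumptions that are not taken from the paper: a scalar linear model $f(x)=wx$ with squared loss, a 16-point dataset with $y=3x$, $\alpha=1$, and a gradient estimate formed by averaging the per-interpolation gradients of one randomly drawn pair. The function name `mixed_gradient_variance` is hypothetical, and the experiment illustrates the trend only; it is not the proof of the proposition.

```python
import numpy as np


def mixed_gradient_variance(num_interp, trials=20000, alpha=1.0, seed=0):
    """Estimate the variance of a K-interpolation mixup gradient (toy setup).

    Assumed setup: model f(x) = w * x, squared loss, data y = 3 * x,
    evaluated at w = 0. Each trial draws one random pair and K = num_interp
    coefficients, then averages the K per-interpolation gradients.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 16)
    y = 3.0 * x
    w = 0.0
    # One random sample pair (i, j) per trial.
    i = rng.integers(0, x.size, size=trials)
    j = rng.integers(0, x.size, size=trials)
    # K independent Beta(alpha, alpha) coefficients per trial.
    lam = rng.beta(alpha, alpha, size=(trials, num_interp))
    x_mix = lam * x[i][:, None] + (1.0 - lam) * x[j][:, None]
    y_mix = lam * y[i][:, None] + (1.0 - lam) * y[j][:, None]
    # d/dw (w * x - y)^2 = 2 * (w * x - y) * x, averaged over the K interpolations.
    grads = (2.0 * (w * x_mix - y_mix) * x_mix).mean(axis=1)
    return float(grads.var())


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, mixed_gradient_variance(k))
```

In this toy setting only the part of the variance caused by sampling $\lambda$ shrinks as $K$ grows; the part caused by sampling the pair itself is unaffected, so the estimated variance levels off rather than going to zero.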

Figures (8)

Figure 1: An example of mixup training. Here, a cat and a dog are mixed with a mixing coefficient $\lambda=0.5$. The symbol $\succ$ describes the relative ordering along the mixup transformation paths (cat $\succ$ (half-cat+half-dog) $\succ$ dog) in both the input and output spaces. Mixup training improves performance by leveraging the order relationships along the mixup transformation paths in the hidden space of $\mathcal{F}$.
Figure 2: Examples of input mixup and saliency-based mixup techniques.
Figure 3: Example interpolations ($\hat{h}^0_1, \hat{h}^0_2, \hat{h}^0_3, \hat{h}^0_4$) generated by input mixup (top), cutmix (middle), and puzzle-mix (bottom).
Figure 4: The noisy $\textit{spiral}$ data and decision boundaries learned without mixup, with manifold mixup, and with manifold mixup using multiple interpolations. Values in brackets are the top-1 test accuracies obtained.
Figure 5: Example localization results on CUB 200-2011 from networks pretrained with different mixup methods. Red: Predicted bounding box; Green: Ground-truth bounding box.
...and 3 more figures

Theorems & Definitions (2)

Proposition 3.1
Proposition 3.2
