Mixup Augmentation with Multiple Interpolations
Lifeng Shen, Jincheng Yu, Hansi Yang, James T. Kwok
TL;DR
This work extends mixup data augmentation with multi-mix, which generates $K>1$ interpolations per sample pair using interpolation coefficients $\lambda_k$ drawn from $\mathrm{Beta}(\alpha,\alpha)$. The authors provide a variance-reduction analysis showing that increasing $K$ reduces the stochastic gradient variance, and they derive an empirical loss that averages over all interpolations along the mixup path. They extend several mixup variants (input mixup, manifold mixup, CutMix, Puzzle Mix) to the multi-interpolation setting. Experiments cover synthetic data, CIFAR-100, Tiny-ImageNet, ImageNet-1K, weakly supervised object localization (WSOL), corruption robustness, transfer learning to CUB, and speech tasks, and show that multi-mix improves generalization, robustness, and calibration over standard mixup and non-mixup baselines. The reported gains in accuracy and reliability come with only modest computational overhead, which makes multi-mix a practical enhancement to the mixup family across domains.
Abstract
Mixup and its variants form a popular class of data augmentation techniques. Given a random sample pair, mixup generates a new sample by linearly interpolating the inputs and labels. However, generating only a single interpolation may limit its augmentation ability. In this paper, we propose a simple yet effective extension called multi-mix, which generates multiple interpolations from a sample pair. With an ordered sequence of generated samples, multi-mix can better guide the training process than standard mixup. Moreover, theoretically, this can also reduce the stochastic gradient variance. Extensive experiments on a number of synthetic and large-scale data sets demonstrate that multi-mix outperforms various mixup variants and non-mixup-based baselines in terms of generalization, robustness, and calibration.
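
The core idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes input-level mixup on plain Python lists, one-hot labels, and a soft-label cross-entropy loss; the names `multi_mix`, `soft_cross_entropy`, and `multi_mix_loss` are hypothetical. Sorting the coefficients is one possible reading of the "ordered sequence" mentioned above.

```python
import math
import random


def multi_mix(x_i, y_i, x_j, y_j, K=5, alpha=1.0, seed=None):
    """Generate K interpolations of one sample pair (illustrative sketch).

    Each coefficient lam_k is drawn from Beta(alpha, alpha). Sorting them
    is an assumption made here so the mixed samples are ordered along the
    path from (x_j, y_j) toward (x_i, y_i).
    """
    rng = random.Random(seed)
    lams = sorted(rng.betavariate(alpha, alpha) for _ in range(K))
    mixed = []
    for lam in lams:
        # Same linear interpolation for inputs and labels, as in mixup.
        x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
        y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
        mixed.append((lam, x, y))
    return mixed


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft label vector."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return -sum(t * (v - log_z) for v, t in zip(logits, soft_label))


def multi_mix_loss(model, mixed):
    """Average the loss over all K interpolations of the pair."""
    total = sum(soft_cross_entropy(model(x), y) for _, x, y in mixed)
    return total / len(mixed)
```

With `K=1` the sketch reduces to standard mixup on a single pair. In practice `model` would be a neural network and the inputs would be tensors; a trivial callable such as `lambda x: [0.0, 0.0]` is enough to exercise the loss.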
