A Survey on Mixup Augmentations and Beyond

Xin Jin, Hongyu Zhu, Siyuan Li, Zedong Wang, Zicheng Liu, Juanxi Tian, Chang Yu, Huafeng Qin, Stan Z. Li

TL;DR

This survey addresses the data-hungry nature of deep networks by focusing on Mixup as a principled, data-centric regularization that creates virtual samples through convex data-label combinations. It reframes Mixup as a unified training framework with modular components (initialization, sample/label/channel policies) and systematically catalogs methods across CV, NLP, graphs, and beyond, including SSL, Semi-SL, and knowledge distillation settings. The work provides a two-pronged taxonomy (Sample Mixup Policies and Label Mixup Policies) and a comprehensive review of applications, theoretical insights (VRM, calibration, robustness), and practical considerations, offering guidance for designing unified Mixup strategies. Overall, it emphasizes extending Mixup to diverse modalities and tasks, promoting a decision framework for efficient, scalable, and transferable data augmentation in modern AI systems.

Abstract

As Deep Neural Networks have achieved thrilling breakthroughs in the past decade, data augmentations have garnered increasing attention as regularization techniques when massive labeled data are unavailable. Among existing augmentations, Mixup and related data-mixing methods, which convexly combine selected samples and their corresponding labels, are widely adopted because they achieve high performance by generating data-dependent virtual data and transfer easily to various domains. This survey presents a comprehensive review of foundational mixup methods and their applications. We first describe the training pipeline with mixup augmentations as a unified framework composed of modules. This reformulated framework can accommodate various mixup methods and gives intuitive operational procedures. Then, we systematically investigate the applications of mixup augmentations to vision downstream tasks and various data modalities, as well as analyses and theorems of mixup. We also summarize the current status and limitations of mixup research and point out future work toward effective and efficient mixup augmentations. This survey provides researchers with the current state of the art in mixup methods, along with insights and guidance for the mixup arena. An online project accompanying this survey is available at https://github.com/Westlake-AI/Awesome-Mixup.
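
The convex combination of samples and labels described in the abstract can be sketched in a few lines. This is a minimal illustration, not the survey's reference implementation: the function name `mixup_pair`, the flat-list data layout, and the choice of drawing the mixing ratio from a Beta(alpha, alpha) distribution (the convention of the original Mixup formulation) are assumptions made for this example.

```python
import random

def mixup_pair(x_i, x_j, y_i, y_j, alpha=1.0, rng=random):
    """Return a convex combination of two samples and their one-hot labels.

    x_i, x_j: flat lists of features; y_i, y_j: one-hot label lists.
    alpha: Beta-distribution parameter (hypothetical default for this sketch).
    """
    lam = rng.betavariate(alpha, alpha)  # mixing ratio in [0, 1]
    # The same ratio is applied to the inputs and to the labels.
    x_hat = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_hat = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_hat, y_hat, lam

rng = random.Random(0)  # seeded generator for reproducibility
x_hat, y_hat, lam = mixup_pair(
    [0.0, 1.0, 2.0], [4.0, 3.0, 2.0],  # two toy samples
    [1.0, 0.0], [0.0, 1.0],            # their one-hot labels
    rng=rng,
)
```

Because the label is mixed with the same ratio as the input, the mixed label remains a valid probability vector that sums to one.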

Paper Structure (55 sections, 37 equations, 15 figures, 5 tables)

Figures (15)

  • Figure 1: Research timeline of Mixup methods from 2018 to 2024, broadly categorized into Sample Mixup Policies and Label Mixup Policies according to the unified framework. Mainstream methods are summarized in 7 classes based on training paradigms and data modalities: SL based on Sample-level, SL based on Label-level, SSL based on CL, SSL based on MIM, Semi-SL, Graph, NLP, and Speech.
  • Figure 2: The unified framework of Mixup methods. The top part shows the mixup process: a mini-batch of raw samples is drawn from the dataset; mixed samples are then obtained through the Initialization and Sample Mixup Policies modules; after the Label Mixup Policies and Sampling modules, the samples are encoded by a network and passed through Channel Mixup Policies; finally, the loss is computed with the specific loss function. The bottom part details the options available for each module in the mixup process.
  • Figure 3: Illustration of sample mixup policies in SL, divided into two branches, Ad-Hoc and Adaptive, and further into nine detailed types.
  • Figure 4: Simplified flow of sample mixup policies in supervised learning (SL). Here $x_i$ and $x_j$ denote different samples, $z$ denotes feature maps, $\hat{x}$ denotes the mixed sample, $\lambda$ denotes the mixing ratio, and $\mathcal{M}$ denotes the mask. (a) Static Linear interpolates samples directly; (b) Feature-based interpolates the samples' feature maps; (c) Cutting-based mixes samples by cutting or resizing; (d) K Samples Mixup mixes more than two samples; (e) Random Policies randomly choose a mixup policy; (f) Style-based uses a style-transfer encoder to extract content and style and then decodes the mixed samples; (g) Saliency-based and (h) Attention-based apply a pre-trained CNN or ViT and mix samples according to the saliency map or attention score; (i) Generating Sample uses generative models to obtain mixed samples.
  • Figure 5: The process of obtaining mixed samples with the (a) CutMix, (b) ResizeMix, and (c) StackMix methods.
  • ...and 10 more figures
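
The cutting-based policy named in Figures 4 and 5 replaces the global interpolation with a binary mask $\mathcal{M}$. The sketch below follows the commonly described CutMix recipe (a random rectangle whose area is roughly $1 - \lambda$ of the image, with the label ratio recomputed from the pasted area). The function name `cutmix_pair`, the nested-list image layout, and the clipping details are assumptions for illustration rather than code taken from the survey.

```python
import math
import random

def cutmix_pair(x_i, x_j, lam, rng=random):
    """Paste a random rectangle of x_j into x_i (both H x W nested lists)."""
    h, w = len(x_i), len(x_i[0])
    cut = math.sqrt(1.0 - lam)                   # side ratio, so box area is about (1 - lam)
    bh, bw = int(h * cut), int(w * cut)
    cy, cx = rng.randrange(h), rng.randrange(w)  # random box centre
    # Clip the box to the image borders.
    top, bottom = max(cy - bh // 2, 0), min(cy + bh // 2, h)
    left, right = max(cx - bw // 2, 0), min(cx + bw // 2, w)
    # Binary mask M: 1 keeps the pixel of x_i, 0 takes the pixel of x_j.
    mask = [[0 if (top <= r < bottom and left <= c < right) else 1
             for c in range(w)] for r in range(h)]
    x_hat = [[x_i[r][c] if mask[r][c] else x_j[r][c]
              for c in range(w)] for r in range(h)]
    # Recompute the ratio from the area actually kept, since clipping can shrink the box.
    lam_adj = sum(map(sum, mask)) / (h * w)
    return x_hat, mask, lam_adj

rng = random.Random(0)
x_i = [[0.0] * 8 for _ in range(8)]  # toy "image" of zeros
x_j = [[1.0] * 8 for _ in range(8)]  # toy "image" of ones
x_hat, mask, lam_adj = cutmix_pair(x_i, x_j, lam=0.75, rng=rng)
```

Under this sketch the mixed label would be `lam_adj * y_i + (1 - lam_adj) * y_j`, so the label weight tracks the visible area of each source image rather than the requested ratio.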