Tailoring Mixup to Data for Calibration
Quentin Bouniot, Pavlo Mozharovskyi, Florence d'Alché-Buc
TL;DR
This work addresses calibration gaps in Mixup-based data augmentation by introducing Similarity Kernel Mixup (SK Mixup), which warps interpolation through a similarity-driven kernel to mix similar samples more strongly while attenuating mixing for distant pairs. The method links the likelihood of label noise to manifold distance via a Wasserstein-based bound, and uses a warping function $\omega_{\tau}$ to realize Beta$(\tau,\tau)$–distributed coefficients in a computationally efficient way. A Gaussian similarity kernel computes pairwise warping parameters from batch-distance statistics, enabling distance-aware mixing in both classification (embedding-distance) and regression (label-distance) settings. Extensive experiments across image classification and regression tasks show improved calibration (lower ECE/AECE, UCE/ENCE) with competitive or better accuracy, plus substantial efficiency gains over state-of-the-art calibration-driven Mixup methods. The findings suggest SK Mixup offers a practical, scalable augmentation strategy that enhances model reliability, including under distribution shifts and OOD conditions, and can be combined with RegMixup for further gains.
Abstract
Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a wide range of applications. Along with improved predictive performance, Mixup is also a good technique for improving calibration. However, mixing data carelessly can lead to manifold mismatch, i.e., synthetic data lying outside the original class manifolds, which can deteriorate calibration. In this work, we show that the likelihood of assigning a wrong label with Mixup increases with the distance between the data to mix. Motivated by this, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples to mix, and define a flexible framework to do so without losing diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves predictive performance and calibration of models, while being much more efficient than state-of-the-art calibration-driven Mixup methods.
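To make the idea of similarity-dependent interpolation coefficients concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several pieces are assumptions made for illustration: the function names `similarity_tau` and `sk_mixup_batch`, the hyperparameter names `tau_max` and `tau_std`, the exact form of the Gaussian-type kernel, the use of squared Euclidean distance normalized by the batch mean, and the choice to sample each coefficient directly from Beta$(\tau,\tau)$ instead of applying the warping function $\omega_{\tau}$.

```python
import numpy as np


def similarity_tau(d_norm, tau_max=1.0, tau_std=0.5):
    """Gaussian-type kernel (assumed form): tau shrinks as distance grows.

    d_norm is a distance divided by the batch-mean distance, so a pair at
    the average distance (d_norm = 1) receives tau = tau_max.
    """
    tau = tau_max * np.exp(-(np.asarray(d_norm) - 1.0) / (2.0 * tau_std**2))
    return np.clip(tau, 1e-3, None)  # keep Beta parameters strictly positive


def sk_mixup_batch(x, y, feats, tau_max=1.0, tau_std=0.5, rng=None):
    """Mix every sample with a random partner using a per-pair Beta(tau, tau).

    feats is the representation used to measure similarity (hypothetical
    argument): e.g. embeddings for classification, labels for regression.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    perm = rng.permutation(n)  # partner index for each sample

    # Squared Euclidean distance between each sample and its partner.
    flat = feats.reshape(n, -1)
    d_pair = np.sum((flat - flat[perm]) ** 2, axis=1)

    # Batch statistic: mean squared distance over all distinct pairs.
    sq = np.sum(flat**2, axis=1)
    d_all = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    mean_d = d_all.sum() / max(n * (n - 1), 1)

    # Similar pairs get a large tau (coefficients near 0.5, strong mixing);
    # distant pairs get a small tau (coefficients near 0 or 1, weak mixing).
    tau = similarity_tau(d_pair / max(mean_d, 1e-12), tau_max, tau_std)
    lam = rng.beta(tau, tau)  # direct sampling stands in for the warping step

    # Broadcast the per-sample coefficient over the remaining dimensions.
    lam_x = lam.reshape((n,) + (1,) * (x.ndim - 1))
    lam_y = lam.reshape((n,) + (1,) * (y.ndim - 1))
    x_mix = lam_x * x + (1.0 - lam_x) * x[perm]
    y_mix = lam_y * y + (1.0 - lam_y) * y[perm]
    return x_mix, y_mix, lam, tau
```

In this sketch, a classification model would pass embeddings (or the inputs themselves) as `feats` together with one-hot labels as `y`, while a regression model would pass the targets as both `y` and `feats`, mirroring the embedding-distance and label-distance settings mentioned in the TL;DR.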
