Augment on Manifold: Mixup Regularization with UMAP
Yousef El-Laham, Elizabeth Fons, Dillon Daudert, Svitlana Vyetrenko
TL;DR
UMAP Mixup introduces a topology-preserving, on-manifold data augmentation by integrating a parametric UMAP regularizer into Mixup. The method optimizes a supervised embedding with a UMAP loss and generates augmentations in the embedding space by mixing neighbor pairs, improving generalization on regression tasks. Across tabular and time-series data, UMAP Mixup achieves competitive or superior RMSE relative to ERM and existing Mixup variants, particularly under distribution shifts, and yields more diverse embeddings than Manifold Mixup. This approach offers a domain-agnostic augmentation tool with practical impact for non-vision data in regression and forecasting tasks.
Abstract
Data augmentation techniques play an important role in enhancing the performance of deep learning models. Despite their proven benefits in computer vision tasks, their application in other domains remains limited. This paper proposes a Mixup regularization scheme, referred to as UMAP Mixup, designed for "on-manifold" automated data augmentation for deep learning predictive models. The proposed approach ensures that the Mixup operations result in synthesized samples that lie on the data manifold of the features and labels by utilizing a dimensionality reduction technique known as uniform manifold approximation and projection (UMAP). Evaluations across diverse regression tasks show that UMAP Mixup is competitive with or outperforms other Mixup variants, showing promise as an effective tool for enhancing the generalization performance of deep learning models.
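
As a rough illustration of the neighbor-pair mixing mentioned in the TL;DR, the sketch below mixes each embedded sample with one of its k nearest neighbors using a Beta-distributed coefficient, as in standard Mixup. It is not the authors' implementation: the function name `neighbor_mixup`, the parameters `k`, `alpha`, and `seed`, and the use of plain Euclidean distance are assumptions made for illustration, and the parametric UMAP loss that shapes the embedding is omitted entirely. Only the Python standard library is used so that the example stays self-contained.

```python
import math
import random


def neighbor_mixup(z, y, k=5, alpha=1.0, seed=None):
    """Mix each embedding with one of its k nearest neighbors (illustrative only).

    z: list of embedding vectors (assumed to come from some encoder).
    y: list of scalar regression targets, one per embedding.
    Returns a list of (z_mix, y_mix, partner_index, lam) tuples.
    """
    rng = random.Random(seed)
    n = len(z)
    mixed = []
    for i in range(n):
        # Rank every other sample by Euclidean distance to sample i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(z[i], z[j]),
        )
        # Choose one partner uniformly among the k nearest neighbors.
        j = rng.choice(others[:k])
        # Mixup coefficient, lam ~ Beta(alpha, alpha).
        lam = rng.betavariate(alpha, alpha)
        # Convex combination of the embeddings and of the targets.
        z_mix = [lam * a + (1.0 - lam) * b for a, b in zip(z[i], z[j])]
        y_mix = lam * y[i] + (1.0 - lam) * y[j]
        mixed.append((z_mix, y_mix, j, lam))
    return mixed
```

Restricting partners to nearby points is what keeps the interpolated samples close to the region the data actually occupies; with `k` equal to the number of remaining samples, the sketch reduces to ordinary Mixup applied in the embedding space.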
