Augment on Manifold: Mixup Regularization with UMAP

Yousef El-Laham, Elizabeth Fons, Dillon Daudert, Svitlana Vyetrenko

TL;DR

UMAP Mixup introduces a topology-preserving, on-manifold data augmentation by integrating a parametric UMAP regularizer into Mixup. The method optimizes a supervised embedding with a UMAP loss and generates augmentations in the embedding space by mixing neighbor pairs, improving generalization on regression tasks. Across tabular and time-series data, UMAP Mixup achieves competitive or superior RMSE relative to ERM and existing Mixup variants, particularly under distribution shifts, and yields more diverse embeddings than Manifold Mixup. This approach offers a domain-agnostic augmentation tool with practical impact for non-vision data in regression and forecasting tasks.

Abstract

Data augmentation techniques play an important role in enhancing the performance of deep learning models. Despite their proven benefits in computer vision tasks, their application in other domains remains limited. This paper proposes a Mixup regularization scheme, referred to as UMAP Mixup, designed for "on-manifold" automated data augmentation for deep learning predictive models. The proposed approach ensures that the Mixup operations result in synthesized samples that lie on the data manifold of the features and labels by utilizing a dimensionality reduction technique known as uniform manifold approximation and projection (UMAP). Evaluations across diverse regression tasks show that UMAP Mixup is competitive with or outperforms other Mixup variants, showing promise as an effective tool for enhancing the generalization performance of deep learning models.
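To make the idea of mixing neighbor pairs concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes an embedding matrix `z` is already available (in the paper this would come from the network trained with the UMAP regularizer, which is omitted here) and applies the standard Mixup interpolation only between each point and one of its k nearest neighbors. The function names `knn_indices` and `neighbor_mixup`, the brute-force neighbor search, and the parameters `k` and `alpha` are illustrative assumptions.

```python
import numpy as np


def knn_indices(z, k):
    """Indices of the k nearest neighbors (excluding self) for each row of z."""
    # Brute-force pairwise squared Euclidean distances; fine for small n.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # never pick a point as its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighbor_mixup(z, y, k=5, alpha=1.0, rng=None):
    """Mix each embedded point with one randomly chosen nearest neighbor.

    z: (n, d) embeddings (assumed given), y: (n,) regression targets.
    Returns mixed embeddings, mixed targets, partner indices, and weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = z.shape[0]
    nbrs = knn_indices(z, k)
    # Pick one of the k neighbors uniformly at random for every point.
    partner = nbrs[np.arange(n), rng.integers(0, k, size=n)]
    # Standard Mixup weight: lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=n)
    z_mix = lam[:, None] * z + (1.0 - lam[:, None]) * z[partner]
    y_mix = lam * y + (1.0 - lam) * y[partner]
    return z_mix, y_mix, partner, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(100, 2))          # stand-in for learned embeddings
    y = z[:, 0] + 0.1 * rng.normal(size=100)  # toy regression target
    z_mix, y_mix, _, _ = neighbor_mixup(z, y, k=5, alpha=1.0, rng=rng)
    print(z_mix.shape, y_mix.shape)
```

Restricting the mixing partner to a near neighbor keeps each synthesized point on a short segment between two nearby samples, which is the intuition behind staying close to the data manifold; plain Mixup would instead pair points at random across the whole batch.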

Paper Structure (14 sections, 16 equations, 1 figure, 1 table)


Figures (1)

  • Figure 1: A visual comparison of the embeddings produced by Manifold Mixup and UMAP Mixup regularization on the RCL and GME datasets. Visualizations are obtained by applying t-SNE to the features extracted just before the output layer of each neural network.