
First-Order Manifold Data Augmentation for Regression Learning

Ilya Kaufman, Omri Azencot

TL;DR

This work addresses the scarcity of domain-independent data augmentation methods for regression by introducing First-Order Manifold Augmentation (FOMA). FOMA generates augmented samples by perturbing training samples along the tangent space of the training data manifold. This tangent space is approximated via SVD, and a Beta-distributed parameter scales the small singular values. The method is fully differentiable and can be applied at any layer. The authors provide an analysis based on vicinal risk minimization (VRM), compare FOMA to Mixup and other baselines, and demonstrate significant improvements in both in-distribution generalization and out-of-distribution robustness across diverse regression tasks. They also perform extensive ablations to justify their design choices and discuss the computational cost of the SVD-based approach. Overall, FOMA is a practical, principled tangent-space augmentation strategy that improves regression performance both as a regularizer and under distribution shift.
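
A minimal NumPy sketch of the input-level ($l=0$) augmentation summarized above may make the procedure concrete. It is not the authors' implementation (their code is in the repository linked in the abstract). Assumptions: the inputs `X` and targets `Y` of one batch are concatenated column-wise into a single matrix, the number `k` of singular values left untouched is a fixed hyperparameter, and $\lambda \sim \mathrm{Beta}(\alpha, \alpha)$ with a symmetric $\alpha$. The helper names `foma_scale` and `foma_batch` are hypothetical.

```python
import numpy as np


def foma_scale(X, Y, k, lam):
    """Shrink all but the top-k singular values of the batch matrix [X, Y] by lam.

    Assumption (hypothetical helper): X has shape (B, D), Y has shape (B, T), and
    both are concatenated so that inputs and targets are perturbed consistently.
    """
    A = np.concatenate([X, Y], axis=1)            # batch matrix, shape (B, D + T)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_scaled = s.copy()
    s_scaled[k:] *= lam                           # scale the "small" singular values
    A_aug = (U * s_scaled) @ Vt                   # U @ diag(s_scaled) @ Vt
    d = X.shape[1]
    return A_aug[:, :d], A_aug[:, d:]             # split back into inputs and targets


def foma_batch(X, Y, k, alpha=1.0, rng=None):
    """Sample lam ~ Beta(alpha, alpha) and apply foma_scale (assumed defaults)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return foma_scale(X, Y, k, lam), lam


# Usage on a toy regression batch of 32 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
Y = X @ rng.normal(size=(5, 1))
(X_aug, Y_aug), lam = foma_batch(X, Y, k=3, rng=rng)
print(X_aug.shape, Y_aug.shape)                   # (32, 5) (32, 1)
```

Because $\lambda \in [0, 1]$, the augmented batch keeps its dominant directions and only shrinks the components along the smallest singular directions. In a framework with automatic differentiation (e.g., `torch.linalg.svd` in PyTorch), the same steps would remain differentiable, which is consistent with applying the augmentation to hidden representations rather than only to raw inputs.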

Abstract

Data augmentation (DA) methods tailored to specific domains generate synthetic samples by applying transformations that suit the characteristics of the underlying data domain, such as rotations for images and time warping for time series. In contrast, domain-independent approaches, e.g., mixup, are applicable to various data modalities, and as such they are general and versatile. While regularizing classification tasks via DA is a well-explored research topic, the effect of DA on regression problems has received less attention. To bridge this gap, we study the problem of domain-independent augmentation for regression, and we introduce FOMA: a new data-driven, domain-independent data augmentation method. Essentially, our approach samples new examples from the tangent planes of the training distribution. Augmenting data in this way aligns with the tendency of neural networks to capture the dominant features of their input signals. We evaluate FOMA on in-distribution generalization and out-of-distribution robustness benchmarks, and we show that it improves the generalization of several neural architectures. We also find that strong baselines based on mixup are less effective than our approach. Our code is publicly available at https://github.com/azencot-group/FOMA.

Paper Structure (43 sections, 1 theorem, 9 equations, 4 figures, 11 tables)

Key Result

Theorem 1

Let $P$ be the orthogonal projection onto the column space of $A$. Let $P_{\perp} = I - P$. Then
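
The two projections in Theorem 1 can be formed numerically from a thin SVD of $A$: if the columns of $U_r$ are the left singular vectors for the $r$ nonzero singular values, then $P = U_r U_r^{\top}$ (equivalently $P = AA^{+}$, with $A^{+}$ the Moore-Penrose pseudoinverse) and $P_{\perp} = I - P$. The sketch below assumes NumPy; the helper name `column_space_projections` and the rank tolerance are illustrative choices, not taken from the paper.

```python
import numpy as np


def column_space_projections(A, tol=None):
    """Return (P, P_perp) for the column space of A via the SVD.

    P = U_r U_r^T, where U_r holds the left singular vectors whose singular values
    exceed tol; P_perp = I - P projects onto the orthogonal complement.
    """
    A = np.asarray(A, dtype=float)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    if tol is None:
        # Same default threshold as np.linalg.matrix_rank.
        tol = s.max(initial=0.0) * max(A.shape) * np.finfo(A.dtype).eps
    r = int(np.sum(s > tol))                      # numerical rank of A
    U_r = U[:, :r]
    P = U_r @ U_r.T
    P_perp = np.eye(A.shape[0]) - P
    return P, P_perp


# Usage: a 5 x 3 matrix of rank 2.
rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 3))
P, P_perp = column_space_projections(A, tol=1e-10)
print(np.allclose(P @ P, P), np.allclose(P_perp @ A, 0.0))   # True True
```

Building $P$ from $U_r$ rather than inverting $A^{\top}A$ avoids forming an ill-conditioned Gram matrix when $A$ is rank deficient.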

Figures (4)

  • Figure 1: We show the pseudocode for FOMA at the input level, $l=0$ (left). We demonstrate the effect of a few DA methods on 2D data whose intrinsic dimension is one (right).
  • Figure 2: Training stability and overfitting. (a) RMSE loss on the train set. (b) RMSE loss on the test set. (c) Generalization gap: the difference between test error and train error.
  • Figure 3: Evaluating a non-augmented model and a model trained with FOMA on train data whose small singular values are scaled down for different values of $\lambda$ (left). We show on the right panel the probability density function of the original data (green), and its modifications using $\lambda=0$ (blue), and $\lambda=0.5$ (orange).
  • Figure 4: Evaluating a non-augmented model (solid lines) and a model trained with FOMA (dashed lines) on train data whose small singular values are scaled down for different values of $\lambda$ (see also Fig. 3, left). Left: Communities and Crime dataset, trained on a three-layer fully connected network. Right: Electricity, trained on LST-Attn (Lai et al., 2018).
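
Figure 1 (right) and Figure 3 both concern data whose small singular values are scaled down. A small NumPy sketch, assuming 2D points scattered around a line through the origin (intrinsic dimension one), illustrates the geometric effect: keeping the top singular value and multiplying the second by $\lambda$ multiplies every point's coordinate along the second right singular vector by $\lambda$, so the mean squared distance to the dominant direction shrinks by exactly $\lambda^2$. The toy data, the helper names, and the choice $k=1$ are hypothetical.

```python
import numpy as np


def scale_small_singular_values(A, k, lam):
    """Keep the top-k singular values of A, multiply the rest by lam (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_scaled = s.copy()
    s_scaled[k:] *= lam
    return (U * s_scaled) @ Vt, Vt


def off_line_spread(A, lam):
    """Mean squared distance of the modified points to the line spanned by A's top direction."""
    A_lam, Vt = scale_small_singular_values(A, k=1, lam=lam)
    normal = Vt[1]                                # unit normal to the dominant direction (2D case)
    return np.mean((A_lam @ normal) ** 2)


rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)
# Toy data near the line y = 2x: intrinsic dimension one, embedded in 2D.
A = np.stack([t, 2.0 * t], axis=1) + 0.05 * rng.normal(size=(200, 2))

for lam in (1.0, 0.5, 0.0):
    print(f"lam={lam:.1f}  off-line spread={off_line_spread(A, lam):.6f}")
```

Setting $\lambda=0$ projects every point onto the dominant direction, while $\lambda=0.5$ halves each point's off-line component; these are the two values shown in the right panel of Figure 3.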
