Learn2Mix: Training Neural Networks Using Adaptive Data Integration

Shyam Venkatasubramanian, Vahid Tarokh

TL;DR

learn2mix introduces a dynamic batch composition strategy that adaptively shifts class proportions toward harder classes based on instantaneous class-wise losses, enabling faster convergence under resource constraints. The method is formalized as a bilevel optimization where network parameters are updated with the current mixing, followed by updating the mixing proportions toward the normalized losses. Theoretical results show convergence of both the network parameters and the mixing vector to optimal targets under standard assumptions, while empirical evaluations across classification, regression, and reconstruction tasks demonstrate consistent convergence acceleration and improved generalization compared with classical training and several baselines. The work highlights the practical impact of adaptive data integration for efficient, robust neural network training in imbalanced and resource-limited environments.
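The two-step loop described above (update the network under the current mixing, then move the mixing proportions toward the normalized class-wise losses) can be sketched for the mixing step alone. This is a minimal illustration, not the authors' implementation: the function names `update_mixing` and `batch_counts`, the convex-combination form of the update with mixing rate `gamma`, the uniform fallback when all losses are zero, and the largest-remainder rounding are all assumptions made for the example.

```python
def normalize(losses):
    """Scale non-negative class-wise losses so they sum to 1."""
    total = sum(losses)
    if total <= 0:
        # Assumption: fall back to uniform proportions when every loss is zero.
        return [1.0 / len(losses)] * len(losses)
    return [loss / total for loss in losses]


def update_mixing(alpha, class_losses, gamma):
    """Move mixing proportions `alpha` a fraction `gamma` toward the
    normalized class-wise losses (assumed form of the mixing update)."""
    target = normalize(class_losses)
    return [a + gamma * (t - a) for a, t in zip(alpha, target)]


def batch_counts(alpha, batch_size):
    """Turn proportions into per-class sample counts that sum to
    `batch_size`, using largest-remainder rounding (hypothetical helper)."""
    raw = [a * batch_size for a in alpha]
    counts = [int(r) for r in raw]
    leftover = batch_size - sum(counts)
    # Hand the leftover slots to the classes with the largest fractional parts.
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    alpha = [0.5, 0.3, 0.2]          # current class proportions
    class_losses = [0.2, 0.2, 1.6]   # third class is currently the hardest
    alpha = update_mixing(alpha, class_losses, gamma=0.5)
    print(alpha, batch_counts(alpha, batch_size=10))
```

Because the update is a convex combination of two vectors on the probability simplex, the new proportions stay non-negative and sum to one for any `gamma` in (0, 1), so no re-normalization is needed after the step. In this example the hardest class gains batch share at the expense of the two easier ones.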

Abstract

Accelerating model convergence in resource-constrained environments is essential for fast and efficient neural network training. This work presents learn2mix, a new training strategy that adaptively adjusts class proportions within batches, focusing on classes with higher error rates. Unlike classical training methods that use static class proportions, learn2mix continually adapts class proportions during training, leading to faster convergence. Empirical evaluations on benchmark datasets show that neural networks trained with learn2mix converge faster than those trained with existing approaches, achieving improved results for classification, regression, and reconstruction tasks under limited training resources and with imbalanced classes. Our empirical findings are supported by theoretical analysis.

Paper Structure

This paper contains 34 sections, 3 theorems, 59 equations, 8 figures, 3 tables, 2 algorithms.

Key Result

Proposition 2.3

Let $\mathcal{L}(\theta^t), \mathcal{L}(\theta^*) \in \mathbb{R}^{k}$ denote the class-wise loss vectors for the model parameters at time $t$ and for the optimal model parameters, respectively. Suppose each class-wise loss $\mathcal{L}_i(\theta) \in \mathbb{R}$ is strongly convex in $\theta$, with stro […] It follows that, for learning rate $\eta \in (0, 2/L^*)$ and mixing rate $\gamma \in (0,1)$:

Figures (8)

  • Figure 1: Illustration of the learn2mix training mechanism. The class-wise composition of batches is adaptively modified during training using instantaneous class-wise error rates.
  • Figure 2: Comparing model classification errors for learn2mix, classical, FCL, SMOTE, IS, and CURR training. The x-axis is the elapsed [training] time, while the y-axis is the classification error.
  • Figure 3: Comparing model performance errors for classical training and learn2mix training. The x-axis is the number of elapsed training epochs, while the y-axis is the mean squared error (MSE).
  • Figure 4: Comparing model classification errors for learn2mix, classical, FCL, SMOTE, IS, and CURR training. The x-axis is the elapsed [training] time, while the y-axis is the classification error.
  • Figure 5: Comparing model classification errors for learn2mix, classical, FCL, SMOTE, IS, and CURR training. The x-axis is the elapsed [training] time, while the y-axis is the classification error.
  • ...and 3 more figures

Theorems & Definitions (8)

  • Definition 2.1: Loss Function for Classical Training
  • Definition 2.2: Loss Function for Learn2Mix Training
  • Proposition 2.3
  • Corollary 2.4
  • Proposition 2.5
  • proof
  • proof
  • proof