Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives

Shrinivas Ramasubramanian; Harsh Rangwani; Sho Takemori; Kunal Samanta; Yuhei Umeda; Venkatesh Babu Radhakrishnan

TL;DR

SelMix is a selective mixup-based, inexpensive fine-tuning technique for pre-trained models that optimizes for a desired objective and significantly improves performance on various practical non-decomposable objectives across benchmarks.

Abstract

The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms that can effectively use this data to train models. However, before these models are deployed in the real world, they must be strictly evaluated on performance measures like worst-case recall and must satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. On the other hand, the theoretical techniques necessitate training a new model from scratch for each performance objective. To bridge the gap, we propose SelMix, a selective mixup-based inexpensive fine-tuning technique for pre-trained models, to optimize for the desired objective. The core idea of our framework is to determine a sampling distribution over class pairs and to perform a mixup of features between samples from the selected classes, such that the mixup optimizes the given objective. We comprehensively evaluate our technique against the existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that the proposed SelMix fine-tuning significantly improves performance on various practical non-decomposable objectives across benchmarks.
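The core idea can be illustrated with a minimal sketch. It is not the authors' implementation: the gain matrix values, the softmax used to turn gains into a sampling distribution, the `scale` parameter, and all function names are assumptions made for illustration only. The sketch shows the two steps the abstract describes, namely drawing a class pair from a distribution over pairs and then mixing the features of samples from those two classes.

```python
import math
import random


def pair_distribution(gain, scale=1.0):
    """Turn a square K x K gain matrix into a distribution over class pairs.

    Assumption: a softmax over all entries is used here purely as an
    illustrative way to favour pairs with a larger estimated gain.
    """
    k = len(gain)
    flat = [scale * g for row in gain for g in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    return [[exps[i * k + j] / total for j in range(k)] for i in range(k)]


def sample_pair(dist, rng):
    """Draw one (i, j) class pair according to the distribution."""
    k = len(dist)
    pairs = [(i, j) for i in range(k) for j in range(k)]
    weights = [dist[i][j] for i, j in pairs]
    return rng.choices(pairs, weights=weights, k=1)[0]


def mix_features(feat_a, feat_b, lam):
    """Convex combination of two feature vectors (feature-level mixup)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(feat_a, feat_b)]


# Hypothetical gain estimates for 3 classes; larger means more helpful.
gain = [
    [0.0, 0.1, 2.0],
    [0.0, 0.0, 0.5],
    [0.3, 0.0, 0.0],
]
dist = pair_distribution(gain)
rng = random.Random(0)
i, j = sample_pair(dist, rng)

# Mix a feature from class i with a feature from class j.
mixed = mix_features([1.0, 0.0], [0.0, 1.0], lam=0.7)
```

In this toy setup the pair (0, 2) receives the largest sampling probability because it has the largest hypothetical gain; how the gains are actually estimated is the subject of the paper's theoretical analysis.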

Paper Structure (33 sections, 12 theorems, 59 equations, 7 figures, 17 tables, 2 algorithms)

Introduction
Related Works
Problem Setup
Selective Mixup for Optimizing Non-Decomposable Objectives
Theoretical Analysis of SelMix
Experiments
Conclusion and Discussion
Notation
Computational Complexity
Additional Theoretical Results and Proofs omitted in the Paper
Convergence Analysis
A Formal Statement of Theorem \ref{thm:gain-approx} and Remarks on Non-differentiability
Proof of Theorem \ref{thm:approx-formula}
Validity of the Mixup Sampling Distribution
A Variant of Theorem \ref{thm:regbound}
...and 18 more sections

Key Result

Theorem 4.1

Assume that $\| V_{ij}\|$ is sufficiently small. Then, the gain for the $(i,j)$ mixup ($G_{ij}$) can be approximated using the following expression: where $z_k = \mathbb{E}_{x \sim D^{\mathrm{val}}_{k}}[g(x)]$ is the mean of the features of the validation samples belonging to class $k$, used to characterize $(i,j)$ mixups (Eq. \ref{eq:mixup-loss-centroid}). The error term $\varepsilon(\widetilde{C}, W)$
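The class-mean features $z_k$ in the theorem can be estimated from a finite validation set by replacing the expectation with an empirical average. The following is a minimal sketch under that assumption; the function name `class_mean_features` and the toy data are hypothetical, and the features are assumed to be the outputs $g(x)$ of a feature extractor that has already been applied.

```python
def class_mean_features(features, labels, num_classes):
    """Empirical estimate of z_k: the mean feature vector of each class.

    features: list of equal-length feature vectors g(x) (already extracted)
    labels:   list of integer class labels in [0, num_classes)
    Returns a list with one mean vector per class, or None for a class
    that has no validation samples.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for feat, label in zip(features, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += feat[d]
    return [
        [s / counts[k] for s in sums[k]] if counts[k] > 0 else None
        for k in range(num_classes)
    ]


# Toy validation features: two samples of class 0, one sample of class 1.
val_features = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
val_labels = [0, 0, 1]
z = class_mean_features(val_features, val_labels, num_classes=2)
```

Here `z[0]` is the average of the two class-0 vectors and `z[1]` is the single class-1 vector.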

Figures (7)

Figure 1: Overview of Results on CIFAR-10 LT (Semi-supervised). We evaluate the models from the state-of-the-art semi-supervised techniques DASO [oh2022daso], ABC [lee2021abc], and CSST [rangwani2022costsensitive], along with the proposed SelMix, on different non-decomposable objectives. We find that SelMix produces the best performance for the non-decomposable metric and constraints it is optimized for (blue). Further, SelMix is an inexpensive fine-tuning technique compared to the other baselines, which require expensive full pre-training.
Figure 2: We demonstrate the effect of the variants of mixup on feature representations (a). With Mixup, the feature representation gets equal contribution in all directions of other classes (b). Unlike this, in SelMix (c), certain class mixups are selected at a timestep $t$ such that they optimize the desired metric. Below is an overview of how the SelMix distribution is obtained at timestep $t$.
Figure 3: Comparison of metrics for semi-supervised CIFAR-10 LT under the $\rho_l \neq \rho_u$ assumption and for STL-10 under the $\rho_u = NA$ assumption. The CIFAR-10 LT (semi-supervised) settings are $\rho_l = 100, \rho_u = 1$ (uniform) and $\rho_l = 100, \rho_u = \frac{1}{100}$ (inverted). SelMix achieves significant gains over other baselines.
Figure : Training through SelMix
Figure J.1: Evolution of gain matrix for mean recall optimized run for CIFAR-10 LT ($\rho_l = \rho_u$).
...and 2 more figures

Theorems & Definitions (21)

Theorem 4.1
Theorem 4.2
Theorem 4.3: Informal
Proposition B.1
proof
Theorem C.2
proof
Theorem C.3
proof
Lemma C.4
...and 11 more
