Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition

Minxue Tang, Yangyang Yu, Aolin Ding, Maziyar Baran Pouyan, Taha Belkhouja, Yujia Bao

TL;DR

ADAMAB trains embedder-agnostic light-weight calibrators on top of fixed embedding models without accessing their parameters, to maximally reduce computational costs and mitigate the need for large-scale training data.

Abstract

Recognizing implicit visual and textual patterns is essential in many real-world applications of modern AI. However, tackling long-tail pattern recognition tasks remains challenging for current pre-trained foundation models such as LLMs and VLMs. While finetuning pre-trained models can improve accuracy in recognizing implicit patterns, it is usually infeasible due to a lack of training data and high computational overhead. In this paper, we propose ADAMAB, an efficient embedding calibration framework for few-shot pattern recognition. To maximally reduce the computational costs, ADAMAB trains embedder-agnostic light-weight calibrators on top of fixed embedding models without accessing their parameters. To mitigate the need for large-scale training data, we introduce an adaptive data augmentation strategy based on the Multi-Armed Bandit (MAB) mechanism. With a modified upper confidence bound algorithm, ADAMAB diminishes the gradient shifting and offers theoretically guaranteed convergence in few-shot training. Our multi-modal experiments demonstrate the superior performance of ADAMAB, with up to 40% accuracy improvement when training with fewer than 5 initial data samples per class.

Paper Structure

This paper contains 26 sections, 5 theorems, 51 equations, 4 figures, and 4 tables.

Key Result

Theorem 1

A gradient descent algorithm with the update rule ${\bm{w}}_{t+1} = {\bm{w}}_t-\eta_t {\bm{g}}_t$ achieves the following convergence rate under the smoothness assumption and a learning rate $\eta_t\le 1/\beta$:

Figures (4)

  • Figure 1: Illustration of light-weight neural similarity networks.
  • Figure 2: The framework of ADAMAB. ADAMAB can be applied to both visual and textual pattern recognition tasks. To calibrate a pre-trained embedder (e.g., CLIP), we train a light-weight neural similarity network whose detailed structure is illustrated in Figure 1. We select a class with a Multi-armed Bandit and augment the samples of this class with another pre-trained generator (e.g., diffusion models for image generation and GPT for text generation). We alternately augment the training data and train the model until convergence.
  • Figure 3: Calibration Accuracy with respect to the average number of training samples per class.
  • Figure 4: Calibration Accuracy with respect to the exploration hyperparameter $\alpha$.
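
The class-selection step described in the Figure 2 caption, together with the exploration hyperparameter $\alpha$ of Figure 4, can be pictured with a standard UCB1-style bandit. The sketch below is illustrative only: it is not the modified upper confidence bound algorithm from the paper, and the names `select_class_ucb`, `run_bandit`, and `reward_fn`, as well as the way `alpha` scales the exploration bonus, are assumptions made for this example. The reward is a hypothetical stand-in for whatever signal measures the benefit of augmenting a class.

```python
import math


def select_class_ucb(mean_reward, pull_count, total_pulls, alpha):
    """Return the index of the class (arm) with the largest UCB score.

    Assumption: score = empirical mean + alpha * sqrt(ln(t) / n_arm),
    i.e. a plain UCB1-style rule, not the paper's modified variant.
    """
    best_arm, best_score = 0, float("-inf")
    for arm, (mean, count) in enumerate(zip(mean_reward, pull_count)):
        if count == 0:
            return arm  # try every class once before comparing scores
        score = mean + alpha * math.sqrt(math.log(total_pulls) / count)
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm


def run_bandit(reward_fn, num_classes, rounds, alpha=1.0):
    """Alternate between selecting a class and observing its reward.

    reward_fn(arm) is hypothetical: in an augmentation loop it could
    generate samples for that class, retrain, and return a benefit score.
    """
    mean_reward = [0.0] * num_classes
    pull_count = [0] * num_classes
    for t in range(1, rounds + 1):
        arm = select_class_ucb(mean_reward, pull_count, t, alpha)
        reward = reward_fn(arm)
        pull_count[arm] += 1
        # Incremental update of the empirical mean reward for this class.
        mean_reward[arm] += (reward - mean_reward[arm]) / pull_count[arm]
    return pull_count, mean_reward
```

A larger `alpha` makes the rule revisit rarely selected classes more often, while a smaller `alpha` concentrates augmentation on the classes that currently look most rewarding.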

Theorems & Definitions (9)

  • Theorem 1: Convergence of Biased Gradient Descent
  • Theorem 2: Convergence of ADAMAB
  • Theorem 1
  • proof
  • Lemma 1: Vector Hoeffding's Inequality
  • proof
  • Theorem 2: Convergence of ADAMAB
  • proof
  • Remark 1
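
The update rule of Theorem 1, ${\bm{w}}_{t+1} = {\bm{w}}_t-\eta_t {\bm{g}}_t$ with $\eta_t\le 1/\beta$, can be tried out on a toy problem. The sketch below is a minimal illustration and does not reproduce the theorem's bound: the quadratic objective, the curvature values, and the helper names `biased_gradient_descent`, `quadratic_grad`, and `bias_fn` are all assumptions for this example, with ${\bm{g}}_t$ modeled as the true gradient plus an additive bias term.

```python
def quadratic_grad(w, curvatures):
    """Gradient of the toy objective f(w) = 0.5 * sum(c_i * w_i^2)."""
    return [c * wi for c, wi in zip(curvatures, w)]


def biased_gradient_descent(grad_fn, bias_fn, w0, beta, steps):
    """Run w_{t+1} = w_t - eta * g_t with a constant step eta = 1 / beta.

    Assumption: g_t = grad_fn(w_t) + bias_fn(t), where bias_fn is a
    hypothetical stand-in for the gradient shift of a biased estimator.
    """
    eta = 1.0 / beta
    w = list(w0)
    for t in range(steps):
        true_grad = grad_fn(w)
        bias = bias_fn(t)
        g = [gi + bi for gi, bi in zip(true_grad, bias)]
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Toy setup: curvatures 1 and 4, so the objective is beta-smooth with beta = 4.
CURVATURES = [1.0, 4.0]
BETA = max(CURVATURES)


def toy_grad(w):
    return quadratic_grad(w, CURVATURES)


def decaying_bias(t):
    # Bias that shrinks over time, as when the gradient shift is diminished.
    return [1.0 / (t + 1)] * len(CURVATURES)


if __name__ == "__main__":
    w_final = biased_gradient_descent(toy_grad, decaying_bias, [5.0, -3.0], BETA, 200)
    print(w_final)
```

On this toy problem, a bias that decays to zero lets the iterates approach the minimizer at the origin, whereas a constant bias leaves them at a fixed offset from it.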