Diversified Batch Selection for Training Acceleration

Feng Hong; Yueming Lyu; Jiangchao Yao; Ya Zhang; Ivor W. Tsang; Yanfeng Wang

TL;DR

Diversified Batch Selection (DivBS) tackles the inefficiency of online batch selection by optimizing subset-level orthogonalized representativeness rather than scoring samples individually. The method needs no reference model and uses a greedy algorithm that is approximately $(1-e^{-1})$-competitive, together with a practical simplification that reduces computational overhead. The key contributions are the formal objective $r(S,B,\theta)$, theoretical guarantees for the greedy algorithm via $r'(S,B,\theta)$, and cross-domain experiments showing that DivBS delivers training speedups with minimal accuracy loss on image classification, semantic segmentation, cross-modal retrieval, and language-model fine-tuning. The approach enables scalable, diverse data selection for accelerated training in large-scale settings without auxiliary reference models.

Abstract

The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance on additional reference models inherently limits their practical application when no such ideal models are available. On the other hand, vanilla reference-model-free methods score and select data independently in a sample-wise manner, which sacrifices diversity and induces redundancy. To tackle this dilemma, we propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples. Specifically, we define a novel selection objective that measures the group-wise orthogonalized representativeness to combat the redundancy issue of previous sample-wise criteria, and provide a principled selection-efficient realization. Extensive experiments across various tasks demonstrate the significant superiority of DivBS in the performance-speedup trade-off. The code is publicly available.
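
To make the idea of group-wise orthogonalized selection concrete, here is a minimal sketch of a greedy selector built on Gram-Schmidt orthogonalization. It is an illustration under stated assumptions, not the paper's exact objective or algorithm: it assumes each sample has a feature vector (for example a gradient or embedding), and it scores a subset by how much of the batch's summed feature vector lies in the span of the selected features. The function name `greedy_diverse_select` and the parameters `budget` and `eps` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def greedy_diverse_select(features, budget, eps=1e-12):
    """Greedily pick up to `budget` sample indices from a batch.

    Assumption (not taken from the paper): a subset is scored by the norm of
    the projection of the summed batch features onto the span of the selected
    features. Each step picks the sample whose orthogonalized residual best
    explains what is still unexplained of that sum.
    """
    dim = len(features[0])
    # What remains to be explained: initially the sum of all batch features.
    target = [sum(f[j] for f in features) for j in range(dim)]
    # Working copies, orthogonalized against every chosen direction so far.
    residuals = [list(f) for f in features]
    selected = []
    for _ in range(min(budget, len(features))):
        best, best_gain = None, -1.0
        for i, r in enumerate(residuals):
            if i in selected:
                continue
            norm = math.sqrt(dot(r, r))
            if norm < eps:
                continue  # already in the span of the selection: redundant
            gain = abs(dot(r, target)) / norm
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # every remaining sample is redundant
        selected.append(best)
        norm = math.sqrt(dot(residuals[best], residuals[best]))
        e = [x / norm for x in residuals[best]]
        # Gram-Schmidt step: remove the new direction from the target
        # and from every residual.
        c = dot(target, e)
        target = [t - c * x for t, x in zip(target, e)]
        for i, r in enumerate(residuals):
            c = dot(r, e)
            residuals[i] = [a - c * x for a, x in zip(r, e)]
    return selected
```

In this sketch a sample whose feature duplicates an already selected one has a zero residual and is never picked, which is the redundancy-avoiding behavior that sample-wise scoring lacks. For instance, with features `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]` and a budget of 2, the second copy of `[1.0, 0.0]` is skipped in favor of `[0.0, 1.0]`.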

Paper Structure (33 sections, 5 theorems, 21 equations, 4 figures, 13 tables, 2 algorithms)

Introduction
Background: Online Batch Selection
Method: Diversified Batch Selection
Motivation
Objective
Optimization
Realization
Discussion
Experiments
Experimental Setup
Performance Evaluation on Image Classification
Performance Evaluation under Class Imbalance
Performance Evaluation on Semantic Segmentation
Performance Evaluation on Cross-Modal Retrieval
Performance Evaluation on LM Finetuning
...and 18 more sections

Key Result

Proposition 3.1

$\forall E \in {\mathcal{E}}(g(S,\theta))$, we have

Figures (4)

Figure 1: Visualization of a toy motivating example, which is a 2D imbalanced four-class classification problem. Subfigure (a) represents all the training data. Subfigures (b), (c), and (d) depict the subsets selected by the Uniform, Train Loss, and DivBS methods, with a 10% budget. For more details, please refer to the appendix on the toy example.
Figure 2: (a) Average mean feature cosine distance to the k-nearest neighbors for the selected data on CIFAR-10 (10% budget). (b) Properties of 10 groups in the selected data on CIFAR-100-LT. (c) Performance comparison on CIFAR-10* and CIFAR-100*. Note that Bayesian and RHO-LOSS require reference models and also incur additional overhead from using auxiliary models for inference.
Figure 3: Performance ($\uparrow$) vs. speedup ($\uparrow$) on (a) CIFAR-10, (b) CIFAR-100, and (c) PASCAL VOC 2012 trainaug. The upper panel displays the relationship between the performance (accuracy or mIoU) of different methods and the speedup w.r.t. the number of training iterations. The lower panel illustrates the relationship between the performance and the speedup w.r.t. the wall-clock time.
Figure 4: t-SNE visualization of data selected by different methods on CIFAR-10 with a 10% budget. Circles highlight redundant samples.
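
The diversity measure named in the Figure 2(a) caption can be read as follows: for each selected sample, average the cosine distance to its k nearest neighbors in feature space, then average over samples. The sketch below implements that reading; it is an assumed interpretation of the caption, and the function names are hypothetical. It requires at least two samples and non-zero feature vectors.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors."""
    inner = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - inner / (norm_u * norm_v)


def mean_knn_cosine_distance(features, k):
    """Average, over samples, of the mean cosine distance to the k nearest neighbors."""
    n = len(features)
    per_sample = []
    for i in range(n):
        # Distances from sample i to every other sample, smallest first.
        dists = sorted(
            cosine_distance(features[i], features[j]) for j in range(n) if j != i
        )
        nearest = dists[:k]
        per_sample.append(sum(nearest) / len(nearest))
    return sum(per_sample) / n
```

Under this reading, a larger value means the selected samples sit farther from their neighbors, that is, the selection is less redundant.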

Theorems & Definitions (10)

Proposition 3.1
Proposition 3.2
Proposition 3.3
Proof of \ref{prop:basis}
Definition 2.1 (Proposition 2.3 in Bach, 2013)
Proof of \ref{prop:r'}
Lemma 2.2 (Nemhauser, Wolsey, and Fisher, 1978)
Lemma 2.3
Proof of \ref{lemma:r' ratio}
Proof of \ref{prop:r guarantee}
