
Swift Sampler: Efficient Learning of Sampler by 10 Parameters

Jiawei Yao, Chuming Li, Canran Xiao

TL;DR

The paper tackles data sampling for efficient and effective deep learning training by removing the need for expensive, task-specific trial-and-error. It introduces Swift Sampler (SS), a bilevel framework that maps data features to a Lipschitz-constrained, low-dimensional sampler using only 10 hyper-parameters, aided by a smoothing transform and a fast local-minimum approximation. The outer loop uses Bayesian Optimization (BO) to search over samplers, while the inner loop starts from a shared initialization to rapidly approximate the optimal network weights under a given sampler. Empirical results on CIFAR, ImageNet, and MS1M show consistent performance gains, transferability across architectures, and better efficiency than prior automatic sampler methods. The approach enables scalable, data-aware sampling strategies that improve both convergence and final accuracy on large-scale datasets.
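To make the bilevel idea above concrete, here is a minimal sketch of one way a 10-parameter sampler could be laid out. It is an illustration under stated assumptions, not the authors' implementation: the per-sample feature is taken to be a scalar (for example a loss value), the smoothing transform is approximated by an empirical CDF, the sampler is a piecewise-linear function with 10 evenly spaced knots, and plain random search stands in for Bayesian Optimization. The names `empirical_cdf`, `piecewise_linear`, `sampling_probs`, `search_sampler`, and the `evaluate` callback are all hypothetical.

```python
import bisect
import random


def empirical_cdf(features):
    """Map each feature value to its rank-based CDF value in (0, 1]."""
    ordered = sorted(features)
    n = len(ordered)
    return [bisect.bisect_right(ordered, f) / n for f in features]


def piecewise_linear(u, knots):
    """Interpolate knot heights placed evenly on [0, 1] at position u."""
    k = len(knots) - 1
    pos = min(u * k, k - 1e-12)  # keep the segment index inside the grid
    i = int(pos)
    t = pos - i
    return (1 - t) * knots[i] + t * knots[i + 1]


def sampling_probs(features, knots):
    """Hypothetical sampler: CDF-transform features, score, normalize."""
    scores = [piecewise_linear(u, knots) for u in empirical_cdf(features)]
    total = sum(scores)
    if total <= 0:
        raise ValueError("knots must give at least one positive score")
    return [s / total for s in scores]


def search_sampler(features, evaluate, trials=50, seed=0):
    """Outer loop sketch: random search (a stand-in for BO) over 10 knots.

    `evaluate` is an assumed callback that scores a probability vector,
    e.g. by briefly fine-tuning from a shared initialization and
    returning validation accuracy.
    """
    rng = random.Random(seed)
    best_knots, best_score = None, float("-inf")
    for _ in range(trials):
        knots = [rng.random() for _ in range(10)]  # heights in [0, 1)
        score = evaluate(sampling_probs(features, knots))
        if score > best_score:
            best_knots, best_score = knots, score
    return best_knots, best_score
```

Because the knot heights are bounded and the grid spacing is fixed, the slope of each linear segment is bounded as well, which is one simple way to obtain a Lipschitz-constrained function. In this sketch, `sampling_probs(losses, [1.0] * 9 + [0.0])` would push the probability of the highest-loss samples toward zero while leaving the rest close to uniform.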

Abstract

Data selection is essential for training deep learning models. An effective data sampler assigns appropriate sampling probabilities to training data and helps the model converge to a good local minimum with high performance. Previous studies in data sampling are mainly based on heuristic rules or on learning through a large number of time-consuming trials. In this paper, we propose SS, an automatic swift sampler search algorithm that learns effective samplers efficiently. In particular, SS uses a novel formulation to map a sampler to a low-dimensional set of hyper-parameters and uses an approximated local minimum to quickly assess the quality of a sampler. Thanks to its low computational cost, SS can be applied to large-scale data sets with high efficiency. Comprehensive experiments on various tasks demonstrate that SS-powered sampling achieves clear improvements (e.g., 1.5% on ImageNet) and transfers across different neural networks. Project page: https://github.com/Alexander-Yao/Swift-Sampler.

Paper Structure (30 sections, 17 equations, 3 figures, 11 tables, 1 algorithm)

Figures (3)

  • Figure 1: A demonstration of the effectiveness of SS. (a) The density of noisy instances on CIFAR10 with a 40% noise rate, plotted in (Loss, $E^r$) space. (b) The sampling probability assigned by the sampler found by SS. Together, (a) and (b) show that SS accurately identifies the noisy instances and discards them.
  • Figure 2: Visualization of the sampler searched on ImageNet ILSVRC12. (a) The cropped images (in yellow boxes) with the lowest sampling probability under the sampler found by SS. Most of them are cropped at inappropriate positions and contain irrelevant objects. (b) The sampling probability assigned by the sampler found by SS.
  • Figure 3: Verification of the efficiency of BO and the effectiveness of $cgf$ in smoothing the objective function (OF). On ImageNet ILSVRC12, SS($cdf$) outperforms RL because it estimates the whole landscape of the OF, and SS($cgf$) optimizes faster than SS($cdf$) because it smooths the OF.