SAGE: Streaming Agreement-Driven Gradient Sketches for Representative Subset Selection

Ashish Jha, Salman Ahmadi-Asl

TL;DR

SAGE tackles the high cost of training on large datasets by streaming a Frequent Directions sketch to capture the dominant gradient rowspace and using agreement-based scoring in the sketched subspace to select representative subsets. The method achieves a low, constant memory footprint of $O(\ell D)$ and a simple two-pass pipeline, with energy-preservation guarantees that tie subset updates to the full gradient. A class-balanced variant further improves label coverage in imbalanced regimes. Empirically, SAGE retains or surpasses full-data accuracy at 25% data on multiple benchmarks and delivers 3–6× speedups, making it a practical tool for efficient training that complements pruning and compression techniques.

Abstract

Training modern neural networks on large datasets is computationally and energy intensive. We present SAGE, a streaming data-subset selection method that maintains a compact Frequent Directions (FD) sketch of gradient geometry in $O(\ell D)$ memory and prioritizes examples whose sketched gradients align with a consensus direction. The approach eliminates $N \times N$ pairwise similarities and explicit $N \times \ell$ gradient stores, yielding a simple two-pass, GPU-friendly pipeline. Leveraging FD's deterministic approximation guarantees, we analyze how agreement scoring preserves gradient energy within the principal sketched subspace. Across multiple benchmarks, SAGE trains with small kept-rate budgets while retaining competitive accuracy relative to full-data training and recent subset-selection baselines, and reduces end-to-end compute and peak memory. Overall, SAGE offers a practical, constant-memory alternative that complements pruning and model compression for efficient training.
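The two-pass idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The Frequent Directions update follows the standard shrink-by-median-singular-value form; the agreement score used here (cosine similarity between each projected gradient and the projected mean gradient) is an assumption made for illustration, as are the names `FrequentDirections`, `select_by_agreement`, and the synthetic `grads` array, which stands in for a stream of per-example gradients.

```python
import numpy as np


class FrequentDirections:
    """Streaming Frequent Directions sketch kept in O(ell * dim) memory."""

    def __init__(self, ell: int, dim: int):
        if not 2 <= ell <= dim:
            raise ValueError("this sketch assumes 2 <= ell <= dim")
        self.ell = ell
        self.sketch = np.zeros((ell, dim))
        self.next_free = 0  # index of the first row known to be all zeros

    def update(self, row: np.ndarray) -> None:
        """Insert one gradient row, shrinking first if the sketch is full."""
        if self.next_free == self.ell:
            self._shrink()
        self.sketch[self.next_free] = row
        self.next_free += 1

    def _shrink(self) -> None:
        # Subtract the squared singular value at index ell // 2 from all
        # squared singular values; rows from ell // 2 onward become zero.
        _, s, vt = np.linalg.svd(self.sketch, full_matrices=False)
        delta = s[self.ell // 2] ** 2
        shrunk = np.sqrt(np.maximum(s**2 - delta, 0.0))
        self.sketch = shrunk[:, None] * vt
        self.next_free = self.ell // 2


def select_by_agreement(grads: np.ndarray, ell: int, keep: int):
    """Two-pass selection; the scoring rule is an illustrative assumption."""
    n, dim = grads.shape
    fd = FrequentDirections(ell, dim)
    grad_sum = np.zeros(dim)

    # Pass 1: stream the gradients into the sketch and a running sum.
    for g in grads:
        fd.update(g)
        grad_sum += g

    # Orthonormal basis for the row space of the final sketch.
    _, s, vt = np.linalg.svd(fd.sketch, full_matrices=False)
    basis = vt[s > 1e-10 * s.max()]

    # Assumed consensus direction: the projected mean gradient, normalized.
    consensus = basis @ (grad_sum / n)
    consensus /= np.linalg.norm(consensus) + 1e-12

    # Pass 2: score each example by cosine agreement with the consensus.
    scores = np.empty(n)
    for i, g in enumerate(grads):
        z = basis @ g
        scores[i] = z @ consensus / (np.linalg.norm(z) + 1e-12)

    kept = np.argsort(-scores)[:keep]
    return kept, scores


# Synthetic demo: 150 gradients share a direction, 50 point the other way.
rng = np.random.default_rng(0)
dim, ell = 50, 8
u = np.zeros(dim)
u[0] = 1.0
aligned = 3.0 * u + 0.1 * rng.standard_normal((150, dim))
conflicting = -1.0 * u + 0.1 * rng.standard_normal((50, dim))
grads = np.vstack([aligned, conflicting])

kept, scores = select_by_agreement(grads, ell=ell, keep=100)
```

In this toy setting the kept indices come from the aligned group, since the conflicting examples score negatively against the consensus direction. Only the $\ell \times D$ sketch, one $D$-dimensional running sum, and one scalar score per example are held in memory; no $N \times N$ similarity matrix or stored gradient matrix is needed beyond the demo array itself.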

Paper Structure

This paper contains 10 sections, 1 theorem, 4 equations, 1 figure, 1 table, 1 algorithm.

Key Result

Lemma 1

Let $T\subseteq [N]$ with $|T|=k$ and assume $\alpha_i \ge \xi>0$ for all $i\in T$ and $\|\bar{z}\|_2>0$. Then

Figures (1)

  • Figure 1: Relative test accuracy vs. training speed-up across CIFAR-10, CIFAR-100, Fashion-MNIST, TinyImageNet, and Caltech-256 at subset fractions of 5%, 15%, 25%, and 100%. SAGE achieves superior accuracy retention at aggressive subset fractions, matching or exceeding full-data accuracy at 25% data usage while delivering 3–6× training speed-ups. Curves show exponential fits with R² quality indicators, and shaded regions indicate variability across three independent seeds.

Theorems & Definitions (1)

  • Lemma 1: Consensus-direction energy