SAGE: Streaming Agreement-Driven Gradient Sketches for Representative Subset Selection

Ashish Jha; Salman Ahmadi-Asl

TL;DR

SAGE tackles the high cost of training on large datasets by streaming a Frequent Directions sketch to capture the dominant gradient rowspace and using agreement-based scoring in the sketched subspace to select representative subsets. The method achieves a low, constant memory footprint of $O(\ell D)$ and a simple two-pass pipeline, with energy-preservation guarantees that tie subset updates to the full gradient. A class-balanced variant further improves label coverage in imbalanced regimes. Empirically, SAGE retains or surpasses full-data accuracy at 25% data on multiple benchmarks and delivers 3–6× speedups, making it a practical tool for efficient training that complements pruning and compression techniques.

Abstract

Training modern neural networks on large datasets is computationally and energy intensive. We present SAGE, a streaming data-subset selection method that maintains a compact Frequent Directions (FD) sketch of gradient geometry in $O(\ell D)$ memory and prioritizes examples whose sketched gradients align with a consensus direction. The approach eliminates $N \times N$ pairwise similarities and explicit $N \times \ell$ gradient stores, yielding a simple two-pass, GPU-friendly pipeline. Leveraging FD's deterministic approximation guarantees, we analyze how agreement scoring preserves gradient energy within the principal sketched subspace. Across multiple benchmarks, SAGE trains with small kept-rate budgets while retaining competitive accuracy relative to full-data training and recent subset-selection baselines, and reduces end-to-end compute and peak memory. Overall, SAGE offers a practical, constant-memory alternative that complements pruning and model compression for efficient training.
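The two ingredients named above, a streaming FD sketch and agreement scoring in the sketched subspace, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`fd_sketch`, `agreement_scores`, `select_top`), the synthetic gradients, and the specific scoring rule (cosine similarity between each projected, normalized gradient and the mean of those vectors) are assumptions made for illustration; the abstract does not spell out the exact scoring formula. For brevity the scoring step also holds all projections in memory, which the streaming method is designed to avoid.

```python
import numpy as np

def _shrink(buf, ell):
    """One FD shrink step: leaves at most `ell` non-zero rows in the buffer."""
    _, s, vt = np.linalg.svd(buf, full_matrices=False)
    delta = s[ell] ** 2                      # (ell+1)-th squared singular value
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    out = np.zeros_like(buf)
    out[: len(s)] = s_shrunk[:, None] * vt   # rows ell.. become exactly zero
    return out

def fd_sketch(grad_rows, ell, dim):
    """Frequent Directions over a stream of gradient rows (assumes ell < dim).

    Uses a 2*ell-row buffer, so memory stays O(ell * dim) regardless of how
    many rows are streamed. Known FD guarantee for this variant:
    ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.
    """
    buf = np.zeros((2 * ell, dim))
    filled = 0
    for g in grad_rows:
        if filled == 2 * ell:                # buffer full: shrink, free ell rows
            buf = _shrink(buf, ell)
            filled = ell
        buf[filled] = g
        filled += 1
    return _shrink(buf, ell)[:ell]

def agreement_scores(grads, sketch):
    """ASSUMED scoring rule: cosine agreement with the mean projected direction."""
    _, s, vt = np.linalg.svd(sketch, full_matrices=False)
    basis = vt[s > 1e-12 * max(s.max(), 1e-300)]   # orthonormal rowspace basis
    z = grads @ basis.T                             # project into sketched subspace
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    consensus = z.mean(axis=0)
    consensus = consensus / max(np.linalg.norm(consensus), 1e-12)
    return z @ consensus

def select_top(scores, keep_rate):
    """Indices of the highest-scoring examples under a kept-rate budget."""
    k = max(1, int(round(keep_rate * len(scores))))
    return np.argsort(-scores)[:k]

# Synthetic stand-in for per-example gradients: a shared direction plus noise.
rng = np.random.default_rng(0)
N, D, ell = 200, 32, 8
grads = rng.normal(size=D) + 0.5 * rng.normal(size=(N, D))

sketch = fd_sketch(iter(grads), ell, D)      # pass 1: build the sketch
scores = agreement_scores(grads, sketch)     # pass 2: score in the subspace
kept = select_top(scores, keep_rate=0.25)
```

The buffer of $2\ell$ rows is a common way to amortize the SVD cost: a shrink runs only once every $\ell$ insertions, while the memory footprint remains $O(\ell D)$.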
