Sketchy Moment Matching: Toward Fast and Provable Data Selection for Finetuning
Yijun Dong, Hoang Phan, Xiang Pan, Qi Lei
TL;DR
Sketchy Moment Matching (SkMM) addresses data selection for finetuning in high-dimensional models by exploiting a variance-bias tradeoff that arises from low intrinsic dimension. The method first uses gradient sketching to identify a compact subspace $\mathcal{S}$ that captures the essential finetuning directions, then performs moment matching within this subspace to control variance, achieving the fast generalization rate $O(\dim(\mathcal{S})/n)$ for $n$ selected samples. Theoretical results show that gradient sketching provably yields a low-bias subspace and preserves fast-rate learning, while a practical quadratic-programming relaxation enables scalable moment matching in the reduced space. Empirical results on synthetic data, regression, and image-classification tasks demonstrate SkMM’s advantage in low-data regimes, with robustness to data imbalances and strong performance relative to standard baselines.
Abstract
We revisit data selection in a modern context of finetuning from a fundamental perspective. Extending the classical wisdom of variance minimization in low dimensions to high-dimensional finetuning, our generalization analysis unveils the importance of additionally reducing bias induced by low-rank approximation. Inspired by the variance-bias tradeoff in high dimensions from the theory, we introduce Sketchy Moment Matching (SkMM), a scalable data selection scheme with two stages. (i) First, the bias is controlled using gradient sketching that explores the finetuning parameter space for an informative low-dimensional subspace $\mathcal{S}$; (ii) then the variance is reduced over $\mathcal{S}$ via moment matching between the original and selected datasets. Theoretically, we show that gradient sketching is fast and provably accurate: selecting $n$ samples by reducing variance over $\mathcal{S}$ preserves the fast-rate generalization $O(\dim(\mathcal{S})/n)$, independent of the parameter dimension. Empirically, we concretize the variance-bias balance via synthetic experiments and demonstrate the effectiveness of SkMM for finetuning in real vision tasks.
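The two stages described above can be illustrated with a minimal NumPy sketch. It assumes per-sample gradient features are already available as an $N \times r$ matrix `G`, uses a Gaussian random projection for the sketching stage, and replaces the moment-matching optimization with a simple greedy rule that tracks the second moment of the sketched features. The function names `sketch_gradients` and `greedy_moment_matching`, the $1/\sqrt{m}$ scaling, and the greedy rule are all assumptions made for illustration; this is not the relaxation used in the paper.

```python
import numpy as np


def sketch_gradients(G, m, seed=0):
    """Stage (i), illustrative: project N x r gradient features to N x m
    with a Gaussian random matrix (a standard sketching choice, assumed here)."""
    rng = np.random.default_rng(seed)
    r = G.shape[1]
    Gamma = rng.standard_normal((r, m)) / np.sqrt(m)
    return G @ Gamma


def greedy_moment_matching(Z, n):
    """Stage (ii), illustrative: greedily pick n rows of Z so that the
    second moment of the selection tracks the second moment of all rows.

    NOTE: this greedy rule is a hypothetical stand-in, not the paper's method.
    """
    N, m = Z.shape
    target = Z.T @ Z / N                # second moment of the full dataset
    running = np.zeros((m, m))          # sum of outer products of chosen rows
    available = np.ones(N, dtype=bool)
    norms4 = np.sum(Z ** 2, axis=1) ** 2
    selected = []
    for k in range(1, n + 1):
        # Adding row z gives error ||A + z z^T / k||_F^2 with
        # A = running / k - target. Dropping the constant ||A||_F^2 leaves
        # (2 / k) z^T A z + ||z||^4 / k^2, evaluated for all rows at once.
        A = running / k - target
        quad = np.einsum("ij,jk,ik->i", Z, A, Z)
        scores = 2.0 * quad / k + norms4 / k ** 2
        scores[~available] = np.inf     # never pick the same sample twice
        i = int(np.argmin(scores))
        selected.append(i)
        available[i] = False
        running += np.outer(Z[i], Z[i])
    return np.array(selected)


# Usage on synthetic features (all sizes are arbitrary).
rng = np.random.default_rng(1)
G = rng.standard_normal((200, 1000))    # N = 200 samples, r = 1000 parameters
Z = sketch_gradients(G, m=16)           # sketched features, 200 x 16
idx = greedy_moment_matching(Z, n=20)   # indices of 20 selected samples
```

Because the selection step only touches the $m$-dimensional sketched features, its cost depends on $m$ rather than on the parameter dimension $r$, which mirrors the motivation for reducing variance over a low-dimensional subspace.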
