Structured adaptive and random spinners for fast machine learning computations

Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Francois Fagan, Cedric Gouy-Pailler, Anne Morvan, Nourhan Sakr, Tamas Sarlos, Jamal Atif

TL;DR

This work introduces Structured Spinners, a generic, fast projection framework that replaces dense Gaussian projections with a product of three structured blocks, $G_{struct}=M_3M_2M_1$, yielding large speedups with minimal accuracy loss. The framework unifies and extends prior structured transforms, supports either learned or random blocks, and extends to nonlinear mappings, with applications in kernel methods, LSH, Newton sketches, and neural networks. The authors provide theory showing that structured projections closely approximate unstructured ones in distribution in both the random and the adaptive setting, including the first guarantees for the fast cross-polytope LSH based on $HD_3HD_2HD_1$, and they demonstrate practical effectiveness through extensive experiments. Overall, Structured Spinners reduce computation and memory while preserving accuracy across a range of tasks and architectures.

Abstract

We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms, new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the $HD_3HD_2HD_1$ structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications.
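
To make the three-block structure concrete, the following is a minimal sketch of the $HD_3HD_2HD_1$ projection mentioned in the abstract, followed by a cross-polytope hash. It is an illustration under stated assumptions, not the authors' implementation: $H$ is taken to be the Hadamard matrix normalized by $1/\sqrt{n}$, each $D_i$ is a random diagonal matrix with $\pm 1$ entries, and the input dimension is a power of two. The function names (`fwht`, `hd`, `make_signs`, `hd3_hd2_hd1`, `cross_polytope_hash`) are hypothetical and chosen only for this example.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform; len(x) must be a power of two."""
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine entries that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v
        h *= 2
    return a


def hd(x, signs):
    """Apply one block H D: flip signs, then a Hadamard transform scaled by 1/sqrt(n)."""
    scale = 1.0 / math.sqrt(len(x))
    return [scale * t for t in fwht([s * v for s, v in zip(signs, x)])]


def make_signs(n, rng):
    """Random diagonal of +/-1 entries (assumed form of D_1, D_2, D_3)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def hd3_hd2_hd1(x, d1, d2, d3):
    """Compute H D_3 H D_2 H D_1 x in O(n log n) time without forming any matrix."""
    return hd(hd(hd(x, d1), d2), d3)


def cross_polytope_hash(y):
    """Return (index, sign) of the largest-magnitude coordinate of y."""
    i = max(range(len(y)), key=lambda k: abs(y[k]))
    return i, (1 if y[i] >= 0 else -1)


rng = random.Random(0)  # fixed seed so the example is reproducible
n = 8
d1, d2, d3 = (make_signs(n, rng) for _ in range(3))
x = [1.0] + [0.0] * (n - 1)  # deliberately unbalanced unit vector
y = hd3_hd2_hd1(x, d1, d2, d3)
bucket = cross_polytope_hash(y)
```

Each block here costs $O(n \log n)$ time and stores only $n$ signs, compared with $O(n^2)$ time and memory for a dense Gaussian matrix. With the normalized $H$, every block is an isometry, so `y` has the same Euclidean norm as `x`; any global scaling factor is left out of the sketch because the hash depends only on which coordinate has the largest magnitude.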

Paper Structure

This paper contains 30 sections, 9 theorems, 55 equations, 6 figures, 2 tables.

Key Result

Lemma 1

The following matrices: $\textbf{G}_{circ}\textbf{D}_{2}\textbf{HD}_{1}$, $\sqrt{n}\textbf{HD}_{3}\textbf{HD}_{2}\textbf{HD}_{1}$ and $\sqrt{n}\textbf{HD}_{g_{1},...,g_{n}}\textbf{HD}_{2}\textbf{HD}_{1}$, where $\textbf{G}_{circ}$ is Gaussian circulant, are valid structured spinners for $\delta(n) =

Figures (6)

  • Figure 1: Pictorial explanation of the role of the three matrix-blocks in the construction of the structured spinner. Left picture: $\textbf{M}_{1}$ rotates $\textbf{v}$ such that the rotated version $\textbf{v}_{r}$ is balanced. Middle picture: $\textbf{M}_{2}$ transforms vectors $\textbf{v},\textbf{w},\textbf{u}$ such that their images $\textbf{v}_{r},\textbf{w}_{r},\textbf{u}_{r}$ are near-orthogonal. Right picture: The projections of the random vector $\textbf{r}$ onto two such near-orthogonal vectors $\textbf{v}$, $\textbf{w}$ are near-independent.
  • Figure 2: Cross-polytope LSH - collision probabilities. (bottom) Zooming in on larger distances makes it possible to distinguish the curves, which almost coincide.
  • Figure 3: Accuracy of random feature map kernel approximation for the G50C dataset.
  • Figure 4: Test error for MLP (top) and convolutional network (bottom).
  • Figure 5: Accuracy of random feature map kernel approximation for the USPST dataset.
  • ...and 1 more figure

Theorems & Definitions (18)

  • Definition 1: $(\delta(n),p(n))$-balanced matrices
  • Remark 1
  • Definition 2: $(\Delta_{F},\Delta_{2})$-smooth sets
  • Remark 2
  • Lemma 1
  • Remark 3
  • Remark 4
  • Definition 3
  • Theorem 1: structured random setting
  • Remark 5
  • ...and 8 more