The Galerkin method beats Graph-Based Approaches for Spectral Algorithms
Vivien Cabannes, Francis Bach
TL;DR
This work introduces a Galerkin framework for spectral decompositions of a broad class of operators, offering statistical and computational advantages over graph-based approaches. By restricting attention to a finite set of test functions and recovering spectral components through a generalized singular value decomposition (GSVD), it unifies kernel-based methods with random features and Nyström-type ideas, at a computational cost of $O(n p^2 c_H + p^3)$. The Laplacian example demonstrates both strong theoretical guarantees and practical efficiency, including implementations that exploit kernel structure to reduce the cost to $O(n p^2 + n p d)$ flops. Beyond linear operators, the paper discusses loss-based optimization to extend spectral learning to non-linear function spaces, connecting spectral methods to self-supervised learning and deep representations. Overall, the Galerkin approach offers scalable, principled spectral analysis applicable to clustering, embeddings, and diffusion-inspired models, and is accompanied by a public software library.
Abstract
Historically, the machine learning community has derived spectral decompositions from graph-based approaches. We break with this approach and prove the statistical and computational superiority of the Galerkin method, which consists in restricting the study to a small set of test functions. In particular, we introduce implementation tricks to deal with differential operators in high dimensions with structured kernels. Finally, we extend the core principles of our approach to non-linear spaces of functions, such as the ones parameterized by deep neural networks, through loss-based optimization procedures.
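To make the idea of restricting the study to a small set of test functions concrete, here is a minimal sketch of a Galerkin eigenvalue estimate in one dimension. It is an illustration under stated assumptions, not the paper's implementation: we assume the operator is defined through the Dirichlet form $\mathbb{E}[f'(X) g'(X)]$, we use monomials as test functions in place of kernel-based ones because their spectrum is easy to check, and the function name `galerkin_eigenvalues` is hypothetical. Building the two $p \times p$ matrices costs $O(n p^2)$ and the generalized eigenproblem costs $O(p^3)$.

```python
import numpy as np
from scipy.linalg import eigh


def galerkin_eigenvalues(x, p):
    """Galerkin eigenvalue estimate from 1-D samples x with p monomial test functions.

    Assumption (not from the paper's code): the operator is defined by the
    Dirichlet form E[f'(X) g'(X)], and the test functions are 1, x, ..., x**(p-1).
    """
    n = x.shape[0]
    powers = np.arange(p)
    # Test functions phi_j(x) = x**j evaluated on the samples: shape (n, p).
    phi = x[:, None] ** powers
    # Derivatives phi_j'(x) = j * x**(j-1); the first column is the zero function.
    dphi = np.zeros_like(phi)
    dphi[:, 1:] = powers[1:] * phi[:, :-1]
    # Empirical p x p matrices, each built in O(n p^2) operations.
    gram = phi.T @ phi / n          # estimates E[phi(X) phi(X)^T]
    dirichlet = dphi.T @ dphi / n   # estimates E[phi'(X) phi'(X)^T]
    # Generalized symmetric eigenproblem dirichlet @ v = lam * gram @ v, O(p^3).
    eigenvalues, coefficients = eigh(dirichlet, gram)
    return eigenvalues, coefficients


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = rng.standard_normal(200_000)
    eigenvalues, _ = galerkin_eigenvalues(samples, p=3)
    # For standard Gaussian data the exact eigenfunctions are Hermite
    # polynomials, so the estimates should be close to 0, 1 and 2.
    print(eigenvalues)
```

With standard Gaussian samples, the span of the monomials contains the first Hermite polynomials, so the estimated eigenvalues approach the integers 0, 1, 2 as $n$ grows. Note that no $n \times n$ graph or affinity matrix is ever formed; only the two $p \times p$ matrices are stored.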
