Fast training of large kernel models with delayed projections

Amirhesam Abedsoltan; Siyuan Ma; Parthe Pandit; Mikhail Belkin

TL;DR

This paper presents a new methodology for building kernel machines that scale efficiently with both data size and model size. It introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible.

Abstract

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating a drastic training speedup over existing methods while maintaining comparable or better classification accuracy.
