Joker: Joint Optimization Framework for Lightweight Kernel Machines
Junhong Zhang, Zhihui Lai
TL;DR
Joker is a unified, memory-efficient framework for large-scale kernel machines. It formulates a dual optimization problem that accommodates a broad class of convex losses and solves it with a dual block coordinate descent method with trust region (DBCD-TR). To reduce memory use, Joker approximates the kernel with random Fourier features, which lowers the per-iteration cost while keeping accuracy competitive. The reported experiments show memory savings of up to about 90% with training times comparable to state-of-the-art baselines on KRR, SVM, and kernel logistic regression (KLR) tasks over large-scale datasets. This makes lightweight kernel methods more practical on commodity hardware without restricting the choice of model or sacrificing performance.
Abstract
Kernel methods are powerful tools for nonlinear learning with well-established theory, but scalability has been a long-standing challenge. Despite existing progress, large-scale kernel methods have two limitations: (i) the memory overhead is too high for many users to afford; (ii) existing efforts mainly focus on kernel ridge regression (KRR), while other models remain understudied. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90% of memory while achieving training time and performance comparable to, or even better than, state-of-the-art methods.
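To make the memory argument concrete, the following is a minimal sketch of the standard random Fourier feature construction for a Gaussian kernel, the kind of randomized-feature approximation the summary refers to. It is not the authors' implementation: the function names, the parameter `gamma`, and the choice of the Gaussian kernel are assumptions made for illustration. The point is that an n x D feature matrix stands in for the n x n kernel matrix.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Exact Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def random_fourier_features(X, num_features, gamma, rng):
    """Map X of shape (n, d) to Z of shape (n, num_features).

    Z @ Z.T approximates the Gaussian kernel matrix of X.
    (Hypothetical helper for illustration, not part of Joker.)
    """
    d = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density, N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    # Random phases, uniform on [0, 2 * pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    Z = random_fourier_features(X, num_features=5000, gamma=0.1, rng=rng)
    K_exact = rbf_kernel(X, X, gamma=0.1)
    print("max abs error:", np.abs(Z @ Z.T - K_exact).max())
```

Storing Z costs memory proportional to n times D, rather than n squared for the full kernel matrix, and the approximation error shrinks as D grows. How Joker combines such features with DBCD-TR is described in the paper itself and is not reproduced here.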
