A Smoothing Algorithm for l1 Support Vector Machines
Ibrahim Emirahmetoglu, Jeffrey Hajewski, Suely Oliveira, David E. Stewart
TL;DR
The paper introduces SmSVM, a method for efficiently solving soft-margin SVMs with an ℓ¹ penalty on very large datasets by smoothing the hinge loss and employing an active-set strategy for the ℓ¹ term. The approach yields well-behaved Hessian approximations and a Newton-based solver with guarded line search, achieving a provably bounded number of Newton steps per smoothing parameter reduction and overall data passes that scale polylogarithmically with α. Theoretical analysis across opening, midgame, and endgame regimes provides convergence guarantees and practical guidance, while extensive experiments on real and synthetic data demonstrate competitive test accuracy and favorable training times, especially in tall and large-scale settings. The work shows how combining smoothing, active-set sparsity, and second-order optimization can robustly handle large-scale, sparse SVMs with strong empirical performance. It highlights potential for further speedups via GPU acceleration and distributed computing, making ℓ¹ SVMs more viable for industrial-scale data.
Abstract
A smoothing algorithm is presented for solving the soft-margin Support Vector Machine (SVM) optimization problem with an $\ell^{1}$ penalty. This algorithm is designed to require a modest number of passes over the data, which is an important measure of its cost for very large datasets. The algorithm uses smoothing for the hinge-loss function and an active-set approach for the $\ell^{1}$ penalty. The smoothing parameter $\alpha$ is initially large and is typically halved once the smoothed problem has been solved to sufficient accuracy. Convergence theory is presented that shows $\mathcal{O}(1+\log(1+\log_+(1/\alpha)))$ guarded Newton steps for each value of $\alpha$ except for the asymptotic bands $\alpha=\Theta(1)$ and $\alpha=\Theta(1/N)$, with only one Newton step provided $\eta\alpha\gg1/N$, where $N$ is the number of data points and the stopping criterion is that the predicted reduction is less than $\eta\alpha$. The experimental results show that our algorithm is capable of strong test accuracy without sacrificing training speed.
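
To make the role of the smoothing parameter concrete, the following is a minimal sketch of a smoothed hinge loss under a halving schedule. It is not the paper's implementation. The specific smoothing function $(u+\sqrt{u^{2}+\alpha^{2}})/2$ is an assumption (a common smooth surrogate for $\max(0,u)$), and the names `smoothed_hinge` and `alpha_schedule` are hypothetical. The sketch only illustrates that the smoothed loss approaches the hinge loss as $\alpha$ is halved; it does not include the active-set handling of the $\ell^{1}$ term or the guarded Newton solver.

```python
import numpy as np


def hinge(u):
    """Plain hinge function max(0, u), with u = 1 - y * (w @ x)."""
    return np.maximum(0.0, u)


def smoothed_hinge(u, alpha):
    """Assumed smooth surrogate for max(0, u); NOT taken from the paper.

    For alpha > 0 it is smooth in u and differs from max(0, u)
    by at most alpha / 2 (the largest gap occurs at u = 0).
    """
    return 0.5 * (u + np.sqrt(u * u + alpha * alpha))


def alpha_schedule(alpha0, alpha_min):
    """Hypothetical schedule: start large, halve until below alpha_min."""
    alphas = []
    alpha = alpha0
    while alpha >= alpha_min:
        alphas.append(alpha)
        alpha *= 0.5
    return alphas


# Margin-violation values at which both functions are compared.
u = np.linspace(-2.0, 2.0, 401)

# Worst-case gap between smoothed and exact hinge for each alpha.
gaps = [
    float(np.max(np.abs(smoothed_hinge(u, a) - hinge(u))))
    for a in alpha_schedule(1.0, 1e-3)
]

for a, g in zip(alpha_schedule(1.0, 1e-3), gaps):
    print(f"alpha={a:.6f}  max gap={g:.6f}")
```

In a full solver, each value of $\alpha$ in the schedule would correspond to one smoothed subproblem solved to the stated tolerance before $\alpha$ is reduced, so the number of subproblems grows only logarithmically in $1/\alpha$.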
