Stability-based Generalization Analysis of Randomized Coordinate Descent for Pairwise Learning
Liang Wu, Ruixi Hu, Yunwen Lei
TL;DR
This work analyzes the generalization of randomized coordinate descent (RCD) in pairwise learning by developing an on-average argument stability framework. It derives convex- and strongly convex-case excess-risk bounds, showing that early stopping helps balance estimation and optimization, with rates improving under a low-noise condition $F(w^*) = O(1/n)$. Specifically, the convex case achieves $O(1/\sqrt{n})$ generalization (and $O(1/n)$ under low noise), while the strongly convex case attains $O(\sqrt{\log(n)}/n)$ and near-optimal $O(1/n)$ rates when the iteration budget scales as $T \asymp \log(n)$. Experiments on AUC maximization validate the theory, demonstrating that RCD provides greater stability than SGD on LIBSVM datasets and that stability-guided early stopping yields improved generalization in practice.
Abstract
Pairwise learning includes various machine learning tasks, with ranking and metric learning serving as the primary representatives. While randomized coordinate descent (RCD) is popular in various learning problems, there is far less theoretical analysis of the generalization behavior of models trained by RCD, especially under the pairwise learning framework. In this paper, we consider the generalization of RCD for pairwise learning. We measure the on-average argument stability for both convex and strongly convex objective functions, based on which we develop generalization bounds in expectation. The early-stopping strategy is adopted to quantify the balance between estimation and optimization. Our analysis further incorporates the low-noise setting into the excess risk bound to obtain an optimistic bound of order $O(1/n)$, where $n$ is the sample size.
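
To make the setting concrete, the following is a minimal sketch of RCD on a pairwise objective, not the authors' implementation. It assumes a least-squares surrogate for AUC maximization, $F_S(w) = \frac{1}{|P|}\sum_{(i,j) \in P} (1 - w^\top (x_i - x_j))^2$ with $P$ the set of positive-negative pairs, and a constant step size set from the largest coordinate-wise smoothness constant. The function names `pairwise_differences` and `rcd_pairwise`, the synthetic data, and the iteration budget `T` are all illustrative assumptions.

```python
import numpy as np


def pairwise_differences(X, y):
    """Stack x_i - x_j for every pair with y_i = +1 and y_j = -1."""
    pos, neg = X[y == 1], X[y == -1]
    return (pos[:, None, :] - neg[None, :, :]).reshape(-1, X.shape[1])


def rcd_pairwise(X, y, T, seed=0):
    """Run T steps of randomized coordinate descent on the assumed
    pairwise least-squares objective F(w) = mean((1 - D w)^2)."""
    rng = np.random.default_rng(seed)
    D = pairwise_differences(X, y)
    d = D.shape[1]
    # Coordinate-wise smoothness constants of the quadratic objective.
    L = 2.0 * np.mean(D ** 2, axis=0)
    eta = 1.0 / L.max()  # assumed constant step size
    w = np.zeros(d)
    r = np.ones(D.shape[0])  # residual 1 - D w, kept in sync with w
    history = [np.mean(r ** 2)]
    for _ in range(T):
        j = rng.integers(d)  # coordinate drawn uniformly at random
        g = -2.0 * np.mean(r * D[:, j])  # partial derivative along j
        step = -eta * g
        w[j] += step  # only coordinate j changes
        r -= step * D[:, j]
        history.append(np.mean(r ** 2))
    return w, history


# Synthetic data (assumption): labels depend mostly on the first feature.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(40, 5))
y = np.where(X[:, 0] + 0.1 * data_rng.normal(size=40) > 0, 1, -1)
w, history = rcd_pairwise(X, y, T=200)
```

Here `T` plays the role of the iteration budget: stopping early means choosing a smaller `T`, which is the quantity the analysis uses to trade optimization error against stability. With this step size the training objective recorded in `history` does not increase from one step to the next, since each update is a gradient step along one coordinate with a step no larger than the inverse of that coordinate's smoothness constant.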
