Stability-based Generalization Analysis of Randomized Coordinate Descent for Pairwise Learning

Liang Wu, Ruixi Hu, Yunwen Lei

TL;DR

This work analyzes the generalization of randomized coordinate descent (RCD) in pairwise learning by developing an on-average argument stability framework. It derives convex- and strongly convex-case excess-risk bounds, showing that early stopping helps balance estimation and optimization, with rates improving under a low-noise condition $F(w^*) = O(1/n)$. Specifically, the convex case achieves $O(1/\sqrt{n})$ generalization (and $O(1/n)$ under low noise), while the strongly convex case attains $O(\sqrt{\log(n)}/n)$ and near-optimal $O(1/n)$ rates when the iteration budget scales as $T \asymp \log(n)$. Experiments on AUC maximization validate the theory, demonstrating that RCD provides greater stability than SGD on LIBSVM datasets and that stability-guided early stopping yields improved generalization in practice.

Abstract

Pairwise learning includes various machine learning tasks, with ranking and metric learning serving as the primary representatives. While randomized coordinate descent (RCD) is popular in various learning problems, there is much less theoretical analysis of the generalization behavior of models trained by RCD, especially under the pairwise learning framework. In this paper, we consider the generalization of RCD for pairwise learning. We measure the on-average argument stability for both convex and strongly convex objective functions, based on which we develop generalization bounds in expectation. An early-stopping strategy is adopted to quantify the balance between estimation and optimization. Our analysis further incorporates the low-noise setting into the excess risk bound to achieve an optimistic bound of $O(1/n)$, where $n$ is the sample size.
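To make the setting concrete, the following is a minimal sketch of RCD applied to a pairwise objective. It is an illustration under stated assumptions, not the authors' implementation: it assumes a logistic surrogate for AUC maximization, averages the loss over positive-negative pairs only, and uses a constant step size. The names `pairwise_diffs`, `empirical_risk`, and `rcd`, and the parameters `eta` and `seed`, are hypothetical. The iteration budget `T` plays the role of the early-stopping time discussed above.

```python
import numpy as np


def pairwise_diffs(X, y):
    """Return x_i - x_j for every (positive i, negative j) pair of examples."""
    pos, neg = X[y == 1], X[y == -1]
    return (pos[:, None, :] - neg[None, :, :]).reshape(-1, X.shape[1])


def empirical_risk(w, D):
    """Mean logistic surrogate log(1 + exp(-w^T (x_i - x_j))) over all pairs."""
    return float(np.mean(np.logaddexp(0.0, -D @ w)))


def rcd(D, T, eta, seed=0):
    """Run T iterations of randomized coordinate descent (illustrative sketch).

    Each iteration draws one coordinate uniformly at random and takes a
    gradient step along that coordinate only. T is the early-stopping budget.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(D.shape[1])
    margins = D @ w  # pairwise margins, updated incrementally after each step
    for _ in range(T):
        k = rng.integers(D.shape[1])
        # sigmoid(-m) written with tanh for numerical stability
        sig_neg = 0.5 * (1.0 - np.tanh(margins / 2.0))
        # Partial derivative of the empirical risk with respect to w[k]
        g_k = np.mean(-D[:, k] * sig_neg)
        step = eta * g_k
        w[k] -= step
        margins -= step * D[:, k]
    return w
```

Only one coordinate of the model changes per iteration, so the pairwise margins can be updated in place rather than recomputed from scratch; this is the usual reason coordinate methods are cheap per step.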

Paper Structure

This paper contains 20 sections, 10 theorems, 95 equations, 24 figures, 2 tables.

Key Result

Lemma 1

Let $S, S_i$ be constructed as in Definition 2. Then we bound estimation errors with stability measures below.
(a) Let Assumption 1 hold. Then the estimation error can be bounded by the $\ell_1$ on-average argument stability below
(b) Let Assumption 2 hold. Then for any $\gamma > 0$ we have the following estimation error bound with the $\ell_2$ on-average argument stability
(c) Let $n$ denote t
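For orientation, the stability measures named in Lemma 1 are usually formulated as follows in the stability literature. This is a sketch of the standard definitions, assuming that $S_i$ is obtained from $S$ by replacing its $i$-th example with an independent copy and that $A(S)$ denotes the model returned by the (randomized) algorithm $A$ on $S$; the paper's Definition 2 may differ in notation or constants.

```latex
% \ell_1 on-average argument \epsilon-stability (standard form, assumed here)
\mathbb{E}_{S,\widetilde{S},A}\Big[\frac{1}{n}\sum_{i=1}^{n}\big\|A(S)-A(S_i)\big\|_2\Big] \le \epsilon,
% \ell_2 on-average argument \epsilon-stability (standard form, assumed here)
\mathbb{E}_{S,\widetilde{S},A}\Big[\frac{1}{n}\sum_{i=1}^{n}\big\|A(S)-A(S_i)\big\|_2^2\Big] \le \epsilon^2 .
```

Both quantities average the change in the output model over all $n$ single-example replacements, which is a weaker requirement than uniform stability's worst case over replacements.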

Figures (24)

  • Figure 1: (a) hinge for a3a
  • Figure 2: (b) hinge for gisette
  • Figure 3: (c) hinge for madelon
  • Figure 4: (d) hinge for usps
  • Figure 5: (e) logistic for a3a
  • ...and 19 more figures

Theorems & Definitions (25)

  • Example 1
  • Example 2
  • Example 3
  • Definition 1
  • Definition 2
  • Lemma 1
  • Remark 1
  • Lemma 2
  • Theorem 3
  • Remark 2
  • ...and 15 more