Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis

Sijia Zhou, Yunwen Lei, Ata Kabán

TL;DR

The paper addresses generalization in pairwise learning under data-adaptive sampling by developing a framework that blends PAC-Bayes with algorithmic stability to accommodate arbitrary, data-dependent sampling of index pairs. It treats the sampling scheme as a hyperparameter distribution, enabling uniform PAC-Bayes bounds without gradient-correction factors and leveraging a $U$-statistic representation of the pairwise loss. The main results instantiate this framework for pairwise SGD and pairwise SGDA, deriving sub-exponential stability-based generalization bounds that hold under smooth and non-smooth convex assumptions, with rates of $\widetilde{O}(1/\sqrt{n})$ and explicit guidance on iteration and stepsize choices. This provides principled guarantees for non-uniform sampling in large-scale pairwise learning tasks and supports adversarial training contexts, expanding the toolkit for analyzing randomized optimization in non-i.i.d. settings.

Abstract

We study stochastic optimization with data-adaptive sampling schemes to train pairwise learning models. Pairwise learning is ubiquitous and covers several popular learning tasks such as ranking, metric learning and AUC maximization. A notable difference between pairwise and pointwise learning is the statistical dependence among input pairs, which existing analyses have not been able to handle in the general setting considered in this paper. To this end, we extend recent results that blend together two algorithm-dependent frameworks of analysis -- algorithmic stability and PAC-Bayes -- which allows us to deal with any data-adaptive sampling scheme in the optimizer. We instantiate this framework to analyze (1) pairwise stochastic gradient descent, which is a default workhorse in many machine learning problems, and (2) pairwise stochastic gradient descent ascent, which is a method used in adversarial training. Both algorithms sample indices from a discrete distribution before each update. Non-uniform sampling of these indices has already been suggested in the recent literature, and our work provides generalization guarantees for it in both smooth and non-smooth convex problems.
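To make the setting concrete, the following is a minimal sketch of pairwise SGD in which the index pair for each update is drawn from a data-dependent distribution rather than uniformly. Everything specific here is an assumption chosen for illustration and is not taken from the paper: the linear scorer, the logistic pairwise ranking loss, the softmax-of-losses sampling rule, and all function names. Consistent with the summary above, the update applies no importance-weight correction to the gradient. The helper `kl_from_uniform` computes the KL divergence of the sampling distribution from a uniform distribution over pairs, the kind of quantity a PAC-Bayes bound would penalize.

```python
import numpy as np

def pair_losses(w, X, y, pairs):
    """Logistic pairwise ranking loss log(1 + exp(-s_ij * <w, x_i - x_j>)) per pair."""
    i, j = pairs[:, 0], pairs[:, 1]
    s = np.sign(y[i] - y[j])                 # +1 if i should rank above j, -1 if below
    margins = s * ((X[i] - X[j]) @ w)
    return np.logaddexp(0.0, -margins)

def empirical_risk(w, X, y, pairs):
    """Average loss over all ordered pairs i != j (a U-statistic of order two)."""
    return pair_losses(w, X, y, pairs).mean()

def sampling_distribution(w, X, y, pairs, temperature=1.0):
    """Hypothetical adaptive rule: softmax of the current pair losses."""
    logits = pair_losses(w, X, y, pairs) / temperature
    q = np.exp(logits - logits.max())        # subtract max for numerical stability
    return q / q.sum()

def kl_from_uniform(q):
    """KL(q || uniform) over the same finite support."""
    q = np.asarray(q, dtype=float)
    nz = q > 0
    return float(np.sum(q[nz] * np.log(q[nz] * q.size)))

def pairwise_sgd(X, y, T=200, eta=0.1, temperature=1.0, seed=0):
    """Pairwise SGD with adaptive pair sampling; returns the averaged iterate."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    pairs = np.array([(i, j) for i in range(n) for j in range(n) if i != j])
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    for _ in range(T):
        q = sampling_distribution(w, X, y, pairs, temperature)
        i, j = pairs[rng.choice(len(pairs), p=q)]
        s = np.sign(y[i] - y[j])
        d = X[i] - X[j]
        # Gradient of log(1 + exp(-s * <w, d>)) with respect to w.
        grad = -s * d / (1.0 + np.exp(s * (w @ d)))
        w = w - eta * grad                   # plain step, no importance weighting
        w_sum += w
    return w_sum / T, pairs
```

A call such as `w, pairs = pairwise_sgd(X, y)` on a feature matrix `X` of shape `(n, d)` and real-valued relevance scores `y` returns the averaged weight vector together with the list of ordered index pairs, which can then be passed to `empirical_risk` or `sampling_distribution`.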

Paper Structure

This paper contains 9 sections, 10 theorems, 79 equations, 1 table.

Key Result

Lemma 4.1

Given a distribution $\mathrm{P}$, constants $c_1, c_2>0$, and an $M$-bounded loss for a sub-exponentially stable algorithm $A$, for all $\delta\in(0,1/n)$, with probability at least $1-\delta$, the following holds uniformly for all $\mathrm{Q}$ absolutely continuous w.r.t. $\mathrm{P}$, where $\mathrm{KL}(\mathrm{Q}\|\mathrm{P})$ is the KL divergence of $\mathrm{Q}$ from $\mathrm{P}$
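For reference, the KL term in this statement is the standard Kullback-Leibler divergence. The display below gives its general form and, as an illustrative special case, its form when $\mathrm{P}$ and $\mathrm{Q}$ are discrete distributions over a finite set of index sequences. The symbols $\phi$, $\mathcal{I}$, $p$ and $q$ are notation assumed here and are not taken from the source.

```latex
\mathrm{KL}(\mathrm{Q}\,\|\,\mathrm{P})
  = \mathbb{E}_{\phi\sim\mathrm{Q}}\!\left[\log\frac{\mathrm{d}\mathrm{Q}}{\mathrm{d}\mathrm{P}}(\phi)\right]
  = \sum_{\phi\in\mathcal{I}} q(\phi)\,\log\frac{q(\phi)}{p(\phi)}
```

The quantity is finite only when $\mathrm{Q}$ puts no mass where $\mathrm{P}$ has none, which is why the statement is restricted to $\mathrm{Q}$ absolutely continuous w.r.t. $\mathrm{P}$. It equals zero exactly when $\mathrm{Q}=\mathrm{P}$.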

Theorems & Definitions (19)

  • Definition 3.1: Uniform Stability
  • Lemma 4.1: Generalization of randomized pairwise learning
  • Lemma 4.2: Sub-exponential stability of pairwise SGD
  • Theorem 4.3: Generalization bounds for pairwise SGD
  • Remark 4.4
  • Lemma 4.5: Sub-exponential stability of pairwise SGDA
  • Theorem 4.6: Generalization bounds for pairwise SGDA
  • Lemma 4.7
  • Lemma 4.8: Lemma 4.10 in van2014probability
  • Lemma 4.9: Theorem 1 in lei2020sharper
  • ...and 9 more