Randomized Pairwise Learning with Adaptive Sampling: A PAC-Bayes Analysis
Sijia Zhou, Yunwen Lei, Ata Kabán
TL;DR
The paper addresses generalization in pairwise learning under data-adaptive sampling by developing a framework that blends PAC-Bayes with algorithmic stability to accommodate arbitrary, data-dependent sampling of index pairs. It treats the sampling scheme as a hyperparameter distribution, which yields uniform PAC-Bayes bounds without gradient-correction factors, and it leverages a $U$-statistic representation of the pairwise loss. The main results instantiate this framework for pairwise SGD and pairwise SGDA, deriving sub-exponential stability-based generalization bounds for both smooth and non-smooth convex problems, with rates of $\widetilde{O}(1/\sqrt{n})$ and explicit guidance on the choice of iteration number and stepsize. This provides principled guarantees for non-uniform sampling in large-scale pairwise learning tasks, including the descent-ascent updates used in adversarial training, and expands the toolkit for analyzing randomized optimization in non-i.i.d. settings.
Abstract
We study stochastic optimization with data-adaptive sampling schemes to train pairwise learning models. Pairwise learning is ubiquitous, and it covers several popular learning tasks such as ranking, metric learning and AUC maximization. A notable difference between pairwise and pointwise learning is the statistical dependence among input pairs, which existing analyses have not been able to handle in the general setting considered in this paper. To this end, we extend recent results that blend together two algorithm-dependent frameworks of analysis -- algorithmic stability and PAC-Bayes -- which allow us to deal with any data-adaptive sampling scheme in the optimizer. We instantiate this framework to analyze (1) pairwise stochastic gradient descent, which is a default workhorse in many machine learning problems, and (2) pairwise stochastic gradient descent ascent, which is a method used in adversarial training. Both algorithms sample from a discrete distribution over sample indices before each update. Non-uniform sampling of these indices has already been suggested in the recent literature, and our work provides generalization guarantees for it in both smooth and non-smooth convex problems.
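To make the setting concrete, the following is a minimal sketch of pairwise SGD in which the index pair is drawn from a data-adaptive distribution before each update. Everything specific here is an illustrative assumption rather than the paper's construction: the linear scorer, the pairwise logistic loss, the loss-proportional sampling rule mixed with the uniform distribution, and the names `all_pair_losses`, `pair_probs`, `pairwise_risk` and `pairwise_sgd`. The update uses the plain gradient of the sampled pair, with no importance-weight correction.

```python
import numpy as np

def all_pair_losses(w, X, y):
    """Matrix of pairwise logistic losses for a linear scorer (labels in {-1, +1})."""
    s = 0.5 * (y[:, None] - y[None, :])          # +1, -1, or 0 for each pair (i, j)
    scores = X @ w
    margins = s * (scores[:, None] - scores[None, :])
    return np.logaddexp(0.0, -margins)           # log(1 + exp(-margin))

def pairwise_risk(w, X, y):
    """Empirical pairwise risk: average loss over all ordered pairs with i != j."""
    n = len(y)
    losses = all_pair_losses(w, X, y)
    return (losses.sum() - np.trace(losses)) / (n * (n - 1))

def pair_probs(w, X, y, mix=0.5):
    """Hypothetical adaptive sampling rule: loss-proportional, mixed with uniform."""
    n = len(y)
    losses = all_pair_losses(w, X, y)
    np.fill_diagonal(losses, 0.0)                # never sample a pair with i == j
    uniform = (1.0 - np.eye(n)) / (n * (n - 1))
    return mix * uniform + (1.0 - mix) * losses / losses.sum()

def pair_grad(w, xi, yi, xj, yj):
    """Gradient in w of the pairwise logistic loss on one pair."""
    s = 0.5 * (yi - yj)
    d = xi - xj
    margin = s * (w @ d)
    return -s * d / (1.0 + np.exp(margin))

def pairwise_sgd(X, y, steps=200, eta=0.1, seed=0):
    """Pairwise SGD with a pair (i, j) sampled from pair_probs at every step."""
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    w = np.zeros(dim)
    for _ in range(steps):
        p = pair_probs(w, X, y)                  # distribution depends on data and w
        k = rng.choice(n * n, p=p.ravel())       # sample a flattened pair index
        i, j = divmod(k, n)
        w = w - eta * pair_grad(w, X[i], y[i], X[j], y[j])
    return w
```

Recomputing all $n(n-1)$ pair losses at every step is only practical for small $n$; it is done here to keep the sampling rule explicit, not as a suggestion for large-scale use.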
