Gentle Local Robustness implies Generalization

Khoat Than, Dat Phan, Giang Vu

TL;DR

This work shows that classical robustness-based generalization bounds can be vacuous for overlapping-class problems, including the Bayes-optimal classifier. It introduces model-dependent, locality-sensitive bounds that quantify generalization through per-region robustness and region-specific losses, and proves these bounds are tighter and converge to the true Bayes error as data accumulate. The authors provide both upper and lower bounds, plus practical computation strategies, and demonstrate non-vacuous, highly correlated estimates on ImageNet-pretrained nets and PCA tasks. The findings offer a practical framework for model selection and evaluation that better reflects real-world generalization performance and adversarial considerations.

Abstract

Robustness and generalization ability of machine learning models are of utmost importance in various application domains. There is wide interest in efficient ways to analyze those properties. One important direction is to analyze the connection between those two properties. Prior theories suggest that a robust learning algorithm can produce trained models with a high generalization ability. However, we show in this work that the existing error bounds are vacuous for the Bayes optimal classifier, which is the best among all measurable classifiers for a classification problem with overlapping classes. Those bounds cannot converge to the true error of this ideal classifier. This is undesirable, surprising, and has not been reported before. We then present a class of novel bounds, which are model-dependent and provably tighter than the existing robustness-based ones. Unlike prior ones, our bounds are guaranteed to converge to the true error of the best classifier as the number of samples increases. We further provide an extensive experiment and find that two of our bounds are often non-vacuous for a large class of deep neural networks pretrained on ImageNet.
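
For context, the classical robustness bound of Xu and Mannor (2012), one of the existing bounds the abstract refers to, states that for a (K, eps)-robust algorithm and a loss bounded by M, the gap between true and empirical loss is at most eps + M * sqrt((2K ln 2 + 2 ln(1/delta)) / n) with probability at least 1 - delta. The sketch below only evaluates that right-hand side. The function name `xu_mannor_gap_bound` and the values eps = 0.3, K = 100 are illustrative assumptions, not numbers from the paper. It may help to see that the square-root term shrinks as n grows while the eps term does not, so the bound can only approach eps.

```python
import math


def xu_mannor_gap_bound(eps, K, n, delta, M=1.0):
    """Right-hand side of the classical (K, eps)-robustness bound.

    eps   : robustness level of the algorithm (hypothetical value here)
    K     : number of cells in the partition of the sample space
    n     : number of i.i.d. training samples
    delta : failure probability, in (0, 1)
    M     : upper bound on the loss
    """
    if n <= 0 or K <= 0 or not (0.0 < delta < 1.0):
        raise ValueError("need n > 0, K > 0 and 0 < delta < 1")
    sampling_term = M * math.sqrt(
        (2.0 * K * math.log(2.0) + 2.0 * math.log(1.0 / delta)) / n
    )
    return eps + sampling_term


# Illustrative values only: the sampling term decays with n, eps stays fixed.
for n in (10**3, 10**5, 10**7):
    print(n, xu_mannor_gap_bound(eps=0.3, K=100, n=n, delta=0.05))
```

Under these assumptions the printed values decrease toward 0.3 but never fall below it, which is one way to read the abstract's claim that such bounds cannot converge to the true error when the robustness level stays bounded away from zero.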

Paper Structure

This paper contains 28 sections, 12 theorems, 39 equations, 7 tables.

Key Result

Theorem 1

Given Assumption assumption-Alg-robust, consider ${\bm{h}}$ learned by algorithm ${\mathcal{A}}$ from a dataset ${\bm{S}}$ which consists of $n$ i.i.d. samples from distribution $P$, and a bounded loss $\ell$. For any $\delta >0$, denote $C_{{\mathcal{H}}} = \sup_{{\bm{f}} \in {\mathcal{H}}, {\bm{z}

Theorems & Definitions (26)

  • Definition 1
  • Theorem 1: xu2012robustnessGeneralize
  • Theorem 2: kawaguchi2022robustness
  • Theorem 3: Bayes optimal classifier
  • Theorem 4: Local Robustness
  • Theorem 5
  • Theorem 6
  • Lemma 3.1
  • Lemma 3.2
  • Remark 1
  • ...and 16 more