Robust Online Classification: From Estimation to Denoising

Changlong Wu, Ananth Grama, Wojciech Szpankowski

TL;DR

This work advances online learning with noisy labels under adversarial feature streams by formulating a general noise-kernel model and a minimax risk objective. It establishes that the minimax risk is characterized, up to a logarithmic factor in the hypothesis class size, by the Hellinger gap of the induced noisy-label distributions. The analysis reduces multiclass prediction to pairwise hypothesis testing via a novel conditional Le Cam-Birgé testing framework, combined with online conditional distribution estimation using exponentially weighted averaging. The paper provides binary and multiclass results, extends them to soft-gap and unknown-gap settings, and derives bounds for randomized-response and single-distribution kernels, with explicit results for infinite classes and stochastically generated features. These guarantees hold across a wide range of noise mechanisms, including kernels inspired by local differential privacy, and offer a principled basis for designing online denoising and robust classification systems in the presence of label noise and adversarial inputs.
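The online conditional distribution estimation step mentioned above can be illustrated with a generic exponentially weighted average over a finite set of candidate predictors under log loss. This is a minimal sketch, not the authors' algorithm: it assumes each candidate (the hypothetical `Predictor` type) maps a feature to a fully specified distribution over noisy labels that is positive on every observed label, and it fixes the learning rate to 1. With that rate the forecaster is a Bayesian mixture, so its cumulative log loss exceeds that of the best of N candidates by at most log N.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: a predictor maps a feature x to a distribution
# over noisy labels, given as {label: probability}.
Predictor = Callable[[object], Dict[int, float]]


def ewa_log_loss(
    experts: Sequence[Predictor],
    stream: Sequence[Tuple[object, int]],
) -> Tuple[List[Dict[int, float]], float]:
    """Exponentially weighted averaging (learning rate 1) under log loss.

    Assumes every expert assigns positive probability to every observed label.
    Returns the mixture distribution predicted at each round and the learner's
    cumulative log loss.
    """
    log_w = [0.0] * len(experts)  # log-weights, uniform prior
    predictions: List[Dict[int, float]] = []
    total_loss = 0.0
    for x, y_noisy in stream:
        # Normalize the weights stably by subtracting the largest log-weight.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        z = sum(w)
        w = [wi / z for wi in w]
        # Predict the weighted mixture of the experts' distributions.
        dists = [expert(x) for expert in experts]
        labels = set().union(*dists)
        mix = {lab: sum(wi * d.get(lab, 0.0) for wi, d in zip(w, dists)) for lab in labels}
        predictions.append(mix)
        # Pay log loss on the observed noisy label, then reweight each expert by
        # the probability it assigned to that label (done in log space).
        total_loss -= math.log(mix[y_noisy])
        for i, d in enumerate(dists):
            log_w[i] += math.log(d[y_noisy])
    return predictions, total_loss


# Usage: two hypothetical experts, ten rounds in which the noisy label is 1.
def good(x: object) -> Dict[int, float]:
    return {0: 0.2, 1: 0.8}


def bad(x: object) -> Dict[int, float]:
    return {0: 0.8, 1: 0.2}


preds, loss = ewa_log_loss([good, bad], [(t, 1) for t in range(10)])
# The mixture's loss is within log(2) of the better expert's loss.
print(loss, 10 * -math.log(0.8) + math.log(2))
```

Computing the weights in log space keeps the update numerically stable over long streams, where raw products of probabilities would underflow.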

Abstract

We study online classification of features into labels with general hypothesis classes. In our setting, true labels are determined by some function within the hypothesis class but are corrupted by unknown stochastic noise, and the features are generated adversarially. Predictions are made using observed noisy labels and noiseless features, while performance is measured by the minimax risk with respect to the true labels. The noise mechanism is modeled via a general noise kernel that specifies, for any individual data point, a set of distributions from which the actual noisy label distribution is chosen. We show that the minimax risk is tightly characterized (up to a logarithmic factor of the hypothesis class size) by the Hellinger gap of the noisy label distributions induced by the kernel, independent of other properties such as the means and variances of the noise. Our main technique is based on a novel reduction to an online comparison scheme of two hypotheses, along with a new conditional version of Le Cam-Birgé testing suitable for online settings. Our work provides the first comprehensive characterization of noisy online classification with guarantees relative to the ground truth under general noisy observations.
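To make the Hellinger gap concrete, the sketch below computes it for a hypothetical binary flip-noise kernel, in which the true label is reported correctly with probability 1 - eta for some unknown eta in [0, eta_max]. It assumes the convention H^2(p, q) = (1/2) * sum (sqrt(p) - sqrt(q))^2, which may differ from the paper's by a constant factor, approximates each convex set of noisy-label distributions by a finite grid, and takes the gap to be the smallest squared Hellinger distance between a distribution in one set and a distribution in the other. For this kernel the gap works out to 1 - 2 * sqrt(eta_max * (1 - eta_max)), which depends only on how close the two sets come and shrinks to 0 as eta_max approaches 1/2.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Dist = Tuple[float, float]  # (P[noisy label = 0], P[noisy label = 1])


def hellinger_sq(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Hellinger distance with the 1/2 convention (values in [0, 1])."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def set_gap(P: Sequence[Dist], Q: Sequence[Dist]) -> float:
    """Smallest squared Hellinger distance between members of two finite sets."""
    return min(hellinger_sq(p, q) for p, q in product(P, Q))


def flip_noise_sets(eta_max: float, grid: int = 101) -> Tuple[List[Dist], List[Dist]]:
    """Grid approximation of the noisy-label sets for true labels 0 and 1 under
    the hypothetical flip-noise kernel with flip rate eta in [0, eta_max]."""
    etas = [eta_max * k / (grid - 1) for k in range(grid)]
    q0 = [(1 - e, e) for e in etas]  # true label 0, flipped to 1 w.p. e
    q1 = [(e, 1 - e) for e in etas]  # true label 1, flipped to 0 w.p. e
    return q0, q1


# Usage: compare the grid estimate with the closed form for a few noise levels.
for eta_max in (0.1, 0.25, 0.4, 0.5):
    gap = set_gap(*flip_noise_sets(eta_max))
    closed_form = 1 - 2 * math.sqrt(eta_max * (1 - eta_max))
    print(f"eta_max={eta_max}: gap={gap:.4f}, closed form={closed_form:.4f}")
```

Because squared Hellinger distance is jointly convex, minimizing it over two convex sets is a convex problem; the brute-force grid is used here only because it is short and transparent.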

Paper Structure (27 sections, 33 theorems, 120 equations, 2 algorithms)

Key Result

Theorem 1

Let $\mathcal{H}\subset \mathcal{Y}^{\mathcal{X}}$ be a finite class with $|\mathcal{Y}|=2$, $\mathcal{K}$ be any noisy kernel that satisfies $\forall \mathbf{x}\in \mathcal{X}$, $\forall y,y'\in \mathcal{Y}$ with $y\not=y'$, and $\mathcal{Q}_y^{\mathbf{x}}=\mathcal{K}(\mathbf{x},y)\subset \mathcal{D}(\tilde{\mathcal{Y}})$ is closed and convex. Then $\tilde{r}_T(\mathcal{H},\mathcal{K})\le \frac{
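Theorem 1 requires each set of noisy-label distributions to be closed and convex. A classical reason convexity helps is Le Cam-Birgé testing: to decide between two convex sets, one takes their Hellinger-closest pair (p*, q*) and thresholds its log-likelihood ratio, and Le Cam's argument bounds the error after n informative observations by (1 - H^2(p*, q*))^n under the 1/2 convention, even if the noise distribution is re-chosen from its set at every round. The sketch below applies this classical, unconditional test (not the paper's conditional version) to two hypotheses that disagree at a point, one predicting label 0 and the other label 1, under a hypothetical binary flip-noise kernel whose unknown flip rate lies in [0, eta_max] with eta_max < 1/2. The helper names `closest_pair`, `le_cam_test`, and `simulate` are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Dist = Tuple[float, float]  # (P[noisy label = 0], P[noisy label = 1])


def closest_pair(eta_max: float) -> Tuple[Dist, Dist]:
    """Hellinger-closest pair for the hypothetical flip-noise sets
    {(1 - e, e)} and {(e, 1 - e)}, e in [0, eta_max], assuming eta_max < 1/2."""
    return (1 - eta_max, eta_max), (eta_max, 1 - eta_max)


def le_cam_test(noisy_labels: Sequence[int], p_star: Dist, q_star: Dist) -> int:
    """Return 1 (favour the q-set) if the log-likelihood ratio of the closest
    pair is nonnegative, else 0 (favour the p-set)."""
    llr = sum(math.log(q_star[y] / p_star[y]) for y in noisy_labels)
    return 1 if llr >= 0 else 0


def simulate(true_label: int, n: int, eta_max: float, seed: int = 0) -> int:
    """Draw n noisy labels, re-choosing the flip rate in [0, eta_max] each round,
    and return the label selected by the test."""
    rng = random.Random(seed)
    labels: List[int] = []
    for _ in range(n):
        # Stand-in for an adversary: a fresh flip rate every round.
        eta_t = rng.uniform(0.0, eta_max)
        labels.append(true_label ^ int(rng.random() < eta_t))
    p_star, q_star = closest_pair(eta_max)
    return le_cam_test(labels, p_star, q_star)


# Usage: the affinity rho = 1 - H^2(p*, q*) gives the classical error bound rho**n.
eta_max = 0.25
rho = sum(math.sqrt(a * b) for a, b in zip(*closest_pair(eta_max)))
print("error bound after 200 rounds:", rho ** 200)
print("decisions:", simulate(0, 200, eta_max), simulate(1, 200, eta_max))
```

For this symmetric kernel the log-likelihood ratio only counts how many observed labels are 1 versus 0, so the test reduces to a majority vote; with asymmetric noise sets the closest pair would weight the two labels differently.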

Theorems & Definitions (50)

  • Example 1
  • Theorem 1: Informal
  • Theorem 2: Informal
  • Theorem 3: Informal
  • Theorem 4: Informal
  • Definition 5
  • Definition 6
  • Definition 7
  • Example 2
  • Proposition 8
  • ...and 40 more