Distribution-Specific Agnostic Conditional Classification With Halfspaces

Jizhou Huang, Brendan Juba

TL;DR

The paper proves that approximating conditional classification is at least as hard as approximating agnostic classification, in both additive and multiplicative form.

Abstract

We study "selective" or "conditional" classification problems in an agnostic setting. Classification tasks commonly focus on modeling the relationship between features and categories that captures the vast majority of data. In contrast to common machine learning frameworks, conditional classification aims to model such relationships only on a subset of the data defined by some selection rule. Most work on conditional classification either solves the problem in a realizable setting or does not guarantee that the error is bounded relative to an optimal solution. In this work, we consider selective/conditional classification by sparse linear classifiers for subsets defined by halfspaces, and give both positive and negative results for Gaussian feature distributions. On the positive side, we present the first PAC-learning algorithm for homogeneous halfspace selectors with error guarantee $\tilde{O}(\sqrt{\mathrm{opt}})$, where $\mathrm{opt}$ is the smallest conditional classification error over the given class of classifiers and homogeneous halfspaces. On the negative side, we find that, under cryptographic assumptions, approximating the conditional classification loss within a small additive error is computationally hard even under Gaussian distributions. We prove that approximating conditional classification is at least as hard as approximating agnostic classification in both additive and multiplicative form.
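
To make the objective concrete, the following is a minimal sketch of the empirical conditional classification error: the error of a classifier measured only on the points that a homogeneous halfspace selects. It is an illustration, not the paper's algorithm. The names `predict` and `conditional_error`, the choice of a homogeneous 1-sparse classifier, and the synthetic labeling rule are all assumptions made for this example.

```python
import numpy as np

def predict(X, c):
    """Labels in {-1, +1} from the linear classifier sign(<c, x>) (assumed homogeneous)."""
    return np.where(X @ c >= 0, 1.0, -1.0)

def conditional_error(X, y, w, c):
    """Empirical error of c restricted to the halfspace {x : <w, x> >= 0}."""
    selected = X @ w >= 0
    if not selected.any():
        return float("nan")  # the selector picked no sample points
    return float(np.mean(predict(X[selected], c) != y[selected]))

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.standard_normal((n, d))  # standard Gaussian features

c = np.zeros(d)
c[0] = 1.0  # 1-sparse classifier (hypothetical choice)
w = np.zeros(d)
w[1] = 1.0  # homogeneous halfspace selector (hypothetical choice)

# Synthetic labels: they follow c inside the selected halfspace
# and are fair coin flips outside it.
noise = rng.choice([-1.0, 1.0], size=n)
y = np.where(X @ w >= 0, predict(X, c), noise)

cond_err = conditional_error(X, y, w, c)          # error on the selected subset only
overall_err = float(np.mean(predict(X, c) != y))  # error on all of the data
print(cond_err, overall_err)
```

In this synthetic setup the classifier is perfect on the selected halfspace but noticeably wrong on the full data, which is the situation conditional classification is meant to exploit.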

Paper Structure

This paper contains 18 sections, 28 theorems, 64 equations, 5 figures, 4 algorithms.

Key Result

theorem 1

There is an algorithm for robust list-learning of linear classifiers with $s=O(1)$ nonzero coefficients from $m=O(\frac{1}{\alpha\epsilon}(s\log d+\log\frac{1}{\delta}))$ examples in polynomial time with list size $O((md)^s)$.
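
To get a feel for how these bounds scale, the sketch below evaluates the two expressions from the theorem numerically. The big-O notation hides constants that this page does not state, so the parameter `C` (defaulting to 1) is a purely hypothetical stand-in, and the function names are made up for this example. The outputs indicate growth rates only, not the actual sample or list sizes.

```python
import math

def sample_size_scale(alpha, eps, delta, s, d, C=1.0):
    """Evaluate C * (1 / (alpha * eps)) * (s * log d + log(1 / delta)), rounded up.

    C stands in for the unspecified constant hidden by the O(.) notation.
    """
    return math.ceil(C / (alpha * eps) * (s * math.log(d) + math.log(1.0 / delta)))

def list_size_scale(m, d, s):
    """Evaluate (m * d) ** s, the growth rate of the list size (constant omitted)."""
    return (m * d) ** s

# Example parameters (arbitrary, chosen only for illustration).
m = sample_size_scale(alpha=0.5, eps=0.1, delta=0.05, s=2, d=100)
print(m, list_size_scale(m, d=100, s=2))
```

The list size is polynomial in $m$ and $d$ only because $s=O(1)$; the exponent $s$ makes the bound grow quickly as the sparsity increases.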

Figures (5)

  • Figure 1: Blue area represents $\hypothesis[\bvar{v}]\isect\hypothesis[\bvar{w}]$, orange area represents $\hypothesis[\bvar{w}]\backslash\hypothesis[\bvar{v}]$.
  • Figure 2: Boundedness of $\loss<\distr>{\bvar{w}(i)}$ and almost Lipschitz continuity of $\derivative<\bvar{w}>\loss<\distr>{\bvar{w}}$.
  • Figure 3: Spherical coordinate interpretation.
  • Figure 4: Weight update step and projection step in the projected stochastic gradient descent algorithm for minimizing the convex surrogate loss.
  • Figure 5: Blue area represents $\hypothesis[\bvar{v}]\isect\hypothesis[\bvar{w}]$, while orange area represents $\hypothesis[\bvar{w}]\isect\hypothesis*[\bvar{v}]$.

Theorems & Definitions (47)

  • definition 1: Agnostic Conditional Classification
  • definition 2: Robust list learning
  • remark 1
  • theorem 1
  • theorem 2: Main Theorem
  • proposition 1
  • proposition 2
  • lemma 1
  • theorem 3
  • definition 3: Learning With Errors
  • ...and 37 more