
Robust density estimation over star-shaped density classes

Xiaolong Liu, Matey Neykov

TL;DR

The paper addresses robust density estimation within star-shaped, bounded density classes under adversarial data contamination. It introduces a corruption-aware group likelihood criterion and a multistage sieve estimator that builds a pruned packing tree to select candidate densities. Using local metric entropy, it derives minimax upper and lower bounds that match, under a key condition linking the $L_2$ and total variation (TV) distances, at the rate $\max\{ {\tau^*}^2 \wedge d^2, \epsilon \wedge d^2\}$. The results extend to the class $\mathcal{F}_B^{[0,\beta]}$, and the paper discusses implications for KL and Hellinger losses as well as potential extensions to robust regression.

Abstract

We establish a novel criterion for comparing the performance of two densities, $g_1$ and $g_2$, within the context of corrupted data. Utilizing this criterion, we propose an algorithm to construct a density estimator within a star-shaped density class, $\mathcal{F}$, under conditions of data corruption. We proceed to derive the minimax upper and lower bounds for density estimation across this star-shaped density class, characterized by densities that are uniformly bounded above and below (in the sup norm), in the presence of adversarially corrupted data. Specifically, we assume that a fraction $\epsilon \leq \frac{1}{3}$ of the $N$ observations are arbitrarily corrupted. We obtain the minimax upper bound $\max\{ \tau_{\overline{J}}^2, \epsilon\} \wedge d^2$. Under certain conditions, we obtain the minimax risk, up to proportionality constants, under the squared $L_2$ loss as $$ \max\left\{ {\tau^*}^2 \wedge d^2, \epsilon \wedge d^2 \right\}, $$ where $\tau^* := \sup\left\{ \tau: N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c) \right\}$ for a sufficiently large constant $c$. Here, $\mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ denotes the local entropy of the set $\mathcal{F}$, and $d$ is the $L_2$ diameter of $\mathcal{F}$.
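To make the definition of $\tau^*$ and the resulting rate concrete, the following is a minimal numerical sketch, not the paper's procedure. It approximates $\tau^* = \sup\{\tau : N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)\}$ by a grid search and then evaluates $\max\{{\tau^*}^2 \wedge d^2, \epsilon \wedge d^2\}$ with all proportionality constants ignored. The function names `tau_star` and `minimax_rate`, the grid search itself, and the caller-supplied log local entropy are all assumptions introduced for illustration; the abstract does not specify the local entropy of any particular class.

```python
from typing import Callable


def tau_star(
    n: int,
    log_local_entropy: Callable[[float], float],
    tau_max: float,
    grid_size: int = 100_000,
) -> float:
    """Grid approximation of sup{tau in (0, tau_max] : n * tau^2 <= log M_loc(tau)}.

    `log_local_entropy` is a hypothetical, user-supplied stand-in for
    tau -> log M_F^loc(tau, c). Returns 0.0 if no grid point satisfies
    the inequality.
    """
    best = 0.0
    for i in range(1, grid_size + 1):
        tau = tau_max * i / grid_size
        if n * tau * tau <= log_local_entropy(tau):
            best = tau  # keep the largest admissible grid point seen so far
    return best


def minimax_rate(tau: float, eps: float, d: float) -> float:
    """Evaluate max{tau^2 ∧ d^2, eps ∧ d^2}, constants ignored."""
    return max(min(tau ** 2, d ** 2), min(eps, d ** 2))


# Illustrative (assumed) entropy profile: log M_loc(tau) = 1 / tau.
# Then N * tau^2 <= 1 / tau exactly when tau <= N^(-1/3).
approx_tau = tau_star(n=1000, log_local_entropy=lambda t: 1.0 / t, tau_max=1.0)
rate = minimax_rate(approx_tau, eps=0.05, d=1.0)
```

With the assumed profile $\log \mathcal{M}^{\text{loc}}(\tau) = 1/\tau$ and $N = 1000$, the grid search lands just below $N^{-1/3} = 0.1$, so ${\tau^*}^2 \approx 0.01$ and the corruption term $\epsilon = 0.05$ dominates the rate in this example.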

Paper Structure (12 sections, 22 theorems, 108 equations, 1 algorithm)


Key Result

Lemma 2.2

For each pair of densities $f,g \in {\mathcal{F}}_B^{[\alpha,\beta]}$, the following relationship holds: where we denote $c(\alpha, \beta) := \frac{h(\beta/\alpha)}{\beta} > 0$. Here $h : (0, \infty) \rightarrow \mathbb{R}$ is defined to be and is positive over its entire support. It is also easily seen that on ${\mathcal{F}}_B^{[\alpha,\beta]}$, $d_{KL}$ (and hence the $L_2$-metric) is also equ
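The displayed inequality and the definition of $h$ are missing from the statement above. As a hedged sketch only, one form that is consistent with the stated constant $c(\alpha, \beta) = \frac{h(\beta/\alpha)}{\beta}$ is given below. The specific function $h$, the upper constant $\frac{1}{\alpha}$, the notation $d_{KL}(f \| g)$, and integration over a common domain $B$ are assumptions, not taken from this text.

$$ c(\alpha, \beta)\, \|f - g\|_2^2 \;\leq\; d_{KL}(f \| g) \;\leq\; \frac{1}{\alpha}\, \|f - g\|_2^2, \qquad h(\gamma) := \begin{cases} \dfrac{\gamma - 1 - \log \gamma}{(\gamma - 1)^2}, & \gamma \neq 1, \\[1ex] \dfrac{1}{2}, & \gamma = 1. \end{cases} $$

Under these assumptions the lower bound follows because $\int_B (g - f) = 0$ and $h$ is decreasing, with $g/f \leq \beta/\alpha$ and $f \leq \beta$:

$$ d_{KL}(f \| g) = \int_B f \left( \frac{g}{f} - 1 - \log \frac{g}{f} \right) = \int_B h\!\left(\frac{g}{f}\right) \frac{(g - f)^2}{f} \;\geq\; \frac{h(\beta/\alpha)}{\beta} \int_B (f - g)^2 . $$

The upper bound uses $\log t \leq t - 1$ and $g \geq \alpha$:

$$ d_{KL}(f \| g) = \int_B f \log \frac{f}{g} \;\leq\; \int_B f \left( \frac{f}{g} - 1 \right) = \int_B \frac{(f - g)^2}{g} \;\leq\; \frac{1}{\alpha} \int_B (f - g)^2 . $$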

Theorems & Definitions (37)

  • Remark 1.2
  • Remark 1.3
  • Definition 1.4: Ambient density class ${\mathcal{F}}_B^{[\alpha,\beta]}$
  • Remark 1.5
  • Remark 1.6
  • Remark 1.7: Extending results to $\mathcal{F}_B^{[0,\beta]}$
  • Definition 2.1: KL-divergence
  • Lemma 2.2: KL-$L_2$ equivalence on ${\mathcal{F}}_B^{[\alpha,\beta]}$
  • Lemma 2.3: Fano's inequality for ${\mathcal{F}}$
  • Definition 2.4: Packing sets and packing numbers of ${\mathcal{F}}$ in the $L_2$-metric
  • ...and 27 more