Robust density estimation over star-shaped density classes
Xiaolong Liu, Matey Neykov
TL;DR
The paper addresses robust density estimation over star-shaped classes of bounded densities under adversarial data contamination. It introduces a corruption-aware group likelihood criterion and a multistage sieve estimator that builds a pruned packing tree to select candidate densities. Using local metric entropy, it derives minimax lower and upper bounds; under a key condition linking the $L_2$ and TV distances, these bounds match and give the rate $\max\{ {\tau^*}^2 \wedge d^2, \epsilon \wedge d^2\}$. The work extends to classes $\mathcal{F}_B^{[0,\beta]}$ and discusses implications for KL/Hellinger losses and potential extensions to robust regression.
Abstract
We establish a novel criterion for comparing the performance of two densities, $g_1$ and $g_2$, when the data are corrupted. Using this criterion, we propose an algorithm that constructs a density estimator within a star-shaped density class $\mathcal{F}$ from corrupted data. We then derive minimax upper and lower bounds for density estimation over this star-shaped class, whose densities are uniformly bounded above and below (in the sup norm), in the presence of adversarially corrupted data. Specifically, we assume that a fraction $\epsilon \leq \frac{1}{3}$ of the $N$ observations are arbitrarily corrupted. We obtain the minimax upper bound $\max\{ \tau_{\overline{J}}^2, \epsilon\} \wedge d^2$. Under certain conditions, we obtain the minimax risk under the squared $L_2$ loss, up to proportionality constants, as $$ \max\left\{ \tau^{*2} \wedge d^2, \epsilon \wedge d^2 \right\}, $$ where $\tau^* := \sup\left\{ \tau: N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c) \right\}$ for a sufficiently large constant $c$. Here, $\mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ denotes the local entropy of the set $\mathcal{F}$, and $d$ is the $L_2$ diameter of $\mathcal{F}$.
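To make the definition of $\tau^*$ concrete, the following minimal Python sketch solves $N\tau^2 \leq \log \mathcal{M}_{\mathcal{F}}^{\text{loc}}(\tau, c)$ numerically and evaluates the stated rate with all constants set to one. It is an illustration only and not part of the paper: the function names `tau_star` and `rate` are hypothetical, the local entropy must be supplied by the user, and the two entropy profiles below (a constant $k \log c$ and a polynomial $\tau^{-1/\alpha}$) are assumed toy examples. The bisection also assumes that the log local entropy is non-increasing in $\tau$.

```python
import math


def tau_star(n, log_local_entropy, lo=1e-12, hi=1.0, iters=200):
    """Approximate sup{tau in (0, hi]: n * tau**2 <= log_local_entropy(tau)}.

    Assumes log_local_entropy is non-increasing in tau, so the feasible
    set is an interval starting near 0 and bisection applies.
    """
    def feasible(tau):
        return n * tau ** 2 <= log_local_entropy(tau)

    if feasible(hi):       # the constraint holds on the whole search range
        return hi
    if not feasible(lo):   # the constraint fails even at the smallest scale
        return 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def rate(n, eps, d, log_local_entropy):
    """Evaluate max{tau*^2 ^ d^2, eps ^ d^2} with all constants set to 1."""
    t = tau_star(n, log_local_entropy, hi=d)
    return max(min(t ** 2, d ** 2), min(eps, d ** 2))


# Assumed toy entropy profiles (hypothetical, for illustration only).
def parametric_entropy(tau, k=5, c=2.0):
    return k * math.log(c)          # constant in tau


def smooth_entropy(tau, alpha=1.0):
    return tau ** (-1.0 / alpha)    # grows as tau shrinks


t_par = tau_star(1000, parametric_entropy)
t_smooth = tau_star(1000, smooth_entropy)
r = rate(1000, 0.05, 1.0, smooth_entropy)
print(t_par ** 2, t_smooth, r)
```

Under these assumed profiles, the constant entropy gives $\tau^{*2} = k \log c / N$, while the polynomial one gives $\tau^* = N^{-\alpha/(2\alpha+1)}$. In the last call the corruption level $\epsilon = 0.05$ exceeds $\tau^{*2}$, so the contamination term determines the rate.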
