Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

Jiehao Liang, Zhao Song, Zhaozhuo Xu, Junze Yin, Danyang Zhuo

TL;DR

This work addresses dynamic kernel density estimation under evolving datasets and adversarial queries. It introduces an adaptive KDE data structure that co-designs importance sampling, geometric weight levels, and Locality-Sensitive Hashing to achieve robust, sublinear updates and queries with $(1\pm\epsilon)$ accuracy. The authors provide detailed space and time bounds, prove robustness via median amplification and net-cover arguments, and maintain correctness under adaptive adversaries using Lipschitz properties of KDE. The resulting framework enables efficient KDE in online settings with guarantees, potentially impacting streaming, online learning, and optimization where KDEs must adapt quickly to changing data. Overall, the paper advances theory and practical design for robust, dynamic KDE data structures with provable performance guarantees.
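The median amplification mentioned above can be illustrated in isolation. The sketch below is not the paper's data structure: it swaps in plain uniform sampling for the importance sampling and LSH machinery, and it assumes a Gaussian kernel. The names `kernel`, `sampled_estimate`, and `median_amplified_estimate` are hypothetical. It shows only how taking the median of several independent estimates reduces the chance that one unlucky sample produces a bad answer.

```python
import math
import random
import statistics

def kernel(x, y):
    """Gaussian kernel exp(-||x - y||^2); an assumed choice of kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def sampled_estimate(points, y, m, rng):
    """Unbiased KDE estimate from m uniform samples drawn with replacement."""
    return sum(kernel(rng.choice(points), y) for _ in range(m)) / m

def median_amplified_estimate(points, y, m, repeats, rng):
    """Median of `repeats` independent sampled estimates."""
    return statistics.median(
        sampled_estimate(points, y, m, rng) for _ in range(repeats)
    )

rng = random.Random(0)  # fixed seed so the run is reproducible
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
y = (0.0, 0.0)

# Exact value for comparison (linear scan over all points).
exact = sum(kernel(x, y) for x in points) / len(points)
approx = median_amplified_estimate(points, y, m=200, repeats=9, rng=rng)
```

Each estimate touches only `m` points instead of all `n`. The median fails only if more than half of the independent estimates fail at once, which is why the failure probability drops quickly as `repeats` grows.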

Abstract

Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined as follows: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data structures for efficient KDE. However, the proposed KDE data structures focus on static settings. The robustness of KDE data structures over dynamically changing data distributions is not addressed. In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries. In particular, we provide a theoretical framework of KDE data structures. In our framework, the KDE data structures require only subquadratic space. Moreover, our data structure supports dynamic updates of the dataset in sublinear time. Furthermore, we can answer adaptive queries from a potential adversary in sublinear time.
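As a point of reference for the problem statement above, here is a minimal exact baseline, not the paper's data structure. It evaluates $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ by a linear scan. The Gaussian kernel, the class name `NaiveDynamicKDE`, and the update semantics (replace the $i$-th point) are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, y):
    """Gaussian kernel exp(-||x - y||^2); one possible choice of f (assumed)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

class NaiveDynamicKDE:
    """Exact baseline: O(n d) time per query, O(d) time per update."""

    def __init__(self, points, kernel=gaussian_kernel):
        self.points = [tuple(p) for p in points]
        self.kernel = kernel

    def update(self, i, z):
        """Replace the i-th data point with z (assumed update semantics)."""
        self.points[i] = tuple(z)

    def query(self, y):
        """Return (1/n) * sum_i f(x_i, y) by scanning every point."""
        return sum(self.kernel(x, y) for x in self.points) / len(self.points)

kde = NaiveDynamicKDE([(0.0, 0.0), (1.0, 0.0)])
before = kde.query((0.0, 0.0))  # equals (1 + e^{-1}) / 2
kde.update(1, (0.0, 0.0))       # move the second point onto the query
after = kde.query((0.0, 0.0))   # both points now coincide with the query
```

The baseline is trivially correct under updates and adaptive queries because it is deterministic, but every query costs time linear in $n$. The sublinear query time claimed in the abstract is what a specialized data structure has to improve on.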

Paper Structure

This paper contains 48 sections, 43 theorems, 98 equations, 8 algorithms.

Key Result

Theorem 1.2

Given a function $K$ and a set of points $X \subset \mathbb{R}^d$, let $\mathrm{cost}(f)$ be defined as in Definition 2.7. For any accuracy parameter $\epsilon \in (0,0.1)$, there is a data structure using space $O(\epsilon^{-2}n\cdot \mathrm{cost}(f))$ (A

Theorems & Definitions (72)

  • Definition 1.1: Dynamic Kernel Density Estimation
  • Theorem 1.2: Main result
  • Definition 2.1: Geometric Weight Levels
  • Definition 2.2: Importance Sampling
  • Definition 2.3: Locality-Sensitive Hashing [IM98]
  • Lemma 2.4: Lemma 3.2 on page 6 of [AI06]
  • Remark 2.5
  • Lemma 2.6: Probability bound for separating points in different level sets (informal version of the formal lemma labeled lem:LSH_formal)
  • Definition 2.7: Kernel cost
  • Lemma 3.1: Sizes of geometric weight levels
  • ...and 62 more