Dynamic Maintenance of Kernel Density Estimation Data Structure: From Practice to Theory

Jiehao Liang; Zhao Song; Zhaozhuo Xu; Junze Yin; Danyang Zhuo

TL;DR

This work addresses dynamic kernel density estimation under evolving datasets and adversarial queries. It introduces an adaptive KDE data structure that co-designs importance sampling, geometric weight levels, and Locality-Sensitive Hashing to achieve robust, sublinear updates and queries with $(1\pm\epsilon)$ accuracy. The authors provide detailed space and time bounds, prove robustness via median amplification and net-cover arguments, and maintain correctness under adaptive adversaries using Lipschitz properties of KDE. The resulting framework enables efficient KDE in online settings with guarantees, potentially impacting streaming, online learning, and optimization where KDEs must adapt quickly to changing data. Overall, the paper advances theory and practical design for robust, dynamic KDE data structures with provable performance guarantees.
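The median amplification step mentioned above can be illustrated with a small sketch. It is not the paper's data structure: it assumes a Gaussian kernel and replaces the importance sampling and Locality-Sensitive Hashing machinery with plain uniform sampling. The names `gaussian_kernel`, `sampled_kde`, and `median_amplified_kde` are hypothetical. The only idea shown is that taking the median of several independent estimates makes a large deviation less likely than with a single estimate.

```python
import math
import random
import statistics


def gaussian_kernel(x, y):
    # Assumed kernel for illustration: f(x, y) = exp(-||x - y||^2 / 2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)


def sampled_kde(points, query, num_samples, rng):
    # One unbiased estimate of (1/n) * sum_i f(x_i, query), using uniform
    # sampling with replacement (a stand-in for importance sampling).
    sample = [points[rng.randrange(len(points))] for _ in range(num_samples)]
    return sum(gaussian_kernel(x, query) for x in sample) / num_samples


def median_amplified_kde(points, query, num_samples, num_repeats, rng):
    # Median amplification: repeat the estimator independently and return
    # the median, which lowers the failure probability of a single run.
    estimates = [
        sampled_kde(points, query, num_samples, rng) for _ in range(num_repeats)
    ]
    return statistics.median(estimates)


rng = random.Random(0)
points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(2000)]
query = (0.0, 0.0)

exact = sum(gaussian_kernel(x, query) for x in points) / len(points)
approx = median_amplified_kde(points, query, num_samples=400, num_repeats=9, rng=rng)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

Each repetition here touches 400 of the 2000 points, so the sketch trades a small relative error for fewer kernel evaluations. The sampling scheme is where the paper's design differs from this stand-in.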

Abstract

Kernel density estimation (KDE) stands out as a challenging task in machine learning. The problem is defined as follows: given a kernel function $f(x,y)$ and a set of points $\{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$, we would like to compute $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ for any query point $y \in \mathbb{R}^d$. Recently, there has been a growing trend of using data structures for efficient KDE. However, the proposed KDE data structures focus on static settings. The robustness of KDE data structures over dynamically changing data distributions is not addressed. In this work, we focus on the dynamic maintenance of KDE data structures with robustness to adversarial queries. In particular, we provide a theoretical framework for KDE data structures. In our framework, the KDE data structures require only subquadratic space. Moreover, our data structure supports dynamic updates of the dataset in sublinear time. Furthermore, it answers adaptive queries, including those issued by a potential adversary, in sublinear time.
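As a point of reference, the quantity $\frac{1}{n}\sum_{i=1}^{n} f(x_i,y)$ can be computed directly in time linear in $n$ per query. The sketch below shows that baseline. The Gaussian kernel and the names `gaussian_kernel` and `exact_kde` are assumptions made for illustration; the abstract leaves the kernel $f$ generic.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]


def gaussian_kernel(x: Point, y: Point) -> float:
    # Assumed example kernel: f(x, y) = exp(-||x - y||^2 / 2).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)


def exact_kde(
    points: Sequence[Point],
    query: Point,
    kernel: Callable[[Point, Point], float] = gaussian_kernel,
) -> float:
    # Direct evaluation of (1/n) * sum_i f(x_i, query): one kernel call per point.
    if not points:
        raise ValueError("points must be non-empty")
    return sum(kernel(x, query) for x in points) / len(points)


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
query = (0.0, 0.0)
density = exact_kde(points, query)
print(density)
```

This direct evaluation costs $n$ kernel calls for every query, which is the cost that sublinear-time KDE data structures aim to avoid.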

Paper Structure (48 sections, 43 theorems, 98 equations, 8 algorithms)

Introduction
Notations.
Related Work
Efficient Kernel Density Estimation
Adaptive Data Structure
Problem Formulation
Our Result
Technical Overview
Roadmap
Preliminary
Technical Claims
Our Data Structures
LSH Data Structure
Initialize Part of Data Structure
Update Part of Data Structure
...and 33 more sections

Key Result

Theorem 1.2

Given a function $K$ and a set of points $X \subset \mathbb{R}^d$, let $\mathop{\mathrm{cost}}\nolimits(f)$ be defined as in Definition 2.7 (Kernel cost). For any accuracy parameter $\epsilon \in (0,0.1)$, there is a data structure using space $O(\epsilon^{-2}n\cdot \mathop{\mathrm{cost}}\nolimits(f))$ (A

Theorems & Definitions (72)

Definition 1.1: Dynamic Kernel Density Estimation
Theorem 1.2: Main result
Definition 2.1: Geometric Weight Levels
Definition 2.2: Importance Sampling
Definition 2.3: Locality-Sensitive Hashing [im98]
Lemma 2.4: Lemma 3.2 on page 6 of [ai06]
Remark 2.5
Lemma 2.6: Probability bound for separating points in different level sets, informal version of Lemma \ref{lem:LSH_formal}
Definition 2.7: Kernel cost
Lemma 3.1: Sizes of geometric weight levels
...and 62 more
