Fast Construction of Partitioned Learned Bloom Filter with Theoretical Guarantees

Atsuki Sato; Yusuke Matsui

TL;DR

This paper proposes three methods, fast PLBF, fast PLBF++, and fast PLBF#, that reduce the construction complexity of PLBF to $O(N^2k)$, $O(Nk \log N)$, and $O(Nk \log k)$, respectively, and proves that fast PLBF++ and fast PLBF# are equivalent to PLBF under an ideal data distribution.

Abstract

The Bloom filter is a widely used classic data structure for approximate membership queries. Learned Bloom filters improve memory efficiency by leveraging machine learning, and the partitioned learned Bloom filter (PLBF) is among the most memory-efficient variants. However, constructing a PLBF is computationally expensive: it takes $O(N^3k)$ time, where $N$ and $k$ are hyperparameters. In this paper, we propose three methods, fast PLBF, fast PLBF++, and fast PLBF#, that reduce the construction complexity to $O(N^2k)$, $O(Nk \log N)$, and $O(Nk \log k)$, respectively. Fast PLBF preserves the original PLBF structure and memory efficiency. Although fast PLBF++ and fast PLBF# may produce different structures, we theoretically prove that they are equivalent to PLBF under an ideal data distribution. Furthermore, we theoretically bound the difference in memory efficiency between PLBF and fast PLBF++ for non-ideal scenarios. Experiments on real-world datasets demonstrate that fast PLBF, fast PLBF++, and fast PLBF# are up to 233, 761, and 778 times faster to construct than the original PLBF, respectively. Additionally, fast PLBF maintains the same data structure as PLBF, and fast PLBF++ and fast PLBF# achieve nearly identical memory efficiency.
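For readers unfamiliar with the baseline structure, the following is a minimal sketch of a classic (non-learned) Bloom filter answering approximate membership queries. It is not the PLBF construction or the authors' code; the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the parameters `capacity` and `fpr` are illustrative assumptions. The sizing uses the standard textbook formulas for a target false positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Classic Bloom filter: a bit array probed by several hash functions.

    Illustrative sketch only; names and hashing scheme are assumptions.
    """

    def __init__(self, capacity: int, fpr: float) -> None:
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, h = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(fpr) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive every probe position from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a nonzero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the bit at every probe position.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if all probed bits are set: may be a false positive,
        # but an inserted item is never reported as absent.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(capacity=1000, fpr=0.01)
    for i in range(1000):
        bf.add(f"key-{i}")
    print("key-42" in bf)  # inserted keys are always reported as present
```

The filter never returns a false negative, while non-keys are occasionally reported as present. Learned Bloom filters such as PLBF aim to reduce the memory needed for a given false positive rate by adding a learned model in front of structures like this one.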
