Box Filtration

Enrique Alvarado; Prashant Gupta; Bala Krishnamoorthy

TL;DR

Box Filtration introduces a novel framework that replaces ball-based growth with axis-aligned boxes to unify filtration and mapper constructions in topological data analysis. By employing point and pixel covers, it enables LP-driven, dimension-wise box growth with stability guarantees under the Gromov-Hausdorff distance, and provides two scalable algorithms for building the filtration. The approach yields persistence diagrams that demonstrate robustness to noise and reduced symmetry bias compared to Vietoris–Rips and distance-to-measure filtrations, with a practical Box Mapper variant that integrates into scalable pipelines. The accompanying software and theoretical results suggest strong potential for robust, interpretable topological summaries of large point clouds, across both single-filtration and mapper-style analyses.
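The point and pixel covers mentioned above can be pictured with a small sketch. This is a minimal Python illustration under our own assumptions, not the authors' implementation. A box is stored as a pair of lower and upper corner tuples. The point cover gives each point its own degenerate box at the start. The pixel cover uses a hypothetical uniform grid of side `pixel_size` and keeps only the cells that contain at least one point.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
# A box is a pair (lower corner, upper corner) of equal-length tuples.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def point_cover(points: List[Point]) -> List[Box]:
    """Point cover: each point starts with its own degenerate box [p, p]."""
    return [(tuple(p), tuple(p)) for p in points]


def pixel_cover(points: List[Point], pixel_size: float) -> Dict[Tuple[int, ...], Box]:
    """Pixel cover (assumed uniform grid): map each occupied grid cell,
    indexed by integer coordinates, to its box. Empty cells are skipped."""
    cells: Dict[Tuple[int, ...], Box] = {}
    for p in points:
        # Floor division gives the index of the cell containing p in each dimension.
        idx = tuple(int(c // pixel_size) for c in p)
        if idx not in cells:
            lo = tuple(i * pixel_size for i in idx)
            hi = tuple((i + 1) * pixel_size for i in idx)
            cells[idx] = (lo, hi)
    return cells


if __name__ == "__main__":
    pts = [(0.2, 0.3), (0.4, 0.1), (1.5, 0.2), (-0.5, 2.7)]
    print(point_cover(pts)[2])            # ((1.5, 0.2), (1.5, 0.2))
    print(sorted(pixel_cover(pts, 1.0)))  # [(-1, 2), (0, 0), (1, 0)]
```

Here the four points occupy only three grid cells, so the pixel cover has three boxes while the point cover has four.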

Abstract

We define a new framework that unifies the filtration and mapper approaches from topological data analysis (TDA), and present efficient algorithms to compute it. In this framework, which we term the box filtration of a point cloud data set (PCD), we grow boxes (hyperrectangles) that are not necessarily centered at the points, in place of balls centered at the points. The boxes grow non-uniformly and asymmetrically in different dimensions based on the distribution of points. We present two approaches to handle the boxes: a point cover, where each point is assigned its own box at the start, and a pixel cover, which works with a pixelization of the space containing the PCD. Any box cover in either setting automatically gives a mapper of the PCD. We show that the persistence diagrams generated by the box filtration using both point and pixel covers satisfy the classical stability based on the Gromov-Hausdorff distance. Using boxes also implies that the box filtration is the same whether it is built from pairwise or from higher-order intersections, whereas the Vietoris–Rips (VR) and Čech filtrations are not the same. Growth in each dimension is computed by solving a linear program (LP) that optimizes a cost functional balancing the cost of expansion against the benefit of including more points in the box. The box filtration algorithm runs in $O(m|U(0)|\log(mn\pi)L(q))$ time, where $m$ is the number of increment steps considered for box growth, $|U(0)|$ is the number of boxes in the initial cover ($\leq$ the number of points), $\pi$ is the step length for increasing each box dimension, each LP is solved in $O(L(q))$ time, $n$ is the PCD dimension, $|X|$ is the number of points, and $q = n \times |X|$. We demonstrate through multiple examples that the box filtration can summarize the topology of the PCD more accurately than the VR and distance-to-measure (DTM) filtrations. Software for our implementation is available at https://github.com/pragup/Box-Filteration.
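The dimension-wise, asymmetric growth step can be sketched as follows. The cost functional and the LP are not spelled out in this summary, so the sketch substitutes an assumed stand-in: for one face of the box at a time, it tries extensions of `t * step` for `t = 0..max_steps` and keeps the one minimizing `alpha * extension - points_covered`, breaking ties toward the larger extension. The names `alpha`, `step`, `max_steps`, `grow_face`, and `grow_box` are hypothetical placeholders that only loosely mirror $\alpha$, $\pi$, and $m$, and brute-force enumeration replaces the LP.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def covered(lo: Sequence[float], hi: Sequence[float], p: Point) -> bool:
    """True if p lies in the closed box [lo, hi]."""
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def grow_face(lo, hi, points: List[Point], dim: int, upper: bool,
              alpha: float, step: float, max_steps: int) -> int:
    """Return the number of steps t (0..max_steps) by which to push one face out.
    Assumed cost: alpha * (t * step) - (#points covered); ties go to larger t.
    Brute-force enumeration stands in for the paper's LP."""
    best_t, best_cost = 0, float("inf")
    for t in range(max_steps + 1):
        new_lo, new_hi = list(lo), list(hi)
        if upper:
            new_hi[dim] += t * step
        else:
            new_lo[dim] -= t * step
        gain = sum(covered(new_lo, new_hi, p) for p in points)
        cost = alpha * t * step - gain
        if cost <= best_cost:  # '<=' prefers the larger expansion on ties
            best_t, best_cost = t, cost
    return best_t


def grow_box(lo, hi, points, alpha, step, max_steps):
    """One round of dimension-wise, asymmetric growth: each of the 2n faces
    is scored independently against the current box, then all are applied."""
    new_lo, new_hi = list(lo), list(hi)
    for d in range(len(lo)):
        new_lo[d] -= step * grow_face(lo, hi, points, d, False, alpha, step, max_steps)
        new_hi[d] += step * grow_face(lo, hi, points, d, True, alpha, step, max_steps)
    return tuple(new_lo), tuple(new_hi)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (-5.0,)]
    print(grow_box((0.0,), (0.0,), pts, alpha=0.5, step=1.0, max_steps=3))
    # ((0.0,), (2.0,))
```

In this one-dimensional example the box grows two steps to the right, where nearby points pay for the expansion, and not at all to the left, where the only other point is too far away. Raising `alpha` makes expansion more expensive and eventually stops growth entirely.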

Paper Structure (25 sections, 34 theorems, 76 equations, 18 figures, 1 table)

Introduction
Our Contributions
Construction
Box Cover
Point Cover
Example for the theorem labeled thm:unionintersectionpiforx
Example for the theorem labeled thm:unionintersectionlargepiforx
Pixel Cover
Relations of the Point and Pixel Cover Optimal Solutions
Box Filtration
Largest optimal expansion algorithm
$\boldsymbol{k}$-optimal expansion algorithm
Correctness of expansion algorithms
Complexity of expansion algorithms
Stability
...and 10 more sections

Key Result

Proposition 2.4

Let $V^{l} \supseteq V,\, V^{k} \supseteq V$, and $\hat{V} = V^{l} \cup V^{k}$ be expansions of a box $V$ such that $V = V^k \cap V^l$ for some neighborhood $N$. Let $S(V, V^{l}), \, S(V, V^{k})$, and $S(V, \hat{V})$ be the sequences with $c^{l}_i, \, c^{k}_i$, and $\hat{c}_i$ being the corresponding […]

Figures (18)

Figure 1: Point cloud (left) consisting of 100 points sampled on an ellipse along with 50 random points in and around it. The best persistence diagrams from the box filtration (second panel) and DTM (third panel) are shown along with the diagram from the VR filtration (fourth panel, right). The box filtration clearly identifies the ellipse over several values of its parameter $\alpha$ (the diagram for $\alpha=0.1$ is shown), whereas DTM gives less clear results for most values of its parameter $m$ (the best DTM diagram, for $m=0.2$, is shown). The VR filtration fails to identify the ellipse.
Figure 2: The point cloud $X$ is shown in red. The top row shows covers of $X$ for $\alpha = 0.5$ with $\pi = 5, 6, 11, 12$ from left to right, and the second row shows their nerves. Similarly, the middle two rows show the covers for $\alpha = 0.6$ and their nerves, and the bottom two rows show them for $\alpha = 0.7$, for the same $\pi$ values.
Figure 3: Example illustrating non-uniqueness of optimal solutions to the box expansion linear program.
Figure 4:
Figure 5: Illustration of the construction of $\Psi_1(V)$, $\Psi_2(V)$, and $\Psi_3(V)$ in brown, orange, and blue, respectively, for a given $V$ in green. Consider the centroid of the bottom-left pixel, shown in red. This centroid is weighted with respect to the left side of the box $V$, as shown in subfigure fig:xsifunctiondeffig2, and it remains weighted with respect to the left side of each of $\Psi_1(V)$, $\Psi_2(V)$, and $\Psi_3(V)$.
...and 13 more figures
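Figure 2 shows box covers together with their nerves, and the abstract notes that for boxes the filtration is the same whether built from pairwise or higher-order intersections. The reason is that closed axis-aligned boxes that intersect pairwise always share a common point: intervals on a line have this Helly property, and boxes inherit it coordinate by coordinate. The following Python sketch, an illustration rather than the authors' code, builds the nerve of a box cover from pairwise tests alone and checks each simplex against a direct common-intersection test. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# A box is a pair (lower corner, upper corner) of equal-length tuples.
Box = Tuple[Tuple[float, ...], Tuple[float, ...]]


def intersect(a: Box, b: Box) -> bool:
    """Closed boxes meet iff their intervals overlap in every coordinate."""
    return all(max(al, bl) <= min(ah, bh)
               for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1]))


def common_point(boxes: List[Box]) -> bool:
    """Direct higher-order test: all boxes share a point iff, in each
    coordinate, the largest lower bound is at most the smallest upper bound."""
    dims = len(boxes[0][0])
    return all(max(b[0][d] for b in boxes) <= min(b[1][d] for b in boxes)
               for d in range(dims))


def nerve(boxes: List[Box], max_dim: int = 2) -> List[Tuple[int, ...]]:
    """Nerve of a box cover up to dimension max_dim, built from pairwise
    tests only (a flag complex). For boxes this matches the full nerve."""
    k = len(boxes)
    edges = {e for e in combinations(range(k), 2) if intersect(boxes[e[0]], boxes[e[1]])}
    simplices: List[Tuple[int, ...]] = [(i,) for i in range(k)]
    for size in range(2, max_dim + 2):
        for s in combinations(range(k), size):
            if all(pair in edges for pair in combinations(s, 2)):
                simplices.append(s)
    return simplices


if __name__ == "__main__":
    A = ((0.0, 0.0), (2.0, 2.0))
    B = ((1.0, 1.0), (3.0, 3.0))
    C = ((1.5, 0.0), (2.5, 1.5))
    D = ((5.0, 5.0), (6.0, 6.0))
    boxes = [A, B, C, D]
    cx = nerve(boxes)
    print(cx)  # [(0,), (1,), (2,), (3,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
    # Every simplex found from pairwise tests has a genuine common point.
    assert all(common_point([boxes[i] for i in s]) for s in cx)
```

With balls instead of boxes, three pairwise-overlapping disks need not share a point, which is why the VR and Čech complexes can differ. Since the abstract states that any box cover gives a mapper of the PCD, a routine like `nerve` is one plausible way to extract that complex from a cover.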

Theorems & Definitions (74)

Definition 2.1
Example 2.2
Definition 2.3
Proposition 2.4
proof
Theorem 2.5
proof
Theorem 2.6
proof
Lemma 2.7
...and 64 more
