Banana Trees for the Persistence in Time Series Experimentally

Lara Ost, Sebastiano Cultrera di Montesano, Herbert Edelsbrunner

TL;DR

The paper addresses the challenge of maintaining persistent homology for dynamically evolving time series by implementing and experimentally evaluating banana trees, a dual-tree data structure that represents min–max relations and their nested windows. Each update takes $O(\log n + k)$ time, which yields substantial speedups over static recomputation with state-of-the-art tools, especially on long sequences generated by unbiased random walks. Experiments show median speedups in the hundreds or more for long time series, and structural properties that support efficient maintenance, such as a nesting depth that grows like $O(\log n)$. Worst-case inputs exhibit linear-time behavior, while quasi-periodic and real-world data behave much like unbiased random walks in practice. These findings suggest that banana trees can enable real-time topological analysis in applications such as healthcare and finance; the paper also outlines future work on period estimation and broader deployment.

Abstract

In numerous fields, dynamic time series data require continuous updates, necessitating efficient data processing techniques for accurate analysis. This paper examines the banana tree data structure, specifically designed to efficiently maintain persistent homology -- a multi-scale topological descriptor -- for dynamically changing time series data. We implement this data structure and conduct an experimental study to assess its properties and runtime for update operations. Our findings indicate that banana trees are highly effective with unbiased random data, outperforming state-of-the-art static algorithms in these scenarios. Additionally, our results show that real-world time series share structural properties with unbiased random walks, suggesting potential practical utility for our implementation.
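
To make the static baseline concrete, the following is a minimal sketch of how the 0-dimensional sublevel-set persistence of a time series can be recomputed from scratch with a union-find pass over the sorted values. It is not the banana tree data structure and not the authors' implementation; the function name `sublevel_persistence` and the convention of pairing the global minimum with infinity are assumptions made for illustration. The point of banana trees is to avoid rerunning such a full pass after every change.

```python
import math


def sublevel_persistence(values):
    """Return sorted (birth, death) pairs of 0-dimensional sublevel-set
    persistence for a list of numbers (hypothetical helper, static baseline)."""
    n = len(values)
    parent = [None] * n        # None means the item has not been added yet
    lowest = list(range(n))    # root index -> index of the component's minimum

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    # Sweep the items from lowest to highest value.
    for i in sorted(range(n), key=lambda j: (values[j], j)):
        parent[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the higher minimum dies here.
                if values[lowest[ri]] <= values[lowest[rj]]:
                    old, young = ri, rj
                else:
                    old, young = rj, ri
                birth = values[lowest[young]]
                if birth < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth, values[i]))
                parent[young] = old
    if n:
        # Assumed convention: the global minimum never dies.
        pairs.append((min(values), math.inf))
    return sorted(pairs)


print(sublevel_persistence([3, 1, 4, 0, 2]))
```

In this example the minimum with value 1 is paired with the maximum with value 4, and the global minimum 0 stays unpaired. Changing a single value invalidates the sorted order, so a static method repeats the whole sweep, which is the cost the dynamic data structure is designed to avoid.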

Paper Structure (22 sections, 6 equations, 15 figures, 3 tables)

Figures (15)

  • Figure 1: Middle: a piece of the graph of $f$ with four double-panel windows of which three are nested inside the fourth. Left: the corresponding points in the persistence diagram and the arrows that reflect the nesting relation among the windows. Right: the four corresponding bananas in the up-tree of $f$.
  • Figure 2: Left: the average fraction of items that are critical as a function of the bias $\mu$. Middle: the average nesting depth of leaf bananas in banana trees of unbiased random walks with $n$ items. The ribbon extends from the minimum to the maximum observed value for each $n$; the dots mark the mean. The red line is the graph of a constant times $\log n$ obtained by linear regression. Right: the maximum nesting depth at $n=10^6$ items as a function of the bias $\mu$.
  • Figure 3: Left: the average length of in-trails (orange dots) and mid-trails (blue triangles) in banana trees of random walks with bias $\mu$, averaged over all input sizes. Right: the fraction of nodes on the longest in-trail (orange dots) and longest mid-trail (blue triangles) in banana trees of random walks with bias $\mu$, averaged over all input sizes.
  • Figure 4: Comparing the maintenance of banana trees with reconstructing the persistence diagram with Gudhi, in which the baseline of no speedup ($10^0 = 1$) is marked with a horizontal gray line. Left: the speedup for updating a value with $\delta = \pm 5.0$ depending on the length of the random walk, $n$. The ribbon spans the minimum and maximum observed speedup, with the black curve tracing the median speedup. Middle: the speedup for using banana trees to cut a random walk with bias $\mu$ in half. The type and color of each curve encode the amount of bias, and each curve shows the median speedup over a hundred repeats for each $n$. Right: the speedup for using banana trees to concatenate two equally long random walks with bias $\mu$.
  • Figure 5: Worst-case examples for local and topological maintenance. Left: increasing the value of the marked item triggers a linear number of interchanges. Right: cutting the list at the dashed line affects every persistent pair. Both operations take time linear in the number of critical items, which for the two time series are all or almost all items.
  • ...and 10 more figures
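
The quantity plotted in the left panel of Figure 2, the fraction of items that are critical as a function of the bias $\mu$, can be estimated with a short simulation. The sketch below rests on assumptions that the captions do not confirm: steps are drawn from a normal distribution with mean $\mu$ and unit variance, and only interior strict local minima and maxima count as critical. The names `biased_walk` and `critical_fraction` are hypothetical.

```python
import random


def biased_walk(n, mu, seed=0):
    """Random walk with n items; steps ~ Normal(mu, 1) (assumed step model)."""
    rng = random.Random(seed)
    walk, x = [], 0.0
    for _ in range(n):
        walk.append(x)
        x += rng.gauss(mu, 1.0)
    return walk


def critical_fraction(values):
    """Fraction of items that are interior strict local minima or maxima."""
    n = len(values)
    if n == 0:
        return 0.0
    critical = 0
    for i in range(1, n - 1):
        left, mid, right = values[i - 1], values[i], values[i + 1]
        if (mid < left and mid < right) or (mid > left and mid > right):
            critical += 1
    return critical / n


# Compare an unbiased walk with increasingly biased ones.
for mu in (0.0, 0.5, 1.0, 3.0):
    frac = critical_fraction(biased_walk(10_000, mu, seed=1))
    print(f"mu={mu}: critical fraction {frac:.3f}")
```

Under these assumptions, a larger bias makes consecutive steps more likely to share a sign, so fewer items are local extrema. Since the cost of the worst-case operations in Figure 5 is stated in terms of the number of critical items, this fraction is a useful first statistic to inspect on new data.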