Efficient Algorithms for Top-k Stabbing Queries on Weighted Interval Data (Full Version)

Daichi Amagata; Junya Yamada; Yuchen Ji; Takahiro Hara

TL;DR

This paper tackles the problem of processing top-k weighted stabbing queries on static weighted intervals. It introduces two exact algorithms: Interval Forest, which achieves $O(\sqrt{n}\log n + k)$ query time with $O(n)$ space, and a Segment Tree variant ST-PSA, which achieves $O(\log n + k)$ query time with $O(n\log n\log\log n)$ preprocessing and $O(n\log^2 n)$ space. The authors prove theoretical guarantees and validate them through experiments on two large real datasets, demonstrating clear speedups over the prior state of the art. The results have practical relevance for large-scale interval data in domains such as finance and transportation, and point to future work on dynamic intervals and continuous top-k queries.

Abstract

Intervals are generated in many applications (e.g., temporal databases), and they are often associated with weights, such as prices. This paper addresses the problem of processing top-k weighted stabbing queries on interval data. Given a set of weighted intervals, a query value, and a result size $k$, the problem is to find the $k$ intervals that are stabbed by the query value and have the largest weights. Although this problem has practical applications (e.g., purchase, vehicle, and cryptocurrency analysis), it has not been well studied. A state-of-the-art algorithm for this problem incurs $O(n\log k)$ time, where $n$ is the number of intervals, so it does not scale to large $n$. We address this inefficiency and propose an algorithm that runs in $O(\sqrt{n}\log n + k)$ time. Furthermore, we propose an $O(\log n + k)$ algorithm that accelerates the search further. Experiments on two large real datasets demonstrate that our algorithms are faster than existing algorithms.
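To make the query semantics concrete, the following is a minimal sketch of a linear scan with a size-$k$ min-heap, which is one straightforward way to answer the query in $O(n\log k)$ time. It is not necessarily the algorithm the paper compares against. The function name `topk_stabbing_scan`, the `(left, right, weight)` tuple layout, and the closed-interval convention are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

# Hypothetical representation: (left, right, weight), treated as a closed interval.
Interval = Tuple[float, float, float]


def topk_stabbing_scan(intervals: List[Interval], q: float, k: int) -> List[Interval]:
    """Return up to k intervals stabbed by q, in descending order of weight."""
    if k <= 0:
        return []
    heap: List[Tuple[float, float, float]] = []  # min-heap keyed on weight
    for left, right, weight in intervals:
        if left <= q <= right:  # the interval is stabbed by q
            if len(heap) < k:
                heapq.heappush(heap, (weight, left, right))
            elif weight > heap[0][0]:
                # Evict the lightest of the current k candidates.
                heapq.heapreplace(heap, (weight, left, right))
    return [(l, r, w) for w, l, r in sorted(heap, reverse=True)]


intervals = [(1, 5, 10.0), (2, 8, 30.0), (4, 6, 20.0), (7, 9, 40.0)]
result = topk_stabbing_scan(intervals, 4.5, 2)
```

Every interval is inspected once and each heap operation costs $O(\log k)$, so the scan touches all $n$ intervals regardless of how few are stabbed. That dependence on $n$ is what the sublinear query times in the abstract avoid.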

Paper Structure (19 sections, 9 theorems, 1 equation, 5 figures, 4 tables, 2 algorithms)

Introduction
Motivation and Challenge
Contribution
Preliminary
Problem Definition
Interval Tree
Segment Tree
Algorithm based on Interval Forest
Data Structure and Construction
Query Processing Algorithm
Algorithm based on a Variant of Segment Tree
Variant of Segment Tree and Its Construction
Query Processing Algorithm
Experiment
Pre-processing Time
...and 4 more sections

Key Result

Lemma 1

An interval tree can be built in $O(n\log n)$ time, consumes $O(n)$ space, and processes a stabbing query in $O(\log n + m)$ time, where $m$ is the number of stabbed intervals.
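As an illustration of the structure behind Lemma 1, here is a minimal sketch of a textbook centered interval tree with an unweighted stabbing query. It is not taken from the paper: the names `Node`, `build`, and `stab` are hypothetical, intervals are assumed closed, and this simple version re-sorts endpoints at every level, so its construction is not tuned to the $O(n\log n)$ bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # assumed closed interval (left, right)


@dataclass
class Node:
    center: float
    by_left: List[Interval]   # intervals containing center, ascending left endpoint
    by_right: List[Interval]  # the same intervals, descending right endpoint
    left: Optional["Node"]
    right: Optional["Node"]


def build(intervals: List[Interval]) -> Optional[Node]:
    if not intervals:
        return None
    endpoints = sorted(p for iv in intervals for p in iv)
    center = endpoints[len(endpoints) // 2]  # median endpoint
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    lo = [iv for iv in intervals if iv[1] < center]  # entirely left of center
    hi = [iv for iv in intervals if iv[0] > center]  # entirely right of center
    return Node(
        center,
        sorted(here, key=lambda iv: iv[0]),
        sorted(here, key=lambda iv: iv[1], reverse=True),
        build(lo),
        build(hi),
    )


def stab(node: Optional[Node], q: float) -> List[Interval]:
    """Collect all intervals containing q along one root-to-leaf path."""
    out: List[Interval] = []
    while node is not None:
        if q < node.center:
            # Stored intervals end at or after center > q, so only left endpoints matter.
            for iv in node.by_left:
                if iv[0] > q:
                    break
                out.append(iv)
            node = node.left
        else:
            # Stored intervals start at or before center <= q, so only right endpoints matter.
            for iv in node.by_right:
                if iv[1] < q:
                    break
                out.append(iv)
            node = node.right
    return out


tree = build([(1, 5), (2, 8), (4, 6), (7, 9), (10, 12)])
stabbed = sorted(stab(tree, 4.5))
```

At each visited node the scan stops at the first interval that misses the query, so the work per node is proportional to the number of reported intervals plus one. With a balanced tree this gives the $O(\log n + m)$ query cost stated in the lemma.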

Figures (5)

Figure 1: Example of the interval and segment tree structures. The red line represents a simple stabbing query $s$, and the traversed path is blue. Note that $x_{3}$ and $x_{6}$ are stabbed by the query.
Figure 2: Pre-processing time [sec] vs. dataset size
Figure 3: Memory usage [MB] vs. dataset size
Figure 4: Running time vs. $k$
Figure 5: Running time vs. data size

Theorems & Definitions (12)

Definition 1: Stabbing query
Definition 2: Top-k weighted stabbing query
Lemma 1
Lemma 2
Theorem 1
Corollary 1
Lemma 3
Theorem 2
Example 1
Lemma 4
...and 2 more
