Linear Index for Logarithmic Search-Time for any String under any Internal Node in Suffix Trees
Anas Al-okaily
TL;DR
The paper tackles the challenge of efficiently searching for a pattern under internal nodes in suffix trees, a problem central to approximate pattern matching. It introduces OT_index, a linear-size index built from three sub-indexes (base paths, Hanadi nodes, Srivastava nodes) using OSHR-tree concepts, enabling $O(\log_2 n)$ search time via binary search on per-node lists. Key contributions include the formal definitions of Hanadi and Srivastava nodes, a concrete construction algorithm, and empirical evidence showing speedups over walking for depths $2$–$10$ with a total index size of $O(\Sigma n)$ (and $O(n)$ for Srivastava). This work offers a scalable, practical approach to enhance internal-node pattern searches in suffix trees, with direct implications for faster approximate pattern matching on large genomic data sets.
Abstract
Suffix trees are a key and efficient data structure for solving string problems. A suffix tree is a compressed trie containing all the suffixes of a given text of length $n$, and it can be constructed in linear time. In this work, we introduce an algorithm to build a linear-size index that allows finding a pattern of any length under any internal node in a suffix tree in $O(\log n)$ time.
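To make the search problem concrete, the following is a minimal Python sketch of the underlying idea of answering "does the pattern occur directly below this node?" with a binary search over a sorted per-node list. It is not the OT_index construction from the paper: it uses a naively built suffix array in place of a suffix tree, stores a full list per node (so it is not linear in size), and all function names (`suffix_array`, `pattern_rank_range`, `node_list`, `occurs_under`, `search_under`) are hypothetical. The node is identified by its path label `alpha`, and the pattern is assumed to start immediately after that label.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def pattern_rank_range(text, sa, pattern):
    # Half-open range [lo, hi) of suffix-array ranks whose suffixes start
    # with `pattern`. Truncating sorted suffixes keeps them sorted.
    # (In a suffix tree this range would come from the pattern's own node.)
    m = len(pattern)
    prefixes = [text[i:i + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


def node_list(text, sa, alpha):
    # Assumption for this sketch: the "node" is given by its path label alpha.
    # Store the sorted ranks of the suffixes that begin right after each
    # occurrence of alpha in the text.
    rank = {pos: r for r, pos in enumerate(sa)}
    k = len(alpha)
    starts = [i for i in range(len(text) - k + 1) if text.startswith(alpha, i)]
    return sorted(rank[i + k] for i in starts if i + k < len(text))


def occurs_under(ranks, lo, hi):
    # Binary search: does any stored rank fall inside [lo, hi)?
    j = bisect_left(ranks, lo)
    return j < len(ranks) and ranks[j] < hi


def search_under(text, alpha, pattern):
    # True if alpha followed immediately by pattern occurs in text.
    sa = suffix_array(text)
    lo, hi = pattern_rank_range(text, sa, pattern)
    return occurs_under(node_list(text, sa, alpha), lo, hi)


if __name__ == "__main__":
    text = "banana$"
    print(search_under(text, "an", "a"))  # "ana" is a substring
    print(search_under(text, "an", "n"))  # "ann" is not a substring
```

Once the per-node list and the pattern's rank range are available, the final check in `occurs_under` is a single binary search, independent of the pattern length. The cost this sketch ignores is the space of the per-node lists, which is what a linear-size index must avoid.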
