Reversed Indexes $\approx$ Values in Wavelet Trees
Xiangjun Peng
TL;DR
The paper connects near-optimal lossless compression with the Leibniz Binary System through Wavelet Trees, by aligning compressed indexes with the bit-reversed binary representation of integers in $[0,2^{N})$. It introduces the REVIVAL framework (Reversed Indexes = Values) and generalizes it via common bit subsequences (Reversed Indexes $\approx$ Values), enabling broader encodings and potential computation-on-compression. It discusses two perspectives: treating Wavelet Trees as a compression method, and treating them as a data structure with RAM-model implications such as dual-addressing and level-based gathers, including Processing-In-Memory considerations. Although the STOC 2024 submission was rejected for lacking new theorems, the work outlines concrete avenues for future theoretical and practical exploration of bit-pattern mappings and extensions to other data types.
Abstract
This work presents a discovery about a particular Succinct Data Structure, the Wavelet Tree (Grossi, Gupta, and Vitter 2003). We first show the feasibility of Reversed Indexes = Values: for integers within $[0,2^{N})$, there exists a Wavelet Tree whose compressed indexes are equivalent to the Leibniz Binary System (Leibniz 1703), up to bit reversal. We then strengthen the discovery by generalizing it to Reversed Indexes $\approx$ Values, using a longest common subsequence over bits and its patterns. Finally, we conjecture potential implications of these ideas by discussing their benefits and the modifications they suggest to the RAM model. The discovery reveals that: (1) the usability of Succinct Data Structures can be significantly expanded by enabling Computation Directly on Compression; and (2) near-optimal lossless compression can still retain close connections with the Leibniz Binary System (Leibniz 1703), which yields polymorphic functionalities within a single piece of information. This work also provides an initial analysis of the benefits of the method (and potentially of other extensions), and suggests potential modifications.
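To make the phrase "up to bit reversal" concrete, the following Python sketch shows one possible reading of it and should not be taken as the paper's construction. It assumes a balanced tree over the alphabet $[0,2^{N})$ that halves the value range at every level, and it assumes the branch bits are packed with the root-level bit as the least significant bit. The names `bit_reverse`, `path_bits`, and `path_index` are hypothetical and introduced only for illustration.

```python
def bit_reverse(value: int, n_bits: int) -> int:
    """Reverse the low n_bits bits of value (e.g. 110 -> 011 for n_bits=3)."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def path_bits(value: int, n_bits: int) -> list:
    """Root-to-leaf branch bits for value in a balanced tree over [0, 2**n_bits).

    Assumption: each level halves the current alphabet range,
    with 0 meaning the lower half and 1 meaning the upper half.
    """
    lo, hi = 0, 1 << n_bits
    bits = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if value >= mid:
            bits.append(1)
            lo = mid
        else:
            bits.append(0)
            hi = mid
    return bits


def path_index(value: int, n_bits: int) -> int:
    """Pack the path so that the root-level bit is the least significant bit.

    Assumption: this packing order is chosen for illustration only.
    """
    return sum(bit << level for level, bit in enumerate(path_bits(value, n_bits)))
```

Under these assumptions, the packed path of every value equals its bit-reversed binary representation. For instance, the value 6 (binary 110) with `n_bits=3` follows the branches 1, 1, 0, so `path_index(6, 3)` and `bit_reverse(6, 3)` both return 3 (binary 011).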
