
Reversed Indexes $\approx$ Values in Wavelet Trees

Xiangjun Peng

TL;DR

The paper investigates bridging near-optimal lossless compression with the Leibniz Binary System through Wavelet Trees, by aligning the compressed indexes of integers in $[0,2^{N})$ with their bit-reversed binary representations. It introduces the REVIVAL framework (Reversed Indexes = Values) and generalizes it via common bit subsequences (Reversed Indexes ≈ Values), enabling broader encodings and potential computation directly on compressed data. It discusses two perspectives: treating Wavelet Trees as a compression method, and treating them as a data structure with RAM-model implications such as dual-addressing and level-based gathers, including Processing-In-Memory considerations. While the STOC 2024 submission was rejected for lacking new theorems, the work outlines concrete avenues for future theoretical and practical exploration of bit-pattern mappings and extensions to other data types.
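To make the bit-reversal relationship concrete, the following Python sketch shows one way it can arise. It assumes a hypothetical Wavelet Tree over $[0,2^{N})$ in which level $\ell$ routes each value by bit $\ell$ counted from the least significant end; the paper's actual construction may differ. The helpers `bit_reverse` and `path_code` are illustrative names, not taken from the paper.

```python
def bit_reverse(x: int, n: int) -> int:
    """Reverse the low n bits of x, e.g. 0b01 -> 0b10 when n = 2."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (x & 1)  # shift the next low bit of x into r
        x >>= 1
    return r


def path_code(value: int, n: int) -> int:
    """Bits read on the root-to-leaf path of `value` in a hypothetical wavelet
    tree whose level l routes by bit l counted from the LSB. The root's bit
    ends up as the most significant bit of the returned code."""
    code = 0
    for level in range(n):
        bit = (value >> level) & 1  # routing bit used at this level
        code = (code << 1) | bit
    return code


N = 2
for v in range(2 ** N):
    code = path_code(v, N)
    print(v, format(code, f"0{N}b"), bit_reverse(code, N))
# prints:
# 0 00 0
# 1 10 1
# 2 01 2
# 3 11 3
```

Under this assumption, the root-to-leaf code of a value is its binary representation read backwards, so reversing the code recovers the value; $N = 2$ is the same range used in Figure 3.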

Abstract

This work presents a discovery that advances the understanding of a particular Succinct Data Structure, the Wavelet Tree (Grossi, Gupta, and Vitter 2003). The discovery is first established by showing the feasibility of Reversed Indexes = Values: for integers within $[0,2^{N})$, there exists a Wavelet Tree whose compressed indexes are equivalent to the Leibniz Binary System (Leibniz 1703) up to a bit reversal. We then strengthen the discovery by generalizing it into Reversed Indexes $\approx$ Values, applying a longest common subsequence in bits and its patterns. Finally, we conjecture potential implications of these ideas by discussing their benefits and possible modifications to the RAM model. The discovery reveals that: (1) the usability of Succinct Data Structures can be significantly expanded by enabling Computation Directly on Compression; and (2) near-optimal lossless compression can still retain close connections with the Leibniz Binary System (Leibniz 1703), which enables polymorphic functionality within a single piece of information. This work also provides an initial analysis of the benefits of the method (and of potential extensions), and suggests potential modifications.
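One simplified reading of Reversed Indexes $\approx$ Values is sketched below in Python, under stated assumptions. It keeps only the bit positions on which every value agrees (a position-aligned common subsequence, which is a special case of, and not in general the same as, the longest common subsequence the paper refers to) and packs the remaining bits into a small residual index. The functions `shared_bits` and `residual_index` are hypothetical. The example uses the 7-bit ASCII codes of D through G, the characters from Figure 4.

```python
from functools import reduce


def shared_bits(values, width):
    """Return (pattern, free): `pattern` is the string of bits (MSB first) on
    which every value in the non-empty list `values` agrees, and `free` lists
    the bit positions (counted from the LSB) where at least one value differs.
    Aligned positions only: a simplification, not a general LCS."""
    all_and = reduce(lambda a, b: a & b, values)  # 1 where every value has 1
    all_or = reduce(lambda a, b: a | b, values)   # 0 where every value has 0
    agree = ~(all_and ^ all_or) & ((1 << width) - 1)  # 1 where all values agree
    pattern = "".join(
        str((all_and >> i) & 1) for i in reversed(range(width)) if (agree >> i) & 1
    )
    free = [i for i in range(width) if not (agree >> i) & 1]
    return pattern, free


def residual_index(value, free):
    """Pack the non-shared bits of `value` into a small integer (LSB first)."""
    idx = 0
    for k, pos in enumerate(free):
        idx |= ((value >> pos) & 1) << k
    return idx


codes = [ord(c) for c in "DEFG"]  # 7-bit ASCII: 1000100, ..., 1000111
pattern, free = shared_bits(codes, 7)
print(pattern, free)  # 10001 [0, 1]
print([residual_index(v, free) for v in codes])  # [0, 1, 2, 3]
```

For these four characters the shared pattern is "10001", matching the highlighted subsequence described for Figure 4, and the two leftover bits enumerate $[0, 2^{2})$. Reading this as a reduction from the $\approx$ case to the $=$ case is an interpretation for illustration only.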

Paper Structure (19 sections, 5 figures)

This paper contains 19 sections and 5 figures.

Figures (5)

  • Figure 1: An example Wavelet Tree of the string "abcdabcd".
  • Figure 2: The corresponding bitmap from Figure 1.
  • Figure 3: An example of Reversed Indexes $=$ Values using integers within $[0, 2^{2})$.
  • Figure 4: An example of Reversed Indexes $\approx$ Values using characters within $[D, G]$ in ASCII encoding. The highlighted "10001" is the shared common subsequence of all characters in bits.
  • Figure 5:
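For readers unfamiliar with the structure in Figures 1 and 2, the Python sketch below builds a standard balanced Wavelet Tree: each node splits its sorted alphabet in half, stores one bit per symbol, and recurses on the two halves, answering `access` and `rank` by mapping positions through prefix counts of 1s. The class `WaveletTree` and its methods are hypothetical illustrations, and a plain Python list stands in for a succinct rank/select bitmap, so the sketch shows the logic but not the space bounds. The bitmaps it produces for "abcdabcd" follow this textbook layout and may differ in presentation from the paper's figures.

```python
class WaveletTree:
    """Balanced pointer-based wavelet tree (illustrative, not succinct)."""

    def __init__(self, seq, alphabet=None):
        # Alphabet defaults to the sorted distinct symbols of `seq`.
        self.alphabet = sorted(set(seq)) if alphabet is None else list(alphabet)
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return
        mid = len(self.alphabet) // 2
        left_set = set(self.alphabet[:mid])
        # One bit per symbol: 0 = left half of the alphabet, 1 = right half.
        self.bits = [0 if s in left_set else 1 for s in seq]
        # ones[i] = number of 1s in bits[:i]; a real succinct bitmap would
        # answer this rank query with o(n) extra bits instead of a full array.
        self.ones = [0]
        for b in self.bits:
            self.ones.append(self.ones[-1] + b)
        self.right_set = set(self.alphabet[mid:])
        self.left = WaveletTree([s for s in seq if s in left_set], self.alphabet[:mid])
        self.right = WaveletTree([s for s in seq if s not in left_set], self.alphabet[mid:])

    def access(self, i):
        """Return seq[i] by following bits from the root to a leaf."""
        node = self
        while not node.leaf:
            if node.bits[i]:
                i = node.ones[i]  # rank among the 1s -> index in right child
                node = node.right
            else:
                i -= node.ones[i]  # rank among the 0s -> index in left child
                node = node.left
        return node.alphabet[0]

    def rank(self, symbol, i):
        """Count occurrences of `symbol` (assumed in the alphabet) in seq[:i]."""
        node = self
        while not node.leaf:
            if symbol in node.right_set:
                i = node.ones[i]
                node = node.right
            else:
                i -= node.ones[i]
                node = node.left
        return i


wt = WaveletTree("abcdabcd")
print(wt.bits)        # [0, 0, 1, 1, 0, 0, 1, 1]  root: {a, b} -> 0, {c, d} -> 1
print(wt.left.bits)   # [0, 1, 0, 1]  node for "abab": a -> 0, b -> 1
print(wt.right.bits)  # [0, 1, 0, 1]  node for "cdcd": c -> 0, d -> 1
print("".join(wt.access(i) for i in range(8)))  # abcdabcd
print(wt.rank("c", 7))  # 2
```

With the alphabet ordered a < b < c < d (codes 00, 01, 10, 11), the root bit of each symbol is its high bit and the second-level bit is its low bit, so the root-to-leaf path spells out the symbol's 2-bit code.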