ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
Otmar Ertl
TL;DR
ExaLogLog (ELL) is a generalized, space-efficient sketch for approximate distinct counting that retains mergeability, idempotence, reproducibility, and reducibility while supporting constant-time inserts. By adopting a new update-value distribution and maximum-likelihood (ML) estimation solved with Newton’s method (with a martingale estimator as an option), it reaches a memory-variance product (MVP) as low as $3.67$ in practical configurations and a reported 43% space reduction over HyperLogLog at the same accuracy, while supporting distinct counts up to the exa-scale. The work also provides a sparse mode based on hash tokens, which lets allocation of the full sketch be deferred, along with a reference Java implementation and extensive experimental validation. Overall, ELL is a scalable option for distributed data stores and analytics that combines theoretical guarantees with practical performance.
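To make the MVP figure concrete, the relation below is a back-of-the-envelope sketch. It assumes that MVP denotes the memory-variance product, that is, the sketch size $b$ in bits multiplied by the relative variance of an unbiased estimate $\hat{n}$ of the true count $n$. The symbols $b$ and $\sigma_{\mathrm{rel}}$ are introduced here for illustration only.

```latex
\mathrm{MVP} = b \cdot \frac{\operatorname{Var}(\hat{n})}{n^{2}}
\quad\Longrightarrow\quad
\sigma_{\mathrm{rel}} = \sqrt{\frac{\mathrm{MVP}}{b}},
\qquad
b = \frac{\mathrm{MVP}}{\sigma_{\mathrm{rel}}^{2}}

% Example: target relative standard error of 1% with MVP = 3.67
b \approx \frac{3.67}{0.01^{2}} = 36\,700 \text{ bits} \approx 4.6 \text{ kB}

% At equal error, memory scales with the MVP, so a 43% saving corresponds to
\frac{b_{\mathrm{ELL}}}{b_{\mathrm{HLL}}}
  = \frac{\mathrm{MVP}_{\mathrm{ELL}}}{\mathrm{MVP}_{\mathrm{HLL}}}
  \approx 1 - 0.43 = 0.57
```

Under this reading, the quoted space reduction is simply the ratio of the two MVPs. The 4.6 kB figure is an idealized estimate that ignores implementation overhead and assumes the quoted MVP applies to the chosen configuration.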
Abstract
This work introduces ExaLogLog, a new data structure for approximate distinct counting, which has the same practical properties as the popular HyperLogLog algorithm. It is commutative, idempotent, mergeable, reducible, has a constant-time insert operation, and supports distinct counts up to the exa-scale. At the same time, as theoretically derived and experimentally verified, it requires 43% less space to achieve the same estimation error.
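The properties listed in the abstract can be illustrated with a small toy sketch. The code below is not ExaLogLog and not the reference Java implementation. It is a minimal HyperLogLog-style register array, written under the assumption that a SHA-256-derived 64-bit hash is an acceptable stand-in for a production hash function. The class `ToyRegisterSketch`, the helper `build`, and the item names are hypothetical. The estimator is the classic raw HyperLogLog formula without range corrections, not the ML or martingale estimators mentioned above.

```python
import hashlib


class ToyRegisterSketch:
    """Toy HyperLogLog-style sketch (illustration only, not ExaLogLog)."""

    def __init__(self, p):
        self.p = p                        # precision: 2**p registers
        self.registers = [0] * (1 << p)

    @staticmethod
    def _hash64(item):
        # Deterministic 64-bit hash, so results are reproducible across runs.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        h = self._hash64(item)
        tail_bits = 64 - self.p
        idx = h >> tail_bits                       # top p bits select a register
        tail = h & ((1 << tail_bits) - 1)          # remaining bits
        rank = tail_bits - tail.bit_length() + 1   # 1-based position of first 1-bit
        # Max-update: constant time, and re-inserting an item changes nothing.
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        if self.p != other.p:
            raise ValueError("reduce both sketches to a common precision first")
        out = ToyRegisterSketch(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def reduce(self, new_p):
        """Return the sketch that direct insertion at precision new_p would give."""
        if not 0 <= new_p <= self.p:
            raise ValueError("new_p must be between 0 and the current precision")
        d = self.p - new_p
        out = ToyRegisterSketch(new_p)
        for idx, r in enumerate(self.registers):
            if r == 0:
                continue                           # empty register, nothing to carry over
            low = idx & ((1 << d) - 1)             # index bits that become leading tail bits
            new_rank = d + r if low == 0 else d - low.bit_length() + 1
            j = idx >> d
            out.registers[j] = max(out.registers[j], new_rank)
        return out

    def estimate(self):
        # Raw HyperLogLog estimator; the alpha constant is valid for m >= 128.
        m = len(self.registers)
        alpha = 0.7213 / (1 + 1.079 / m)
        return alpha * m * m / sum(2.0 ** -r for r in self.registers)


def build(items, p):
    sketch = ToyRegisterSketch(p)
    for item in items:
        sketch.add(item)
    return sketch


items = [f"user-{i}" for i in range(10_000)]
part_a = build(items[:6000], 10)      # two overlapping partitions
part_b = build(items[4000:], 10)
full = build(items, 10)

print("merged == full:", part_a.merge(part_b).registers == full.registers)
print("reduced == direct:", full.reduce(8).registers == build(items, 8).registers)
print("estimate:", round(full.estimate()))
```

Because every update is a per-register maximum, insertion order and duplicates do not affect the final state, and merging partial sketches reproduces the state obtained from the combined stream. `reduce` shows reducibility: a sketch recorded at a higher precision can be converted to a lower one without access to the original items.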
