Optimal Compression of Unit Norm Vectors in the High Distortion Regime
Heng Zhu, Avishek Ghosh, Arya Mazumdar
TL;DR
This work analyzes how to compress a unit-norm vector in high dimensions under extreme distortion, focusing on worst-case inputs and randomized encoders. It derives fundamental lower bounds for biased ($\delta$-compressor) and unbiased ($\omega$-compressor) schemes and proposes practical, near-optimal compressors: Max Block Norm Quantization (MBNQ) for the biased setting and the Sparse Randomized Quantization Scheme (SRQS) for the unbiased setting, alongside a Gaussian-codebook baseline for both. The results establish that $\Omega(d\delta)$ bits are necessary for biased compression and $\Omega(d/\omega)$ bits for unbiased compression, and that the practical schemes match these rates up to logarithmic factors. The findings bear directly on reducing communication in federated and distributed learning, where near-optimal, scalable vector compression is needed.
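To make the two distortion notions concrete, here is a minimal Python sketch. It assumes the standard definitions from the distributed-optimization literature: a (possibly biased) $\delta$-compressor satisfies $\mathbb{E}\|C(x)-x\|^2 \le (1-\delta)\|x\|^2$, and an unbiased $\omega$-compressor satisfies $\mathbb{E}[C(x)] = x$ and $\mathbb{E}\|C(x)-x\|^2 \le \omega\|x\|^2$. The sketch uses two classical baselines, top-$k$ and scaled random-$k$ sparsification; it is not an implementation of MBNQ, SRQS, or the Gaussian codebook, and the helper names `top_k` and `rand_k` are hypothetical. The high-distortion regime corresponds to small $\delta$ and large $\omega$, i.e. $k \ll d$ here.

```python
import numpy as np

def top_k(x, k):
    """Hypothetical baseline (biased): keep the k largest-magnitude coordinates.
    The discarded d - k coordinates carry at most a (1 - k/d) fraction of the
    energy, so this is a delta-compressor with delta = k/d."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def rand_k(x, k, rng):
    """Hypothetical baseline (unbiased): keep k uniformly random coordinates,
    rescaled by d/k so that E[C(x)] = x. Its error is
    E||C(x) - x||^2 = (d/k - 1) ||x||^2, i.e. omega = d/k - 1."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
d, k, trials = 1000, 10, 2000
x = rng.standard_normal(d)
x /= np.linalg.norm(x)  # unit-norm input, as in the paper's setting

# Biased case: deterministic error bound with delta = k/d.
delta = k / d
err_top = float(np.sum((top_k(x, k) - x) ** 2))

# Unbiased case: Monte Carlo estimates of the error and of the bias.
omega = d / k - 1
samples = np.stack([rand_k(x, k, rng) for _ in range(trials)])
err_rand = float(np.mean(np.sum((samples - x) ** 2, axis=1)))
bias = float(np.linalg.norm(samples.mean(axis=0) - x))

print(err_top <= 1 - delta)  # True
print(err_rand, omega)       # empirical error should be close to omega = 99
```

The contrast illustrates the two regimes: the biased compressor keeps most of the signal but never recovers it fully ($\delta \to 0$ as $k/d \to 0$), while the unbiased one is correct on average at the price of a variance factor $\omega$ that grows as $d/k$.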
Abstract
Motivated by the need for communication-efficient distributed learning, we investigate how to compress a unit-norm vector into the minimum number of bits while still allowing an acceptable level of distortion in recovery. This problem has been explored in the rate-distortion and covering-code literature, but our focus is exclusively on the "high-distortion" regime. We approach the problem in a worst-case setting, with no prior information on the vector, but we allow randomized compression maps. Our study considers both biased and unbiased compression methods and determines the optimal compression rates. It turns out that simple compression schemes are nearly optimal in this regime. Some of the results are new and some are known; we compile them in this paper for completeness.
