Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
Ethan G. Rogers, Cheng Wang
TL;DR
This work tackles the decoder compute bottleneck in neural image compression on edge devices. It introduces an end-to-end framework that learns a low-rank latent representation via vector quantization and uses a patch-based transformer encoder, with iterative low-rank reconstruction taking the place of a heavy convolutional decoder. Reconstruction runs for $I$ steps at a target rank $R$, and the final output is the average of the per-iteration reconstructions, $\hat{\mathbf{T}}=\frac{1}{I}\sum_{i=0}^{I-1}\mathbf{T}_{i_R}$, where $\mathbf{T}_{i_R}$ is the rank-$R$ reconstruction at step $i$. Results show up to $21\times$ data-size reduction at an MSE of about $3.6\times 10^{-3}$, with decoder MACs reduced by $10$–$100\times$, enabling practical edge deployment and potential extensions to high-resolution data and generative tasks.
Abstract
Image compression and reconstruction are crucial for many digital applications. While contemporary neural compression methods achieve impressive compression rates, their adoption has been largely hindered by the complexity and high computational cost of convolution-based decoders during data reconstruction. To address this decoder bottleneck, we develop a new compression-reconstruction framework that incorporates a low-rank representation into an autoencoder with vector quantization. We demonstrate that a series of computationally efficient low-rank operations on the learned latent representation can reconstruct images with high quality. Our approach dramatically reduces the computational overhead of the decoding phase, essentially eliminating the decoder compute bottleneck while maintaining high fidelity in the reconstructed images.
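
To make the averaging step in the TL;DR concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes, purely for illustration, that each per-iteration reconstruction $\mathbf{T}_{i_R}$ is formed as the product of a left factor of shape $H\times R$ and a right factor of shape $R\times W$. The function names, the factor layout, and the MAC estimate are hypothetical. How the factors are obtained from the quantized latent is not specified here and is left out.

```python
import numpy as np


def average_low_rank_reconstruction(left_factors, right_factors):
    """Average I rank-R reconstructions T_i = A_i @ B_i.

    Assumed (hypothetical) layout:
      left_factors:  shape (I, H, R)
      right_factors: shape (I, R, W)
    Returns an (H, W) array equal to (1 / I) * sum_i A_i @ B_i.
    """
    left = np.asarray(left_factors, dtype=np.float64)
    right = np.asarray(right_factors, dtype=np.float64)
    if left.ndim != 3 or right.ndim != 3:
        raise ValueError("expected factors of shape (I, H, R) and (I, R, W)")
    if left.shape[0] != right.shape[0] or left.shape[2] != right.shape[1]:
        raise ValueError("iteration count or rank mismatch between factors")
    # Batched matrix product: one (H, W) rank-<=R reconstruction per iteration.
    per_iteration = left @ right
    # Final output is the plain mean over the I iterations.
    return per_iteration.mean(axis=0)


def low_rank_macs(num_iterations, height, width, rank):
    """Multiply-accumulate count for the factor products alone (illustrative)."""
    return num_iterations * height * width * rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    I, H, W, R = 4, 32, 32, 2
    A = rng.standard_normal((I, H, R))
    B = rng.standard_normal((I, R, W))
    T_hat = average_low_rank_reconstruction(A, B)
    print(T_hat.shape, low_rank_macs(I, H, W, R))
```

Under this assumed factorization, each step costs $H \cdot W \cdot R$ multiply-accumulates, so the total grows linearly in $I$ and $R$. Note that the average of $I$ rank-$R$ terms can itself have rank up to $I \cdot R$.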
