Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction

Ethan G. Rogers, Cheng Wang

TL;DR

This work tackles the decoder compute bottleneck in neural image compression on edge devices. It introduces an end-to-end framework that learns a low-rank latent representation via vector quantization and uses a patch-based transformer encoder to replace heavy decoders. Low-rank reconstruction is performed iteratively with a target rank $R$ over $I$ steps, and the final output is an average of the per-iteration reconstructions, $\hat{\mathbf{T}}=\frac{1}{I}\sum_{i=0}^{I-1}\mathbf{T}_{i_R}$. Results show up to $21\times$ data-size reduction at an MSE around $3.6\times 10^{-3}$, with decoder MACs reduced by $10$–$100\times$, enabling practical edge deployment and potential extensions to high-resolution data and generative tasks.
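The averaging step $\hat{\mathbf{T}}=\frac{1}{I}\sum_{i=0}^{I-1}\mathbf{T}_{i_R}$ can be illustrated with a minimal sketch. This summary does not say how each rank-$R$ term is produced, so the sketch assumes, purely for illustration, that every iteration yields $R$ pairs of vectors whose outer products sum to $\mathbf{T}_{i_R}$. The function names and the MAC estimate are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Factors = Tuple[Matrix, Matrix]  # (us, vs): R vectors of length H, R vectors of length W


def rank_r_reconstruction(us: Matrix, vs: Matrix) -> Matrix:
    """Sum of R outer products u_r v_r^T, giving an H x W matrix of rank at most R."""
    h, w = len(us[0]), len(vs[0])
    out = [[0.0] * w for _ in range(h)]
    for u, v in zip(us, vs):
        for a in range(h):
            for b in range(w):
                out[a][b] += u[a] * v[b]  # one multiply-accumulate
    return out


def average_reconstructions(factors: List[Factors]) -> Matrix:
    """Mean of the I per-iteration reconstructions (assumed factor format, see lead-in)."""
    recons = [rank_r_reconstruction(us, vs) for us, vs in factors]
    n = len(recons)
    h, w = len(recons[0]), len(recons[0][0])
    return [[sum(r[a][b] for r in recons) / n for b in range(w)] for a in range(h)]


def low_rank_macs(h: int, w: int, rank: int, iters: int) -> int:
    """MACs spent on the outer products alone under this assumed scheme: I * R * H * W."""
    return iters * rank * h * w


# Toy example with I = 2 iterations and rank R = 1 on a 2 x 2 output.
toy_factors = [
    ([[1.0, 2.0]], [[3.0, 4.0]]),
    ([[1.0, 0.0]], [[1.0, 1.0]]),
]
t_hat = average_reconstructions(toy_factors)
```

Under these assumptions the cost grows linearly in both $R$ and $I$, which is the kind of scaling that keeps such a reconstruction far cheaper than a stack of convolutional decoder layers.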

Abstract

Image compression and reconstruction are crucial for various digital applications. While contemporary neural compression methods achieve impressive compression rates, their adoption has been largely hindered by the complexity and high computational cost of convolution-based decoders during data reconstruction. To address this decoder bottleneck, we develop a new compression-reconstruction framework that incorporates a low-rank representation into an autoencoder with vector quantization. We demonstrate that a series of computationally efficient low-rank operations on the learned latent representation of images can reconstruct the data with high quality. Our approach dramatically reduces the computational overhead of the decoding phase of neural compression and reconstruction, essentially eliminating the decoder compute bottleneck while maintaining high fidelity of image outputs.

Paper Structure

This paper contains 5 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Data transmission and decode compute bottleneck.
  • Figure 2: A comparison between (a) standard VQVAE architectures and (b) our low-rank VQVAE architecture. A low-rank operation is visualized in the bottom right.
  • Figure 3: Visual comparison of reconstructions under various parameter settings for standard VQVAEs (b) and our proposed low-rank VQVAE (c). We adjust various parameters and observe the resulting visual quality, bits per pixel (bpp), MSE, and computational overhead in multiply-accumulate operations (MACs). Datasets used are CelebA 64x64, CIFAR-10, and MNIST.
  • Figure 4: Graph comparison of various models' compression (bpp) and their respective decoder overhead as measured in MACs. Reconstruction quality is reported in color scale as MSE.
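
For readers unfamiliar with the bpp axis used in Figures 3 and 4, the sketch below shows one common way to compute bits per pixel for a vector-quantized latent. It assumes each latent position stores one fixed-length codebook index with no entropy coding. The function name and the example sizes (a 16x16 latent grid, a 512-entry codebook) are hypothetical and are not taken from the paper.

```python
def vq_bits_per_pixel(latent_h: int, latent_w: int, codebook_size: int,
                      image_h: int, image_w: int) -> float:
    """Bits per pixel when every latent position stores one fixed-length index."""
    # Smallest number of bits that can address codebook_size distinct entries.
    bits_per_index = (codebook_size - 1).bit_length()
    total_bits = latent_h * latent_w * bits_per_index
    return total_bits / (image_h * image_w)


# Hypothetical setting: 64 x 64 image, 16 x 16 latent grid, 512-entry codebook.
bpp = vq_bits_per_pixel(16, 16, 512, 64, 64)
```

In this hypothetical setting each index costs 9 bits, so the 256 indices take 2304 bits for 4096 pixels. Lowering bpp in such a scheme means shrinking the latent grid or the codebook.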