RECOMBINER: Robust and Enhanced Compression with Bayesian Implicit Neural Representations

Jiajun He, Gergely Flamich, Zongyu Guo, José Miguel Hernández-Lobato

TL;DR

The proposed method, Robust and Enhanced COMBINER (RECOMBINER), achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates.

Abstract

COMpression with Bayesian Implicit NEural Representations (COMBINER) is a recent data compression method that addresses a key inefficiency of previous Implicit Neural Representation (INR)-based approaches: it avoids quantization and enables direct optimization of the rate-distortion performance. However, COMBINER still has significant limitations: 1) it uses factorized priors and posterior approximations that lack flexibility; 2) it cannot effectively adapt to local deviations from global patterns in the data; and 3) its performance can be susceptible to modeling choices and the variational parameters' initializations. Our proposed method, Robust and Enhanced COMBINER (RECOMBINER), addresses these issues by 1) enriching the variational approximation while retaining a low computational cost via a linear reparameterization of the INR weights, 2) augmenting our INRs with learnable positional encodings that enable them to adapt to local details and 3) splitting high-resolution data into patches to increase robustness and utilizing expressive hierarchical priors to capture dependency across patches. We conduct extensive experiments across several data modalities, showcasing that RECOMBINER achieves competitive results with the best INR-based methods and even outperforms autoencoder-based codecs on low-resolution images at low bitrates. Our PyTorch implementation is available at https://github.com/cambridge-mlg/RECOMBINER/.
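To make point 1) concrete, the following is a minimal sketch, assuming the linear reparameterization takes the form w = A h, where h has a factorized Gaussian distribution and A is a learned matrix. The names `A`, `sigma`, `reparameterize`, and `induced_covariance` are illustrative assumptions and are not taken from the authors' implementation. The sketch only shows why a factorized distribution over h can still yield correlated INR weights w.

```python
def reparameterize(A, h):
    """Map latent vector h to weights w = A h (assumed form of the reparameterization)."""
    return [sum(a * x for a, x in zip(row, h)) for row in A]


def induced_covariance(A, sigma):
    """Covariance of w = A h when the components of h are independent
    Gaussians with standard deviations sigma: Cov[w] = A diag(sigma^2) A^T."""
    n_out, n_in = len(A), len(sigma)
    return [
        [sum(A[i][k] * sigma[k] ** 2 * A[j][k] for k in range(n_in))
         for j in range(n_out)]
        for i in range(n_out)
    ]


# Hypothetical 2x2 example: a non-diagonal A mixes the independent latents.
A = [[1.0, 0.5],
     [0.0, 1.0]]
sigma = [1.0, 2.0]

w = reparameterize(A, [2.0, 4.0])
cov = induced_covariance(A, sigma)
# The off-diagonal entries of cov are non-zero, so the weights are correlated
# even though the distribution over h is factorized.
```

Under this assumption, only the per-component means and standard deviations of h are needed per datum, which is consistent with the abstract's claim of a richer approximation at low computational cost.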

Paper Structure

This paper contains 36 sections, 5 equations, 17 figures, 8 tables, 1 algorithm.

Figures (17)

  • Figure 1: Schematic of (a) COMBINER and (b) RECOMBINER, our proposed method. See the Background and Methods sections for notation. As the INR's input, RECOMBINER uses ${\mathbf{h}}_{\mathbf{z}}$ upsampled to pixel-wise positional encodings, concatenated with Fourier embeddings. (c) A closer look at how RECOMBINER maps ${\mathbf{h}}_{\mathbf{z}}$ to the INR input, taking images as an example. FE: Fourier embeddings; FC: fully connected layer.
  • Figure 2: Illustration of (a) the three-level hierarchical model and (b) our permutation strategy.
  • Figure 3: Quantitative evaluation and qualitative examples of RECOMBINER on image, audio, video, and 3D protein structure. Kbps stands for kilobits per second, RMSD stands for Root Mean Square Deviation, and bpa stands for bits per atom. For all plots, we use solid lines to denote INR-based codecs, dotted lines to denote VAE-based codecs, and dashed lines to denote classical codecs.
  • Figure 4: Comparison between kodim24 details compressed with and without learnable positional encodings. Panels (a) and (b) have similar bitrates; panels (a) and (c) have similar PSNRs.
  • Figure 5: (a) Rate-distortion (RD) performance of COMBINER and RECOMBINER with different numbers of hidden units. (b)(c) Ablation studies on CIFAR-10 and Kodak. LR: linear reparameterization; PE: positional encodings; HM: hierarchical model; RP: random permutation across patches. We describe the experimental settings for the ablation studies in the appendix on ablation study settings.
  • ...and 12 more figures
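
The INR input described in the Figure 1 caption (upsampled positional encodings concatenated with Fourier embeddings) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: nearest-neighbour upsampling stands in for whatever learned mapping (including the FC layer) the paper uses, the Fourier embedding uses the common sin/cos form with frequencies 2^k π, and all function names and sizes are hypothetical.

```python
import math


def fourier_embedding(coord, num_freqs):
    """Assumed form: [sin(2^k pi x), cos(2^k pi x)] for each coordinate x, k < num_freqs."""
    out = []
    for x in coord:
        for k in range(num_freqs):
            out.append(math.sin((2 ** k) * math.pi * x))
            out.append(math.cos((2 ** k) * math.pi * x))
    return out


def upsample_nearest(grid, height, width):
    """Nearest-neighbour upsampling of a coarse grid of feature vectors
    (a stand-in for the learned upsampling of the positional encodings)."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[i * gh // height][j * gw // width] for j in range(width)]
            for i in range(height)]


def inr_inputs(coarse_encodings, height, width, num_freqs):
    """Per-pixel INR input: upsampled positional encoding + Fourier embedding."""
    pe = upsample_nearest(coarse_encodings, height, width)
    inputs = []
    for i in range(height):
        row = []
        for j in range(width):
            # Normalise pixel indices to [0, 1] before embedding.
            coord = (i / max(height - 1, 1), j / max(width - 1, 1))
            row.append(list(pe[i][j]) + fourier_embedding(coord, num_freqs))
        inputs.append(row)
    return inputs


# Hypothetical 2x2 grid of 2-dimensional encodings, upsampled to a 4x4 image.
coarse = [[[0.1, 0.2], [0.3, 0.4]],
          [[0.5, 0.6], [0.7, 0.8]]]
x = inr_inputs(coarse, height=4, width=4, num_freqs=3)
# Each pixel gets 2 encoding dims + 2 coords * 3 freqs * 2 (sin, cos) = 14 inputs.
```

In this sketch the coarse encodings carry the local, per-datum information, while the Fourier embedding supplies the pixel position; the INR would then map each 14-dimensional vector to a pixel value.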