Efficient Progressive Image Compression with Variance-aware Masking
Alberto Presta, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto, Pamela Cosman
TL;DR
The paper addresses scalable, progressive image compression with a two-latent scheme. A base latent $y^{b}$ encodes the lowest quality, a top latent $y^{t}$ supports higher qualities, and the residual $r^{t}=y^{t}-y^{b}$ is used to progressively refine reconstructions. A lightweight variance-aware masking policy partitions $r^{t}$ into complementary components that are transmitted at different quality levels without adding parameters. A Progressive Channel-wise Entropy Estimation Module (PCEEM) and Rate Enhancement Modules (REMs) improve entropy estimation. The key contributions are the base/top latent framework, the nonparametric masking strategy, the PCEEM architecture, and REMs that refine entropy parameters across quality checkpoints. Together they yield competitive rate-distortion performance while reducing decoding time and parameter count. The approach enables efficient, scalable progressive decoding for networks with fluctuating capacity and real-time constraints, with practical relevance to streaming and adaptive image compression pipelines. In summary, $y^{b}$, $y^{t}$, and $r^{t}$ are the core constructs behind the quality-guided bitstream. A quality parameter $q \in [0,100]$ selects the level of progressive reconstruction, and all of these are tied together by hyperprior modeling and channel-wise entropy estimation.
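To make the base/top/residual construction and the complementary partition concrete, here is a minimal sketch on flat lists of latent elements. It assumes that "variance-aware" means ranking residual elements by a hyperprior-predicted standard deviation `sigma`, with larger values treated as more important. The function names `residual` and `complementary_masks` and the cumulative `fractions` argument are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence


def residual(y_top: Sequence[float], y_base: Sequence[float]) -> List[float]:
    """Element-wise residual r^t = y^t - y^b."""
    return [t - b for t, b in zip(y_top, y_base)]


def complementary_masks(sigma: Sequence[float],
                        fractions: Sequence[float]) -> List[List[int]]:
    """Split residual elements into disjoint, complementary binary masks.

    Assumption (hypothetical): importance = predicted std `sigma`, so
    elements with larger predicted variance are sent first.
    `fractions` are increasing cumulative shares in (0, 1], ending at 1.0;
    component k holds the elements ranked between fractions[k-1] and fractions[k].
    """
    # Rank element indices from most to least important.
    order = sorted(range(len(sigma)), key=lambda i: sigma[i], reverse=True)
    n = len(order)
    masks: List[List[int]] = []
    start = 0
    for frac in fractions:
        end = round(frac * n)
        mask = [0] * n
        for i in order[start:end]:
            mask[i] = 1  # this element belongs to the current component
        masks.append(mask)
        start = end
    return masks


if __name__ == "__main__":
    y_b = [0.0, 1.0, 2.0, 3.0]
    y_t = [0.5, 1.0, 4.0, 2.0]
    r_t = residual(y_t, y_b)
    masks = complementary_masks([0.3, 0.1, 0.9, 0.5], [0.5, 1.0])
    print(r_t, masks)
```

Each element lands in exactly one component, so the components are complementary. Sending components 0..k in order yields progressively higher quality, and the ranking itself needs no learned parameters.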
Abstract
Learned progressive image compression is gaining momentum because it lets the image reconstruction improve as more bits are decoded at the receiver. We propose a progressive image compression method in which an image is first represented as a pair of base-quality and top-quality latent representations. Next, a residual latent representation is encoded as the element-wise difference between the top and base representations. Our scheme enables progressive image compression with element-wise granularity by introducing a masking system that ranks each element of the residual latent representation from most to least important and divides it into complementary components, which can be transmitted separately to the decoder to obtain different reconstruction qualities. The masking system adds neither parameters nor complexity. At the receiver, any elements of the top latent representation excluded from the transmitted components can be independently replaced with the mean predicted by the hyperprior architecture, ensuring reliable reconstructions at any intermediate quality level. We also introduce Rate Enhancement Modules (REMs), which refine the estimation of entropy parameters using already decoded components. We obtain results competitive with state-of-the-art methods while significantly reducing computational complexity, decoding time, and number of parameters.
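The receiver-side substitution can be sketched element by element. This version assumes that received residual elements are added back to the base latent, while every element not yet received takes the hyperprior-predicted mean of the top latent, here called `mu_top`. The function `reconstruct_top_latent` and its argument names are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def reconstruct_top_latent(y_base: Sequence[float],
                           residual_hat: Sequence[float],
                           mask: Sequence[int],
                           mu_top: Sequence[float]) -> List[float]:
    """Approximate y^t from a partially received residual.

    mask[i] == 1: the residual element was decoded, so y^t_i = y^b_i + r^t_i.
    mask[i] == 0: the element is missing and is replaced by mu_top[i],
    assumed (hypothetically) to be the hyperprior's predicted mean of y^t_i.
    Each element is handled independently of the others.
    """
    return [
        b + r if m else mu
        for b, r, m, mu in zip(y_base, residual_hat, mask, mu_top)
    ]


if __name__ == "__main__":
    y_b = [0.0, 1.0, 2.0, 3.0]
    r_t = [0.5, 0.0, 2.0, -1.0]
    mu_t = [0.25, 1.0, 3.5, 2.5]
    # Intermediate quality: only the two most important elements received.
    print(reconstruct_top_latent(y_b, r_t, [0, 0, 1, 1], mu_t))
    # Full quality: every residual element received, recovering y^t exactly.
    print(reconstruct_top_latent(y_b, r_t, [1, 1, 1, 1], mu_t))
```

Because missing elements fall back to a predicted mean instead of zero, any subset of components still decodes to a plausible latent. This is what allows quality levels between the base and top checkpoints.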
