Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression

Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu

TL;DR

The authors train their model with a semantic ensemble loss that combines Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss to improve the perceptual quality of reconstructed images, and they add a latent refinement process that generates content-aware latent codes.

Abstract

Recent advancements in neural compression have surpassed traditional codecs in PSNR and MS-SSIM measurements. However, at low bitrates, these methods can introduce visually displeasing artifacts, such as blurring, color shifting, and texture loss, thereby compromising the perceptual quality of images. To address these issues, this study presents an enhanced neural compression method designed for optimal visual fidelity. We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss, to enhance the perceptual quality of image reconstructions. Additionally, we have implemented a latent refinement process to generate content-aware latent codes. These codes adhere to bitrate constraints, balance the trade-off between distortion and fidelity, and prioritize bit allocation to regions of greater importance. Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression. On the CLIC2024 validation set, our approach achieves a 62% bitrate saving compared to MS-ILLM under the FID metric.
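
To make the ensemble idea concrete, here is a minimal sketch of two of the named terms and of how several terms might be combined. It assumes the standard textbook definitions: the Charbonnier penalty sqrt((x - y)^2 + eps^2) and a Gram-matrix style loss. The abstract does not give the authors' exact formulation, so the function names, the `eps` default, the normalisation, and the weights below are all hypothetical. The perceptual and adversarial terms need pretrained networks and are represented only as precomputed numbers.

```python
import math


def charbonnier_loss(x, y, eps=1e-3):
    """Mean Charbonnier penalty sqrt((x - y)^2 + eps^2) over two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return sum(math.sqrt((a - b) ** 2 + eps ** 2) for a, b in zip(x, y)) / len(x)


def gram_matrix(features):
    """Gram matrix of a C x N feature map (C channels, N positions), divided by N."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def style_loss(feat_x, feat_y):
    """Mean squared difference between the Gram matrices of two feature maps."""
    gx, gy = gram_matrix(feat_x), gram_matrix(feat_y)
    c = len(gx)
    return sum((gx[i][j] - gy[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)


def ensemble_loss(terms, weights):
    """Weighted sum of named loss terms. The weights are illustrative placeholders,
    not values taken from the paper."""
    return sum(weights[name] * value for name, value in terms.items())


# Toy usage on flattened "images" and tiny two-channel "feature maps".
original = [0.0, 0.5, 1.0]
reconstruction = [0.1, 0.4, 1.0]
terms = {
    "charbonnier": charbonnier_loss(original, reconstruction),
    "style": style_loss([[1.0, 2.0], [0.0, 1.0]], [[1.0, 1.5], [0.0, 1.0]]),
    "perceptual": 0.0,   # placeholder: would come from a pretrained feature network
    "adversarial": 0.0,  # placeholder: would come from a discriminator
}
weights = {"charbonnier": 1.0, "style": 1.0, "perceptual": 1.0, "adversarial": 1.0}
total = ensemble_loss(terms, weights)
```

Unlike a plain L1 loss, the Charbonnier penalty stays differentiable at zero error, which is one common reason it is chosen as the pixel-level term in a combined objective.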

Paper Structure

This paper contains 11 sections, 7 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Framework Overview: Our process begins with training a VAE-based compression model with our semantic ensemble loss; the latent codes are subsequently refined using SGA and a computationally simplified loss.
  • Figure 2: Comparison of methods across various distortion and statistical fidelity metrics on the CLIC2024 validation set.
  • Figure 3: Visual comparisons using different methods at the same bitrate.
  • Figure 4: Qualitative comparison of refining latent representations using different sets of hyperparameters for ROI-based loss.