Conditional Hallucinations for Image Compression

Till Aczel, Roger Wattenhofer

TL;DR

We propose ConHa, a compression method that dynamically balances hallucination levels based on content. We train a model to predict user preferences on detail and hallucination levels, and use this prediction to adjust the perceptual weight in the reconstruction loss.

Abstract

In lossy image compression, models face the challenge of either hallucinating details or generating out-of-distribution samples due to the information bottleneck. This implies that at times, introducing hallucinations is necessary to generate in-distribution samples. The optimal level of hallucination varies depending on image content, as humans are sensitive to small changes that alter the semantic meaning. We propose a novel compression method that dynamically balances the degree of hallucination based on content. We collect data and train a model to predict user preferences on hallucinations. By using this prediction to adjust the perceptual weight in the reconstruction loss, we develop a Conditionally Hallucinating compression model (ConHa) that outperforms state-of-the-art image compression methods. Code and images are available at https://polybox.ethz.ch/index.php/s/owS1k5JYs4KD4TA.
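
The idea of scaling the perceptual term by a predicted preference can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `conditional_loss`, the scale factor `beta`, the additive form of the loss, and the convention that the predicted preference `w` lies in [0, 1] are all hypothetical. The abstract only states that the prediction adjusts the perceptual weight in the reconstruction loss. The rate term of a full compression objective is omitted here.

```python
import numpy as np


def conditional_loss(mse, perceptual, w, beta=1.0):
    """Blend a distortion term with a content-dependent perceptual term.

    mse        : per-image distortion (e.g. mean squared error)
    perceptual : per-image perceptual/adversarial loss value
    w          : predicted preference for hallucination, assumed in [0, 1]
                 (hypothetical convention: 0 = no hallucination wanted)
    beta       : global scale of the perceptual term (hypothetical)
    """
    mse = np.asarray(mse, dtype=float)
    perceptual = np.asarray(perceptual, dtype=float)
    # Clip so that an out-of-range prediction cannot flip or inflate the term.
    w = np.clip(np.asarray(w, dtype=float), 0.0, 1.0)
    return mse + beta * w * perceptual


# A text-heavy image (w near 0) is trained almost purely on distortion,
# while a texture-heavy image (w near 1) receives the full perceptual term.
text_like = conditional_loss(mse=0.2, perceptual=0.5, w=0.0)
grass_like = conditional_loss(mse=0.2, perceptual=0.5, w=1.0)
```

With `w = 0` the loss reduces to the distortion term alone, and with `w = 1` the perceptual term is fully active, which mirrors the text-versus-grass contrast described for Figure 1.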

Paper Structure

This paper contains 15 sections, 4 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: (1) Original image; (2) compressed with an MSE-distortion-optimized model; (3) compressed with a hallucinating MSE+GAN-optimized model. For the left image containing text, hallucinations degrade the image quality. For the right image containing grass, an in-distribution sample with hallucinations produces a higher perceptual quality.
  • Figure 2: Samples from the CLIC 2024 image test set with $w$ as their x coordinate. Images predicted to perform better without hallucinations are on the left, while those predicted to excel with in-distribution samples are on the right.
  • Figure 3: Comparison of compression methods. Our approach represents a middle ground between the Hyperprior and HiFiC models. For images with pavement (first column), our model adds details to create in-distribution samples. It avoids excessive hallucination for images with small faces and text (middle column) or objects with straight edges (right column).
  • Figure 4: Bootstrapped Elo Scores box plot on the CLIC 2024 image test set. Each box represents the distribution of Elo Scores, with the horizontal line indicating the median, the box extending from the first quartile (Q1) to the third quartile (Q3), and the whiskers extending to 1.5 times the interquartile range (Q3-Q1). At low bitrates, ConHa (ours) is compared against three baselines: Hyperprior, HiFiC, and ConHa-fixed. For medium and high bitrates, ConHa is compared solely against the previous state-of-the-art, HiFiC. Remarkably, at all bitrates, ConHa consistently outperforms the baseline models, as evidenced by the higher median Elo Scores.
  • Figure 5: User study interface with instructions (top) and after closing the instructions (bottom).
  • ...and 4 more figures
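
The bootstrapped Elo scores mentioned in the Figure 4 caption can be illustrated with a short sketch. It assumes a standard sequential Elo update and resampling of pairwise comparisons with replacement. The K-factor, the starting rating, the number of bootstrap rounds, and all function names are hypothetical choices for illustration, and the paper's exact procedure may differ.

```python
import random


def elo_scores(matches, players, k=32.0, base=1500.0):
    """Run a standard Elo update over a list of (winner, loser) pairs."""
    ratings = {p: base for p in players}
    for winner, loser in matches:
        # Expected win probability of the winner before the update.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def bootstrap_elo(matches, players, rounds=200, seed=0):
    """Resample the comparisons with replacement and collect Elo scores."""
    rng = random.Random(seed)
    samples = {p: [] for p in players}
    for _ in range(rounds):
        resampled = [rng.choice(matches) for _ in matches]
        for player, rating in elo_scores(resampled, players).items():
            samples[player].append(rating)
    return samples


def median(values):
    """Median of a non-empty list, as drawn by the line inside a box plot."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 else 0.5 * (s[mid - 1] + s[mid])


# Toy data (made up): method "A" is preferred over "B" in every comparison.
toy_matches = [("A", "B")] * 20
toy_samples = bootstrap_elo(toy_matches, ["A", "B"])
```

Each list in `toy_samples` holds one Elo score per bootstrap round. The quartiles of such a list are what a box plot like the one described for Figure 4 would summarize.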