Few-Shot Domain Adaptation for Learned Image Compression

Tianyu Zhang, Haotian Zhang, Yuqi Li, Li Li, Dong Liu

TL;DR

This paper tackles the poor generalization of pre-trained learned image compression (LIC) models to out-of-domain images. It introduces a universal few-shot domain adaptation framework that injects compact adapters (Conv-Adapters for latent channel reallocation and LoRA-Adapters for the entropy model) into existing LIC architectures and trains them with a two-stage strategy using only a small set of target-domain samples. The approach achieves rate-distortion (RD) performance close to H.266/VVC intra coding across multiple domains and LIC schemes, while transmitting fewer than 2% of the parameters and adding minimal decoding-time overhead; it also matches full-model finetuning with far fewer trainable parameters. These results suggest that lightweight, plug-and-play adaptation is a viable route to deploying learned codecs across diverse real-world visual domains.

Abstract

Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance and is considered promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, implying poor generalization capabilities. To tackle this problem, we propose a few-shot domain adaptation method for LIC by integrating plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt the pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible with mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving performance comparable to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model finetuning while transmitting fewer than $2\%$ of the parameters.

Paper Structure (26 sections, 6 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 1: (a) BD-rate ($\downarrow$) of four advanced LIC models with or without our method on different domains. (b)-(e) Images from different domains have visibly different characteristics.
  • Figure 2: Analogy between frequency components and channels of LIC latents. For one image, we perform Fast Fourier Transform (FFT) and then reconstruct from low-frequency (LF) or high-frequency (HF) components respectively. Similarly, we perform a learned analysis transform ($g_a$) and then reconstruct from high-energy (HE) or low-energy (LE) channels respectively. Please view on screen and zoom in to observe the reconstructions from HF/LE.
  • Figure 3: Following Fig. 2, we explore the domain gaps by observing the channel-wise decomposition. The top two rows display in-domain natural images, while the bottom two rows show out-of-domain images. (a) Source image. (b) Spectrum (using FFT). (c, d) Reconstructions from different HE channels and LE channels respectively. Here the total number of channels is 320. Out-of-domain images have more HF components as shown in (b), as well as more information embedded into LE channels as shown in (d). Thus, our key idea is to re-allocate the information from LE channels to HE channels for out-of-domain images.
  • Figure 4: Deployment of our method on ELIC (He et al., 2022). We denote $g_{ep}$ as the entropy parameters network, while the other notations follow those of He et al. (2022). As detailed on the right, Conv-Adapters are inserted serially after non-linear blocks in the transform, while LoRA-Adapters are added to the pre-trained weight matrix $W_{0}$ in $g_{ep}$. $W$ and $b$ are the weight and bias of the Conv-Adapter, respectively, while $A$ and $B$ are low-rank matrices. Only adapters are trainable.
  • Figure 5: Division of $g_{s}$ into four stacks. In the second stage, we only finetune adapters in Stack 4 for reconstruction.
  • ...and 8 more figures
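
The LoRA-Adapter described in the Figure 4 caption (low-rank matrices $A$ and $B$ added to a frozen pre-trained weight $W_{0}$) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the standard LoRA form $W = W_{0} + BA$, treats the layer in $g_{ep}$ as a plain matrix rather than a convolution, and initializes $B$ to zero as in the usual LoRA convention. The helper names `lora_merge` and `lora_param_ratio` and all sizes are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(w0, b_mat, a_mat, scale=1.0):
    """Return W0 + scale * (B @ A) as a new matrix; W0 is not modified (frozen)."""
    delta = matmul(b_mat, a_mat)
    return [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]


def lora_param_ratio(d_out, d_in, rank):
    """Fraction of parameters in A and B relative to the full d_out x d_in weight."""
    return rank * (d_out + d_in) / (d_out * d_in)


# Hypothetical layer sizes, chosen only for illustration.
d_out, d_in, rank = 8, 6, 2
rng = random.Random(0)

w0 = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]   # frozen
a_mat = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in
b_mat = [[0.0] * rank for _ in range(d_out)]  # d_out x rank, zero-init (assumption)

# With B = 0 the adapted weight equals the pre-trained one before any training.
w_start = lora_merge(w0, b_mat, a_mat)

# After (hypothetical) training, B is non-zero and only A and B need to be stored.
b_trained = [[0.1] * rank for _ in range(d_out)]
w_adapted = lora_merge(w0, b_trained, a_mat)

print(w_start == w0, w_adapted != w0)
print(lora_param_ratio(d_out, d_in, rank))
```

Under these assumptions only $A$ and $B$ would need to be trained and transmitted, and their size grows with the rank rather than with the full weight matrix; the ratio printed here is for the toy sizes above and is unrelated to the under-2% figure reported in the abstract.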