Domain Adaptation for Learned Image Compression with Supervised Adapters
Alberto Presta, Gabriele Spadaro, Enzo Tartaglione, Attilio Fiandrotti, Marco Grangetto
TL;DR
This work tackles cross-domain robustness in Learned Image Compression (LIC) by introducing decoder-side, domain-specific adapters and a gate network that blends their outputs. It deploys $K+1$ residual adapters (one per domain: $K$ targets plus the source) and a gate producing $\mathbf{v} \in [0,1]^{K+1}$ to weight adapter contributions, while keeping the pre-trained encoder and decoder fixed. Adapters and gate are trained jointly with the loss $\mathcal{L} = \gamma \cdot \mathrm{MSE}(\mathbf{x}, \hat{\mathbf{x}}) + \mathrm{CE}(d, \mathbf{v})$, where $d$ is the domain label of the input image $\mathbf{x}$ and $\hat{\mathbf{x}}$ is its reconstruction. Because the encoder is never refined, the bitstream and its rate are unchanged. Results show consistent rate-distortion gains on the target domains with no forgetting of the source, and the gate generalizes reasonably to unseen domains, indicating scalable cross-domain LIC with modest overhead (~3.8M adapter parameters and ~184k gate parameters). In practice, the approach allows LIC to be deployed across diverse content without retraining the entire model or transmitting per-image adapters.
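The joint loss above can be illustrated with a minimal sketch in plain Python. It assumes flat lists of pixel values, an integer domain label `d`, and gate weights `v` that already sum to one; the helper names (`mse`, `cross_entropy`, `joint_loss`) are hypothetical and are not taken from the authors' code.

```python
import math

def mse(x, x_hat):
    """Mean squared error between two equally sized flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(d, v, eps=1e-12):
    """Cross-entropy between the true domain index d and gate weights v.

    Assumption: v is a probability vector over the K+1 domains.
    """
    return -math.log(max(v[d], eps))

def joint_loss(x, x_hat, d, v, gamma):
    """L = gamma * MSE(x, x_hat) + CE(d, v), as stated in the summary."""
    return gamma * mse(x, x_hat) + cross_entropy(d, v)

# Toy example: two pixels, three domains, true domain index 0.
loss = joint_loss(x=[0.0, 1.0], x_hat=[0.0, 0.5], d=0,
                  v=[0.5, 0.25, 0.25], gamma=2.0)
```

Here $\gamma$ trades reconstruction quality against domain-classification accuracy of the gate; there is no rate term because the encoder, and hence the bitstream, is left untouched.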
Abstract
In Learned Image Compression (LIC), a model is trained to encode and decode images sampled from a source domain, often outperforming traditional codecs on natural images; yet its performance may be far from optimal on images sampled from different domains. In this work, we tackle the problem of adapting a pre-trained model to multiple target domains by plugging into the decoder one adapter module for each of them, plus one for the source domain. Each adapter improves the decoder performance on a specific domain, without the model forgetting the images seen at training time. A gate network computes the weights that blend the contributions of the adapters when the bitstream is decoded. We experimentally validate our method on two state-of-the-art pre-trained models, observing improved rate-distortion efficiency on the target domains without penalties on the source domain. Furthermore, the gate's ability to find similarities with the learned target domains also enables better encoding efficiency for images outside them.
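The gate-weighted blending of adapters at decoding time can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the gate is assumed to output logits that are normalized with a softmax, each adapter is modelled as a plain function on a flat feature list, and the blended adapter output is added residually to the frozen decoder features. The names `softmax` and `blend_adapters` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax, yielding weights in [0, 1] that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def blend_adapters(features, adapters, gate_logits):
    """Add the gate-weighted sum of adapter outputs to the decoder features.

    features:    flat list of frozen-decoder feature values
    adapters:    K+1 callables, one per domain (assumed interface)
    gate_logits: K+1 raw gate scores (assumed to be softmax-normalized)
    """
    v = softmax(gate_logits)
    outputs = [adapter(features) for adapter in adapters]
    return [
        f + sum(w * out[i] for w, out in zip(v, outputs))
        for i, f in enumerate(features)
    ]

# Toy example with two domains: one adapter adds nothing, the other adds 1.
adapters = [lambda f: [0.0] * len(f), lambda f: [1.0] * len(f)]
blended = blend_adapters([1.0, 2.0], adapters, gate_logits=[0.0, 0.0])
```

With equal logits the gate assigns weight 0.5 to each adapter, so an image that resembles two learned domains receives a mixture of both corrections; this soft mixing is what lets images outside the learned domains still benefit.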
