JoIN: Joint GANs Inversion for Intrinsic Image Decomposition

Viraj Shah, Svetlana Lazebnik, Julien Philip

TL;DR

JoIN tackles intrinsic image decomposition (IID) by decomposing an image with a bank of component-specific GAN priors and solving a joint inversion problem in latent space. Each intrinsic component (albedo, shading, specular) is generated by an independently trained GAN, while joint inversion enforces reconstruction consistency through a perceptual loss and a novel kNN-based latent regularization that preserves in-domain priors. The authors further bridge the Sim-to-Real gap with encoder-guided initialization and targeted generator fine-tuning (PTI) using a local discriminator loss, enabling real-image generalization from synthetic priors. Experiments on materials and faces demonstrate strong separation of components and plausible relighting, with ablations showing the benefits of independent priors, the kNN loss, and PTI. While slower than feed-forward methods, JoIN offers modularity and adaptability to new forward models, making it a flexible framework for IID and related inverse problems.

Abstract

Intrinsic Image Decomposition (IID) is a challenging inverse problem that seeks to decompose a natural image into its underlying intrinsic components such as albedo and shading. While recent image decomposition methods rely on learning-based priors on these components, they often suffer from component cross-contamination owing to joint training of the priors, or from a Sim-to-Real gap, since priors trained on synthetic data are kept frozen during inference on real images. In this work, we propose to solve the intrinsic image decomposition problem using a bank of Generative Adversarial Networks (GANs) as priors, where each GAN is independently trained on only a single intrinsic component, providing stronger and more disentangled priors. At the core of our approach is the idea that the latent space of a GAN is a well-suited optimization domain for solving inverse problems. Given an input image, we propose to jointly invert the latent codes of a set of GANs and combine their outputs to reproduce the input. Contrary to all existing GAN inversion methods, which are limited to inverting only a single GAN, our proposed approach, JoIN, is able to jointly invert multiple GANs using only a single image as supervision while still maintaining the distribution priors of each intrinsic component. We show that our approach is modular, allowing various forward imaging models, and that it can successfully decompose both synthetic and real images. Further, taking inspiration from existing GAN inversion approaches, we allow for careful fine-tuning of the generator priors during inference on real images. This way, our method is able to achieve excellent generalization on real images even though it uses only synthetic data to train the GAN priors. We demonstrate the success of our approach through exhaustive qualitative and quantitative evaluations and ablation studies on various datasets.
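The joint inversion idea in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: `ToyGenerator` is a hypothetical stand-in for a frozen pre-trained GAN (a fixed linear map followed by a sigmoid), the forward model is a plain albedo-times-shading product, and a mean squared error replaces the perceptual loss. What it does show is the core mechanism: two latent codes are optimized together, with a single target image as the only supervision.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyGenerator:
    """Hypothetical stand-in for one frozen, pre-trained component GAN."""

    def __init__(self, rng, latent_dim, n_pixels):
        # Frozen weights; scaled so pre-activations stay in a moderate range.
        self.M = rng.normal(size=(n_pixels, latent_dim)) / np.sqrt(latent_dim)

    def __call__(self, w):
        # Map a latent code to a flattened component "image" in (0, 1).
        return sigmoid(self.M @ w)

    def backward(self, w, grad_out):
        # Chain rule through the sigmoid and the linear map.
        g = self(w)
        return self.M.T @ (grad_out * g * (1.0 - g))

def joint_inversion(target, g_albedo, g_shading, latent_dim, steps=500, lr=2.0):
    """Optimize both latent codes so the combined output matches `target`."""
    w_a = np.zeros(latent_dim)
    w_s = np.zeros(latent_dim)
    losses = []
    for _ in range(steps):
        albedo, shading = g_albedo(w_a), g_shading(w_s)
        estimate = albedo * shading              # simplified forward model
        residual = estimate - target
        losses.append(float(np.mean(residual ** 2)))
        grad_est = 2.0 * residual / residual.size
        # Both latents receive gradients from the same reconstruction error.
        grad_a = g_albedo.backward(w_a, grad_est * shading)
        grad_s = g_shading.backward(w_s, grad_est * albedo)
        w_a -= lr * grad_a
        w_s -= lr * grad_s
    return w_a, w_s, losses

def demo_joint_inversion(seed=0):
    rng = np.random.default_rng(seed)
    latent_dim, n_pixels = 8, 64
    g_a = ToyGenerator(rng, latent_dim, n_pixels)
    g_s = ToyGenerator(rng, latent_dim, n_pixels)
    # Synthetic target built from hidden ground-truth latents.
    target = g_a(rng.normal(size=latent_dim)) * g_s(rng.normal(size=latent_dim))
    return joint_inversion(target, g_a, g_s, latent_dim)
```

Reconstruction alone does not pin down a unique albedo/shading split, since many pairs of components multiply to the same image. In this toy setting nothing prevents that ambiguity, which is why the method relies on strong per-component priors and latent regularization.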

Paper Structure (28 sections, 14 equations, 21 figures, 4 tables)

Figures (21)

  • Figure 1: Intrinsic Image Decomposition (IID) framework. We consider a commonly used image decomposition framework that aims to decompose a natural image into its light-independent (albedo), light-dependent (shading), and, optionally, residual (specular) components. The forward model $f(\cdot)$ from the intrinsic components to the natural image is simply the product of albedo and shading, with specular added as a residual (here, $sRGB(\cdot)$ indicates the standard tone-mapping operation). However, the inverse mapping from a natural image to its intrinsic components is a highly ill-posed inverse problem, which we aim to solve.
  • Figure 1: Quantitative comparisons on the synthetic materials test set.
  • Figure 2: Overview of our approach. Left: We use a bank of pre-trained GANs as a prior, where each GAN is trained on only a single image component using synthetic data. We formulate decomposition as a joint GAN inversion problem over multiple GANs, using the input image $I^*$ as the only supervision. Step 1: We optimize the latent codes of each GAN so that, after passing the outputs of the individual GANs through the forward mapping $f(\cdot)$, the resulting image estimate $\hat{I}$ resembles the input image $I^*$. In addition to a reconstruction loss, we propose a kNN loss on the latent codes to strongly enforce the priors learned by each GAN. This approach leads to successful decomposition on synthetic data. Step 2: For decomposition on real images, we bridge the Sim-to-Real gap by carefully fine-tuning the individual GANs so that they can represent real image features while still maintaining strong component-wise priors.
  • Figure 3: The $\mathcal{W}$-space of a pre-trained GAN is non-isometric. The t-SNE 2D projection of the $\mathcal{W}$-space of the pre-trained albedo GAN confirms its non-isometric behavior, as the distribution varies differently across dimensions. Naively using the distance from the mean as a loss penalizes point $B$ more than point $A$, even though $B$, unlike $A$, is well within the distribution. As a remedy, our proposed kNN loss uses the closeness of a point to its neighbors as a loss instead of relying on its distance from the distribution mean, promoting exploration of the entire space.
  • Figure 4: Comparison: in-domain vs. kNN loss. Left: The in-domain loss attracts the estimate towards the center, resulting in a higher loss near the distribution boundary. Right: The kNN loss ($k=5$) attracts the estimate towards nearby latents, allowing it to better capture the diversity of the distribution while still keeping the estimate within the distribution. Dots represent 100 randomly sampled latent vectors, and colors encode the loss value.
  • ...and 16 more figures
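The forward model described in the first Figure 1 caption can be written out as a short sketch. Two details are assumptions rather than statements from the text above: that the tone mapping is applied to the sum of the albedo-shading product and the specular residual, and that "standard tone-mapping" means the piecewise sRGB transfer function. The function names are illustrative.

```python
import numpy as np

def linear_to_srgb(x):
    """Piecewise sRGB transfer function (assumed meaning of sRGB(.))."""
    x = np.clip(x, 0.0, 1.0)
    return np.where(
        x <= 0.0031308,
        12.92 * x,
        1.055 * np.power(x, 1.0 / 2.4) - 0.055,
    )

def forward_model(albedo, shading, specular=None):
    """Compose intrinsic components into an image estimate.

    albedo:   (H, W, 3) light-independent reflectance in linear space
    shading:  (H, W, 1) or (H, W, 3) light-dependent term
    specular: optional (H, W, 3) additive residual
    """
    linear = albedo * shading            # broadcasts single-channel shading
    if specular is not None:
        linear = linear + specular       # specular enters as a residual
    return linear_to_srgb(linear)        # assumed order: tone-map the sum
```

Because the components are only combined through this one function, swapping in a different imaging model amounts to replacing `forward_model`, which is the sense in which the approach is described as modular.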
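The contrast drawn in the Figure 3 and Figure 4 captions, between a distance-from-mean penalty and a kNN penalty, can be reproduced on a synthetic 2D example. This is a minimal sketch under stated assumptions: the exact form of the paper's kNN loss is not given in the captions, so the mean Euclidean distance to the `k` nearest sampled latents is used here, and the elongated Gaussian `bank` is a hypothetical substitute for latents sampled from a real GAN.

```python
import numpy as np

def mean_distance_loss(w, bank):
    """Distance from the mean of the sampled latents (in-domain style penalty)."""
    return float(np.linalg.norm(w - bank.mean(axis=0)))

def knn_loss(w, bank, k=5):
    """Mean distance from `w` to its k nearest sampled latents (assumed form)."""
    dists = np.linalg.norm(bank - w, axis=1)
    nearest = np.partition(dists, k - 1)[:k]   # the k smallest distances
    return float(nearest.mean())

def demo_knn(seed=0):
    rng = np.random.default_rng(seed)
    # Anisotropic cloud: wide along the first axis, narrow along the second.
    bank = rng.normal(size=(2000, 2)) * np.array([10.0, 0.1])
    point_a = np.array([0.0, 3.0])    # near the mean, but off the distribution
    point_b = np.array([15.0, 0.0])   # far from the mean, but inside it
    return {
        "mean_a": mean_distance_loss(point_a, bank),
        "mean_b": mean_distance_loss(point_b, bank),
        "knn_a": knn_loss(point_a, bank),
        "knn_b": knn_loss(point_b, bank),
    }
```

In this setup the mean-distance penalty ranks the in-distribution point `point_b` as worse than the off-distribution `point_a`, while the kNN penalty ranks them the other way around, matching the behavior attributed to points $B$ and $A$ in the Figure 3 caption.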