SepVAE: a contrastive VAE to separate pathological patterns from healthy ones

Robin Louiset, Edouard Duchesnay, Antoine Grigis, Benoit Dufumier, Pietro Gori

TL;DR

Contrastive Analysis VAEs (CA-VAEs) aim to separate the factors of variation shared by a background dataset (BG) and a target dataset (TG) from the salient patterns specific to the target. SepVAE advances this paradigm by adding a salient-discriminability constraint and a mutual-information (MI) regularization between the common and salient spaces to a two-encoder, single-decoder VAE. The model is optimized via an ELBO that includes conditional reconstruction, priors for both spaces, a salient-classification term, and the MI penalty. The approach improves the separation of pathological information from healthy variability, outperforming prior CA-VAE methods on CelebA and three medical-imaging tasks, and is supported by open-source code. The work further suggests extensions to multiple target datasets and theoretical identifiability considerations to strengthen the interpretability and reliability of the learned factors.

Abstract

Contrastive Analysis VAEs (CA-VAEs) are a family of variational auto-encoders (VAEs) that aim to separate the common factors of variation between a background dataset (BG) (i.e., healthy subjects) and a target dataset (TG) (i.e., patients) from the ones that only exist in the target dataset. To do so, these methods split the latent space into a set of salient features (i.e., specific to the target dataset) and a set of common features (i.e., present in both datasets). Currently, all models fail to effectively prevent the sharing of information between latent spaces and to capture all salient factors of variation. To this end, we introduce two crucial regularization losses: a disentangling term between common and salient representations and a classification term between background and target samples in the salient space. We show better performance than previous CA-VAE methods on three medical applications and a natural-image dataset (CelebA). Code and datasets are available on GitHub: https://github.com/neurospin-projects/2023_rlouiset_sepvae.
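
As a reading aid, the loss terms listed in the TL;DR can be arranged into one schematic objective to minimize. This is only a sketch under stated assumptions, not the paper's exact equation: the posterior notation $q_{\phi_c}$, $q_{\phi_s}$, the label-dependent prior $p(s \mid y)$, the classification loss $\mathcal{L}_{cls}$, and the weights $\lambda_{cls}$ and $\lambda_{MI}$ are hypothetical names introduced here for illustration.

$$
\mathcal{L}(x, y) = \underbrace{-\,\mathbb{E}\left[\log p_{\theta}(x \mid c, s)\right]}_{\text{conditional reconstruction}} + \underbrace{D_{KL}\left(q_{\phi_c}(c \mid x) \,||\, p(c)\right) + D_{KL}\left(q_{\phi_s}(s \mid x) \,||\, p(s \mid y)\right)}_{\text{priors on both spaces}} + \underbrace{\lambda_{cls}\, \mathcal{L}_{cls}(s, y)}_{\text{salient classification}} + \underbrace{\lambda_{MI}\, D_{KL}\left(p_{\phi_s, \phi_c}(c, s) \,||\, p_{\phi_c}(c) \times p_{\phi_s}(s)\right)}_{\text{MI penalty}}
$$

The last term vanishes exactly when the common and salient latents are statistically independent, which is the disentangling behavior the abstract asks for.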

Paper Structure (22 sections, 4 equations, 8 figures, 3 tables, 1 algorithm)

This paper contains 22 sections, 4 equations, 8 figures, 3 tables, 1 algorithm.

Figures (8)

  • Figure 1: SepVAE reconstructions on the BraTS2021 dataset (Menze et al., 2015). (Middle) full reconstructions using the estimated common and salient latent vectors. (Right) common-only reconstructions using the estimated common latent vectors and fixing the salient factors to $s'$. The common latent variables encode the healthy factors of variability (e.g., brain shape and aspect), while the salient factors encode the pathological patterns (e.g., tumors), which are not visible in the right columns (common-only).
  • Figure 2: Illustration of SepVAE training. Target and background images are encoded with the same encoders $e_{\phi_s}$ and $e_{\phi_c}$. The first encoder $e_{\phi_s}$ estimates the salient factors of variation $s$ of the target samples ($y=1$). For background samples ($y=0$), the salient space is set to an informationless value $s'=0$. The second encoder $e_{\phi_c}$ estimates the common factors $c$. Images are reconstructed using a single decoder $d_{\theta}$ fed with the concatenation of $c$ and $s$.
  • Figure 3: Illustration of the Mutual Information loss between the common and the salient space. Given two images $x_a$ and $x_b$, four sets of latents are computed: $c_a$ and $s_a$, the latents of image $a$, and $c_b$ and $s_b$, the latents of image $b$. A non-linear MLP is independently trained with a binary cross-entropy loss to classify shuffled concatenations (i.e., latents from different images) with the label $0$ and concatenations of latents coming from the same image with the label $1$. Then, during training, the encoders are optimized so that it should not be possible to identify whether a concatenation of latents belongs to class $0$ (shuffled common and salient spaces) or class $1$ (common and salient spaces coming from the same image). We encourage that by minimizing $D_{KL}(p_{\phi_s, \phi_c}(c, s) || p_{\phi_c}(c) \times p_{\phi_s}(s))$.
  • Figure 4: CelebA accessories dataset. We used a train set of $20000$ images ($10000$ no accessories, $5000$ glasses, $5000$ hats) and an independent test set of $4000$ images ($2000$ no accessories, $1000$ glasses, $1000$ hats), and ran the experiment $5$ times to account for initialization uncertainty. Images were centered on the face and then resized to $64 \times 64$; pixels were normalized between $0$ and $1$.
  • Figure 5: SepVAE qualitative example on the CelebA with accessories dataset (BG = no accessories, TG = hats and glasses). (Middle, common+salient): Full reconstructions using the estimated common and salient factors. (Right, common only): Reconstructions using only the estimated common factors, fixing the salient factors to $s'$. The salient latent variables capture the accessories (hats and glasses), which are target-specific patterns. The common latents capture the common attributes (e.g., identity, skin color).
  • ...and 3 more figures
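
The mechanics described in the captions of Figures 2 and 3 can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, latents are plain Python lists, the "shuffle" is assumed to be a cyclic shift of the batch, and the KL estimate assumes the density-ratio trick (the logit of a well-trained, class-balanced classifier approximates the log-ratio between the joint and the product of marginals).

```python
import math


def masked_salient(s_hat, y, s_prime=0.0):
    """Figure 2: keep the estimated salient vector for target samples (y=1);
    replace every entry by the constant s' for background samples (y=0)."""
    return [v if y == 1 else s_prime for v in s_hat]


def make_pairs(c_batch, s_batch):
    """Figure 3: build the inputs of the MLP classifier.

    Label 1: common and salient latents taken from the same image.
    Label 0: common latents concatenated with the salient latents of a
    different image. A cyclic shift by one position is used as the shuffle
    (an assumption), so no image is ever paired with itself.
    """
    if len(c_batch) != len(s_batch) or len(c_batch) < 2:
        raise ValueError("need two aligned batches of at least two samples")
    shifted = s_batch[1:] + s_batch[:1]
    joint = [(c + s, 1) for c, s in zip(c_batch, s_batch)]
    shuffled = [(c + s, 0) for c, s in zip(c_batch, shifted)]
    return joint + shuffled


def bce(logit, label):
    """Binary cross-entropy on a single logit, in a numerically stable form."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def kl_estimate(joint_logits):
    """Density-ratio estimate of KL(p(c, s) || p(c) p(s)): the mean classifier
    logit over same-image (label 1) pairs."""
    return sum(joint_logits) / len(joint_logits)


# Toy batch of three images: 2-d common latents and 1-d salient latents.
c_batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
s_batch = [[1.0], [2.0], [3.0]]
pairs = make_pairs(c_batch, s_batch)
```

In a training loop following this sketch, the MLP would be updated to lower `bce` on `pairs`, while the encoders would be updated to lower `kl_estimate` computed from the MLP's logits on the same-image pairs, which pushes the two latent spaces toward independence.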