Content-Style Learning from Unaligned Domains: Identifiability under Unknown Latent Dimensions

Sagar Shrestha, Xiao Fu

TL;DR

This work addresses identifiability of latent content $\boldsymbol{c}$ and domain-specific style $\boldsymbol{s}^{(n)}$ from unaligned multi-domain data, where latent dimensions may be unknown. It introduces cross-domain latent distribution matching (LDM) to prove identifiability under relaxed conditions, notably relaxing componentwise independence and requiring only two domains in some cases. The authors show that with sparsity constraints, identifiability holds even when latent dimensions are not known, and they reformulate LDM as a sparsity-regularized multi-domain GAN (MDGAN) that is computationally efficient. Empirical results on image translation and generation tasks demonstrate reliable content-style disentanglement, high style diversity, and competitive generation quality, validating the theoretical claims and offering a practical pathway for dimension-agnostic content-style learning.

Abstract

Understanding identifiability of latent content and style variables from unaligned multi-domain data is essential for tasks such as domain translation and data generation. Existing works on content-style identification were often developed under somewhat stringent conditions, e.g., that all latent components are mutually independent and that the dimensions of the content and style variables are known. We introduce a new analytical framework via cross-domain latent distribution matching (LDM), which establishes content-style identifiability under substantially more relaxed conditions. Specifically, we show that restrictive assumptions such as component-wise independence of the latent variables can be removed. Most notably, we prove that prior knowledge of the content and style dimensions is not necessary for ensuring identifiability if sparsity constraints are properly imposed on the learned latent representations. Bypassing the knowledge of the exact latent dimension has been a longstanding aspiration in unsupervised representation learning; our analysis is the first to underpin its theoretical and practical viability. On the implementation side, we recast the LDM formulation into a regularized multi-domain GAN loss with coupled latent variables. We show that the reformulation is equivalent to LDM under mild conditions while requiring considerably fewer computational resources. Experiments corroborate our theoretical claims.

Paper Structure

This paper contains 41 sections, 7 theorems, 57 equations, 16 figures, 4 tables.

Key Result

Theorem 3.3

Under Eq. eq:nonlinearmixture, suppose that Assumptions as:block-indep and assump:variability hold, and that $\widehat{\bm f}$ is differentiable. Then, we have $\widehat{\boldsymbol{f}}_{\rm C}(\boldsymbol{x}^{(n)}) = \bm \gamma (\boldsymbol{c})$ and $\widehat{\boldsymbol{f}}_{\rm S}(\boldsymbol

Figures (16)

  • Figure 1: Cross-domain translation from source domain $s$ to target domain $t$.
  • Figure 2: Samples generated by learning content (pose of cat) and style (type of cat) from AFHQ.
  • Figure 3: Samples generated by combining the same content $\overline{\boldsymbol{c}}$ with $\bm s^{(n)}$ for various $n$'s in AFHQ and CelebA-HQ.
  • Figure 4: Translation by combining content (pose) with randomly sampled styles from the dog domain.
  • Figure 6: Result of sample generation across the two domains when the content is fixed. Images in each column were generated using the same content (digit identity). The first row is expected to have various colors and the second various rotations. Ideally, the digits in the two rows of the same column should be the same.
  • ...and 11 more figures

Theorems & Definitions (10)

  • Theorem 3.3: Identifiability under Known Latent Dimensions
  • Theorem 3.4: Identifiability without Dimension Knowledge
  • Remark 4.1
  • Theorem 4.2
  • Theorem B.1: Identifiability from sturma2024unpaired
  • Theorem B.2: Identifiability from timilsina2024identifiable
  • Theorem B.3: Identifiability from xie2022multi, kong2022partial
  • proof
  • proof
  • Proposition E.1: Sec. A.5. "Effects of the Uniformity Loss", Proposition 5 zimmermann2021contrastive