FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition

Chen Hu, Hanchi Ren, Jingjing Deng, Xianghua Xie, Xiaoke Ma

TL;DR

FissionVAE tackles federated image generation under non-IID data by decoupling the latent space into group-specific priors and by introducing group-specific decoder branches, optionally extended with hierarchical inference. The method mitigates latent-space conflicts and feature blending that arise from aggregating non-IID local VAEs, and it supports heterogeneous decoder architectures. Across Mixed MNIST and CHARM, decoupled latent spaces and group-specific decoders yield substantial improvements in generation quality (e.g., lower $\text{FID}$ and higher $\text{IS}$) over the baseline FedVAE, with priors for $z_1$ and hierarchical variants providing additional gains in complex domains. Overall, FissionVAE offers a scalable, privacy-preserving framework for high-fidelity federated image generation in highly heterogeneous data environments, with future work focusing on stability of heterogeneous decoders and cross-modality extensions.

Abstract

Federated learning is a machine learning paradigm that enables decentralized clients to collaboratively learn a shared model while keeping all the training data local. While considerable research has focused on federated image generation, particularly with Generative Adversarial Networks, Variational Autoencoders have received less attention. In this paper, we address the challenges of non-IID (non-independently and identically distributed) data environments featuring multiple groups of images of different types. Non-IID data distributions can make it difficult to maintain a consistent latent space and can also cause local generators with disparate texture features to be blended during aggregation. We therefore introduce FissionVAE, which decouples the latent space and constructs decoder branches tailored to individual client groups. This method allows for customized learning that aligns with the unique data distribution of each group. Additionally, we incorporate hierarchical VAEs and demonstrate the use of heterogeneous decoder architectures within FissionVAE. We also explore strategies for setting the latent prior distributions to enhance the decoupling process. To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images. Our experiments demonstrate that FissionVAE greatly improves generation quality on these datasets compared to baseline federated VAE models.
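
To make the idea of group-specific latent priors concrete, the following is a minimal sketch of the KL term a client might minimize so that its approximate posterior stays close to the prior of its own group. It assumes each group prior is a unit-variance Gaussian with a group-specific mean; the summary above does not state the paper's exact prior settings, so the prior means, the group labels `digits` and `fashion`, and the function name `kl_to_group_prior` are all illustrative assumptions.

```python
import math

def kl_to_group_prior(mu, log_var, prior_mean):
    """KL( N(mu, diag(exp(log_var))) || N(prior_mean, I) ), summed over latent dimensions.

    Assumption: the group prior has identity covariance and differs
    from other groups only in its mean.
    """
    return 0.5 * sum(
        math.exp(lv) + (m - p) ** 2 - 1.0 - lv
        for m, lv, p in zip(mu, log_var, prior_mean)
    )

# Hypothetical prior means for two client groups (not taken from the paper).
group_prior_means = {"digits": [-1.0, -1.0], "fashion": [1.0, 1.0]}

# A posterior that sits exactly on the "fashion" prior.
mu, log_var = [1.0, 1.0], [0.0, 0.0]
kl_own = kl_to_group_prior(mu, log_var, group_prior_means["fashion"])
kl_other = kl_to_group_prior(mu, log_var, group_prior_means["digits"])
```

In this toy setting the penalty is zero against the client's own group prior and positive against the other group's prior, which is the mechanism that keeps the groups apart in latent space.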

Paper Structure

This paper contains 20 sections, 11 equations, 11 figures, 6 tables.

Figures (11)

  • Figure 1: Qualitative results of the baseline FedVAE and the proposed FissionVAEs. As the latent space and decoders are further decoupled in the federated environment, the quality of the generated images improves.
  • Figure 2: An illustration of baseline FedVAE. The encoder and the decoder of the VAE are aggregated through FedAvg regardless of their client groups.
  • Figure 3: An illustration of FissionVAE with Latent Space Decoupling. The latent variables are forced to follow their respective group prior distributions. The model is aggregated the same way as the baseline FedVAE.
  • Figure 4: An illustration of Hierarchical FissionVAE. This FissionVAE architecture extends to allow two levels of latent variables. The latent variable $z_1$ can be either learned or predefined. As input from different groups has been separated by $z_1$, the latent variable $z_2$ is set to follow the standard normal distribution.
  • Figure 5: An illustration of FissionVAE with Decoder Branch Decoupling. This FissionVAE creates decoders specific to client groups and enforces constraints for latent variable priors. The encoder is aggregated across groups while the group-specific decoder is only aggregated from local models within the corresponding group.
  • ...and 6 more figures
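
The aggregation rule described in the captions for Figures 2 and 5 can be sketched as follows: the encoder is averaged over all clients, while each decoder branch is averaged only over clients in the same group. This is a minimal illustration, not the authors' implementation. It assumes FedAvg weights each client by its sample count, represents parameters as flat lists of floats, and uses hypothetical field names (`group`, `num_samples`, `encoder`, `decoder`) and group labels.

```python
from collections import defaultdict

def weighted_average(vectors, weights):
    """Sample-count-weighted average of equally sized parameter vectors."""
    total = float(sum(weights))
    size = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(size)
    ]

def aggregate_fission(client_updates):
    """Aggregate one round of client updates.

    The encoder is averaged across every client; each decoder is
    averaged only across the clients that belong to the same group.
    """
    weights = [u["num_samples"] for u in client_updates]
    encoder = weighted_average([u["encoder"] for u in client_updates], weights)

    # Bucket the updates by client group before averaging decoders.
    by_group = defaultdict(list)
    for u in client_updates:
        by_group[u["group"]].append(u)

    decoders = {
        group: weighted_average(
            [u["decoder"] for u in members],
            [u["num_samples"] for u in members],
        )
        for group, members in by_group.items()
    }
    return encoder, decoders

# Toy round with three clients in two hypothetical groups.
updates = [
    {"group": "digits", "num_samples": 10, "encoder": [1.0, 2.0], "decoder": [0.0, 0.0]},
    {"group": "digits", "num_samples": 30, "encoder": [3.0, 4.0], "decoder": [4.0, 8.0]},
    {"group": "fashion", "num_samples": 10, "encoder": [5.0, 6.0], "decoder": [10.0, 20.0]},
]
encoder, decoders = aggregate_fission(updates)
```

Dropping the grouping step and averaging the decoder over all clients would recover the baseline FedVAE behavior from Figure 2, where decoders trained on different image types are blended together.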