FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition
Chen Hu, Hanchi Ren, Jingjing Deng, Xianghua Xie, Xiaoke Ma
TL;DR
FissionVAE tackles federated image generation under non-IID data by decoupling the latent space into group-specific priors and by introducing group-specific decoder branches, optionally extended with hierarchical inference. The method mitigates latent-space conflicts and feature blending that arise from aggregating non-IID local VAEs, and it supports heterogeneous decoder architectures. Across Mixed MNIST and CHARM, decoupled latent spaces and group-specific decoders yield substantial improvements in generation quality (e.g., lower $\text{FID}$ and higher $\text{IS}$) over the baseline FedVAE, with priors for $z_1$ and hierarchical variants providing additional gains in complex domains. Overall, FissionVAE offers a scalable, privacy-preserving framework for high-fidelity federated image generation in highly heterogeneous data environments, with future work focusing on stability of heterogeneous decoders and cross-modality extensions.
Abstract
Federated learning is a machine learning paradigm that enables decentralized clients to collaboratively learn a shared model while keeping all the training data local. While considerable research has focused on federated image generation, particularly with Generative Adversarial Networks, Variational Autoencoders have received less attention. In this paper, we address the challenges of non-IID (not independently and identically distributed) data environments featuring multiple groups of images of different types. Non-IID data distributions make it difficult to maintain a consistent latent space, and they can also cause local generators with disparate texture features to be blended during aggregation. We therefore introduce FissionVAE, which decouples the latent space and constructs decoder branches tailored to individual client groups. This allows each group's branch to be trained on, and aligned with, that group's own data distribution. Additionally, we incorporate hierarchical VAEs and demonstrate the use of heterogeneous decoder architectures within FissionVAE. We also explore strategies for setting the latent prior distributions to enhance the decoupling process. To evaluate our approach, we assemble two composite datasets: the first combines MNIST and FashionMNIST; the second comprises RGB datasets of cartoon and human faces, wild animals, marine vessels, and remote sensing images. Our experiments demonstrate that FissionVAE greatly improves generation quality on these datasets compared to baseline federated VAE models.
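To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each client group is given its own Gaussian prior with identity covariance and a distinct mean, that the encoder is averaged over all clients, and that each decoder branch is averaged only over clients in its group. The names `PRIOR_MEANS`, `kl_to_group_prior`, and `aggregate`, the prior mean values, the flat parameter lists, and the equal client weighting are all hypothetical choices made for illustration.

```python
import math

# Hypothetical group-specific prior means (illustrative values, not from the paper).
PRIOR_MEANS = {
    "mnist": [-2.0, -2.0],
    "fashion": [2.0, 2.0],
}


def kl_to_group_prior(mu, logvar, prior_mean):
    """KL( N(mu, diag(exp(logvar))) || N(prior_mean, I) ), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + (m - p) ** 2 - 1.0 - lv
        for m, lv, p in zip(mu, logvar, prior_mean)
    )


def average(vectors):
    """Element-wise mean of equally weighted parameter vectors."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]


def aggregate(client_updates):
    """Average the encoder over all clients (assumed shared) and each
    decoder branch only over the clients that belong to its group."""
    encoder = average([u["encoder"] for u in client_updates])
    decoders = {}
    for group in sorted({u["group"] for u in client_updates}):
        members = [u["decoder"] for u in client_updates if u["group"] == group]
        decoders[group] = average(members)
    return encoder, decoders


# Toy updates: parameters are flattened into short lists for readability.
updates = [
    {"group": "mnist", "encoder": [1.0, 1.0], "decoder": [0.0]},
    {"group": "mnist", "encoder": [3.0, 3.0], "decoder": [2.0]},
    {"group": "fashion", "encoder": [5.0, 5.0], "decoder": [10.0]},
]
encoder, decoders = aggregate(updates)
```

In this sketch, a posterior that coincides with its own group's prior incurs zero KL penalty, while the same posterior measured against another group's prior is penalized by the squared distance between the prior means. This is one way to read "decoupling the latent space". Averaging each decoder branch only within its group keeps the texture features of one group from being blended into another group's generator during aggregation.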
