MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models
Jhon Lopez, Carlos Hinojosa, Henry Arguello, Bernard Ghanem
TL;DR
MambaStyle addresses the problem of inverting real images into StyleGAN's latent space. Its method balances reconstruction fidelity, editability, and computational efficiency. It builds Vision State-Space Models (VSSMs) into a single-stage encoder that predicts both latent codes in $\mathcal{W}^{+}$ and spatial features. A Fuser then injects editing directions into those feature maps, which allows precise, localized edits. The architecture pairs a multi-scale Mamba-based encoder with the Fuser and uses StyleGAN2 as the generator. Training uses a composite objective with the terms $\mathcal{L}_{\text{rec}}$, $\mathcal{L}_{\text{perc}}$, $\mathcal{L}_{\text{id}}$, $\mathcal{L}_{\text{struct}}$, and $\mathcal{L}_{\text{e}}$. Together these enforce high-fidelity reconstruction and structured, editable transformations. On CelebA-HQ and Stanford Cars, MambaStyle achieves better inversion quality and editing performance than existing methods. It also substantially reduces model complexity and inference time, which makes real-time use feasible. Overall, the work offers a scalable, efficient route to high-quality real-image editing with StyleGAN by combining VSSMs with targeted feature-level fusion.
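The sketch below covers two ideas that MambaStyle builds on: the composite objective and latent-space editing. It rests on several assumptions. The objective is taken to be a weighted sum of the listed terms, and the weights are placeholders, because the paper's values are not given here. The edit shown is the standard StyleGAN latent-arithmetic edit $w' = w + \alpha d$, applied to selected rows of a $\mathcal{W}^{+}$ code. The sketch does not model the Fuser, which injects editing directions into spatial feature maps as well. The function names `composite_loss` and `edit_w_plus` are hypothetical.

```python
from typing import Dict, List


def composite_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of individual loss values (assumed form of the objective).

    `terms` maps names such as "rec", "perc", "id", "struct", "e" to scalar
    loss values; `weights` holds hypothetical placeholder coefficients.
    """
    missing = set(terms) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in terms.items())


def edit_w_plus(w_plus: List[List[float]], direction: List[float],
                alpha: float, layers: range) -> List[List[float]]:
    """Shift the style vectors at `layers` along `direction` by `alpha`.

    This is generic latent arithmetic w' = w + alpha * d, not MambaStyle's
    Fuser; rows outside `layers` are left unchanged.
    """
    edited = [row[:] for row in w_plus]  # copy so the input code is untouched
    for i in layers:
        edited[i] = [w + alpha * d for w, d in zip(edited[i], direction)]
    return edited
```

For a 1024×1024 StyleGAN2 generator, a $\mathcal{W}^{+}$ code holds 18 style vectors of dimension 512. Restricting `layers` to early or late indices is a common way to target coarse or fine attributes, respectively.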
Abstract
The task of inverting real images into StyleGAN's latent space to manipulate their attributes has been extensively studied. However, existing GAN inversion methods struggle to balance high reconstruction quality, effective editability, and computational efficiency. In this paper, we introduce MambaStyle, an efficient single-stage encoder-based approach for GAN inversion and editing that addresses these challenges by leveraging vision state-space models (VSSMs). Integrating VSSMs into the proposed architecture enables high-quality image inversion and flexible editing with significantly fewer parameters and lower computational complexity than state-of-the-art methods. Extensive experiments show that MambaStyle achieves a superior balance among inversion accuracy, editing quality, and computational efficiency. It delivers better inversion and editing results with lower model complexity and faster inference, which makes it suitable for real-time applications.
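The efficiency argument for VSSMs rests on the state-space recurrence $h_t = \bar{A} h_{t-1} + \bar{B} x_t$, $y_t = C h_t$. This recurrence processes a sequence of length $L$ in a single linear-time pass, rather than computing the quadratic pairwise interactions of self-attention. The sketch below is a discretized, diagonal, time-invariant SSM running over one channel. It is not the paper's implementation. Mamba-style selective scans also make $\Delta$, $B$, and $C$ depend on the input, and vision variants flatten 2-D feature maps into 1-D sequences along several scan orders; the sketch omits both. The name `ssm_scan` is hypothetical.

```python
import math
from typing import List


def ssm_scan(x: List[float], A: List[float], B: List[float], C: List[float],
             delta: float) -> List[float]:
    """Run a diagonal, time-invariant SSM over a 1-D sequence.

    Cost is O(L * N) for sequence length L and state size N, i.e. linear in L.
    """
    n = len(A)
    # Zero-order-hold discretization for A; B uses the first-order
    # approximation B_bar = delta * B.
    A_bar = [math.exp(delta * a) for a in A]
    B_bar = [delta * b for b in B]
    h = [0.0] * n  # hidden state, one entry per diagonal mode
    y = []
    for x_t in x:
        # Each state decays by A_bar and absorbs the new input.
        h = [A_bar[i] * h[i] + B_bar[i] * x_t for i in range(n)]
        # Read out the output as a linear projection of the state.
        y.append(sum(C[i] * h[i] for i in range(n)))
    return y
```

With `A=[0.0]`, `B=[1.0]`, `C=[1.0]`, and `delta=1.0`, the state never decays, so the scan reduces to a running sum. Strongly negative entries of `A` make the state forget its past almost immediately.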
