SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration

Yuto Nakashima, Mingzhe Yang, Yukino Baba

TL;DR

This work addresses the challenge of generating user-preferred images from the high-dimensional StyleGAN latent space by introducing a swipe-to-compare interface. It combines PCA-based latent-space reduction (to form a manageable subspace) with preferential Bayesian optimization and a multi-armed bandit to dynamically identify the most relevant latent dimensions to explore, mapping results back to the full latent space for image synthesis. Through simulation and user experiments, the method demonstrates superior efficiency in converging to user-preferred images and reveals that user preferences can shift during comparisons, which the approach accommodates through adaptive exploration. The results support a smartphone-friendly, human-centered workflow for personalized image generation, with future work focusing on faster search and enhanced visualization of latent-manipulation changes.

Abstract

Generating preferred images using generative adversarial networks (GANs) is challenging owing to the high-dimensional nature of the latent space. In this study, we propose a novel approach that uses simple user-swipe interactions to generate preferred images for users. To effectively explore the latent space with only swipe interactions, we apply principal component analysis to the latent space of StyleGAN, creating meaningful subspaces. We use a multi-armed bandit algorithm to decide which dimensions to explore, focusing on the preferences of the user. Experiments show that our method generates preferred images more efficiently than the baseline methods. Furthermore, we observed that changes in the preferred image during generation, or the display of entirely different image styles, provided new inspiration and subsequently altered user preferences. This highlights the dynamic nature of user preferences, which our proposed approach recognizes and enhances.
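
The two ingredients named in the abstract, a PCA subspace of the latent space and a bandit that picks which dimension to explore, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the random `latents` array stands in for latent vectors sampled from StyleGAN, the subspace size of 8 and the step size of 1.0 are arbitrary, and UCB1 is used only as a familiar bandit rule, since the abstract does not say which bandit algorithm the authors use. The preferential Bayesian optimization step is omitted.

```python
import numpy as np

def fit_pca(latents, n_components):
    """Return (mean, components); components has shape (n_components, latent_dim)."""
    mean = latents.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(latents - mean, full_matrices=False)
    return mean, vt[:n_components]

def to_latent(coords, mean, components):
    """Map subspace coordinates back to the full latent space."""
    return mean + coords @ components

class UCB1:
    """Illustrative bandit (an assumption, not the paper's stated choice)."""

    def __init__(self, n_arms):
        self.counts = np.zeros(n_arms)
        self.rewards = np.zeros(n_arms)

    def select(self):
        # Try every dimension once before using the confidence bound.
        untried = np.flatnonzero(self.counts == 0)
        if untried.size > 0:
            return int(untried[0])
        means = self.rewards / self.counts
        bonus = np.sqrt(2.0 * np.log(self.counts.sum()) / self.counts)
        return int(np.argmax(means + bonus))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward

rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 512))  # hypothetical stand-in for StyleGAN latents
mean, components = fit_pca(latents, n_components=8)

bandit = UCB1(n_arms=8)
coords = np.zeros(8)              # current position in the PCA subspace
arm = bandit.select()             # dimension to explore this round
candidate = coords.copy()
candidate[arm] += 1.0             # hypothetical step along that dimension
w = to_latent(candidate, mean, components)  # vector a generator would consume
bandit.update(arm, reward=1.0)    # reward 1.0 if the user preferred the new image
```

In this sketch the reward is a plain 0/1 signal from a single swipe; a real system would need a rule for turning swipe feedback into bandit rewards, which the abstract does not specify.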

Paper Structure (32 sections, 5 equations, 13 figures)

Figures (13)

  • Figure 1: Overview of image generation through swipes. This example illustrates how our system generates a lawyer avatar based on user swipes. By interpreting continuous swipe feedback, the system dynamically adjusts the displayed image to match user preferences. Internally, it evaluates dimensions such as age or the presence of glasses. For instance, in Iteration $2$, the system updates the age of the avatar based on estimated user preferences. By the third iteration, in response to feedback, glasses are added to the avatar.
  • Figure 2: Overview of the proposed method: We initially employ PCA to derive a subspace from the latent space of StyleGAN. Next, the system identifies a key dimension using a multi-armed bandit algorithm and performs Bayesian optimization within the latent space of this dimension. The resulting latent variables are transformed into an image and presented to the user. Based on user feedback, the model is updated accordingly.
  • Figure 3: Creating pairwise comparisons for simulation experiments. A pairwise comparison is created in two steps. First, the system obtains the embedding vectors of the target and generated images using FaceNet. Second, the pairwise comparison result is created by taking the cosine similarity of these two vectors and comparing it with the similarity from the previous iteration.
  • Figure 4: Trends in the similarity between the generated image and the target image for each $d^{\prime}$ (averages of moving averages over $10$ target images).
  • Figure 5: Distribution of response to "Were you able to reach the preferred image efficiently?"
  • ...and 8 more figures
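
The simulated comparison described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: the three-dimensional vectors are hypothetical stand-ins for face embeddings (no embedding model is called), the function names are invented for this example, and treating a tie as "not preferred" is an assumption, since the caption does not say how ties are handled.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def simulated_preference(target_emb, new_emb, prev_similarity):
    """Return (prefers_new, new_similarity).

    The simulated user prefers the new image when it is closer to the
    target than the image from the previous iteration was.
    """
    sim = cosine_similarity(target_emb, new_emb)
    return sim > prev_similarity, sim

# Hypothetical embeddings; a real run would use face-embedding vectors.
target = np.array([1.0, 0.0, 0.0])
previous = np.array([0.0, 1.0, 0.0])
current = np.array([1.0, 1.0, 0.0])

prev_sim = cosine_similarity(target, previous)
prefers_new, sim = simulated_preference(target, current, prev_sim)
```

The returned similarity would be carried forward as `prev_similarity` for the next iteration, so each comparison is made against the most recent image rather than a fixed reference.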