Generative Active Learning for Image Synthesis Personalization

Xulu Zhang, Wengyu Zhang, Xiao-Yong Wei, Jinlin Wu, Zhaoxiang Zhang, Zhen Lei, Qing Li

TL;DR

This work applies active learning to image synthesis personalization (ISP) with generative models, using anchor directions to turn open-ended querying into a semi-open problem. It introduces a direction-based uncertainty sampling strategy and a balancing mechanism to handle the exploitation-exploration dilemma in generative active learning (GAL), enabling efficient use of synthetic samples. Experiments on style- and object-driven ISP show that GAL with uncertainty sampling and balancing can match or surpass state-of-the-art methods, including some closed-source approaches, while relying on open-source diffusion models. The approach reduces the annotation burden and points to a practical path toward data-efficient personalized image synthesis.

Abstract

This paper presents a pilot study that explores the application of active learning, traditionally studied in the context of discriminative models, to generative models. We specifically focus on image synthesis personalization tasks. The primary challenge in conducting active learning on generative models lies in the open-ended nature of querying, which differs from the closed form of querying in discriminative models that typically target a single concept. We introduce the concept of anchor directions to transform the querying process into a semi-open problem. We propose a direction-based uncertainty sampling strategy to enable generative active learning and tackle the exploitation-exploration dilemma. Extensive experiments are conducted to validate the effectiveness of our approach, demonstrating that an open-source model can achieve superior performance compared to closed-source models developed by large companies, such as Google's StyleDrop. The source code is available at https://github.com/zhangxulu1996/GAL4Personalization.

Paper Structure (23 sections, 9 equations, 15 figures, 6 tables, 1 algorithm)

Figures (15)

  • Figure 1: Overfitted and well-aligned generations. The model has to exclude the non-SoI for successful generations.
  • Figure 2: Examples of images generated by anchor prompts in round 2 with higher priority (left) and lower priority (right). Their CLIP image features are highlighted in the t-SNE (van der Maaten and Hinton, 2008) space (middle). Poor-quality images that exhibit non-SoI are distributed near the reference, while high-quality images are located far from the reference.
  • Figure 3: Results of GAL over iterations. The images shown in the $1^{st}$ and $2^{nd}$ groups are for style- and object-driven ISP, respectively. The non-SoI and SoI are gradually disentangled and dragons or glasses are generated. Additional examples are available in the Appendix.
  • Figure 4: The curves shown in the figure resemble clock arms extending from the baseline performance points. As these arms move in an anti-clockwise direction towards the top-right corners, better performance is observed.
  • Figure 5: Qualitative comparison between our method and SOTA methods for personalized content generation. Our method produces text-aligned images compared with other methods. Additional comprehensive examples are available in the Appendix.
  • ...and 10 more figures
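
The observation in the Figure 2 caption, that poor-quality generations sit near the reference in feature space while high-quality ones sit farther away, can be illustrated with a small ranking sketch. This is not the paper's sampling strategy: the function name `rank_by_distance_from_reference`, the choice of cosine distance, and the toy 3-dimensional vectors standing in for CLIP image features are all assumptions made for illustration only.

```python
import numpy as np

def rank_by_distance_from_reference(reference, candidates):
    """Rank candidate feature vectors from farthest to nearest to the reference.

    Hypothetical helper: uses cosine distance (1 - cosine similarity),
    which is an assumed choice, not one stated in the source.
    """
    ref = reference / np.linalg.norm(reference)
    cand = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    distances = 1.0 - cand @ ref          # one distance per candidate
    order = np.argsort(-distances)        # farthest candidate first
    return order, distances

# Toy stand-ins for image features (real CLIP features are much higher-dimensional).
reference = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.9, 0.1, 0.0],   # nearly identical to the reference
    [0.0, 1.0, 0.0],   # orthogonal to the reference
    [0.5, 0.5, 0.0],   # in between
])

order, distances = rank_by_distance_from_reference(reference, candidates)
print(order)      # indices of candidates, farthest from the reference first
print(distances)  # cosine distance of each candidate to the reference
```

In this toy setup the candidate that nearly duplicates the reference ranks last, mirroring the caption's point that samples clustered around the reference tend to carry over non-SoI content.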