Assessing Open-world Forgetting in Generative Image Model Customization

Héctor Laria, Alex Gomez-Villa, Kai Wang, Bogdan Raducanu, Joost van de Weijer

TL;DR

This work defines open-world forgetting in diffusion-model customization and demonstrates that even small fine-tuning updates can cause substantial semantic and appearance drift across a model's broad latent space. It develops evaluation methods based on zero-shot classification for semantic drift and on distribution-based color metrics, notably the Color Drift Index (CDI), for appearance drift, and introduces a Drift Correction loss ($\mathcal{L}_{DC}$) to preserve prior capabilities while learning new concepts. Empirical results show large drops in zero-shot accuracy without mitigation (up to 60% in the worst case), which the proposed method substantially reduces; Drift Correction also preserves diversity and improves user-perceived fidelity. The findings highlight the need to account for open-world forgetting in model customization and offer a practical, reproducible evaluation framework and mitigation strategy to improve the reliability of personalized diffusion models in real-world applications.

Abstract

Recent advances in diffusion models have significantly enhanced image generation capabilities. However, customizing these models with new classes often leads to unintended consequences that compromise their reliability. We introduce the concept of open-world forgetting to characterize the vast scope of these unintended alterations. Our work presents the first systematic investigation into open-world forgetting in diffusion models, focusing on semantic and appearance drift of representations. Using zero-shot classification, we demonstrate that even minor model adaptations can lead to significant semantic drift affecting areas far beyond newly introduced concepts, with accuracy drops of up to 60% on previously learned concepts. Our analysis of appearance drift reveals substantial changes in texture and color distributions of generated content. To address these issues, we propose a functional regularization strategy that effectively preserves original capabilities while accommodating new concepts. Through extensive experiments across multiple datasets and evaluation metrics, we demonstrate that our approach significantly reduces both semantic and appearance drift. Our study highlights the importance of considering open-world forgetting in future research on model customization and finetuning methods.
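To make the idea of functional regularization concrete, here is a minimal sketch of the general technique: a penalty on how far the adapted model's outputs move from the frozen base model's outputs on inputs unrelated to the new concept. This is not the paper's Drift Correction loss, whose exact form is not given in this summary. The function names, the use of mean squared error, and the `weight` parameter are all assumptions made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length output vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def regularized_loss(
    adapted_new: Sequence[float],    # adapted model output on new-concept data
    target_new: Sequence[float],     # training target for the new concept
    adapted_prior: Sequence[float],  # adapted model output on a prior prompt
    base_prior: Sequence[float],     # frozen base model output on the same prompt
    weight: float = 1.0,             # hypothetical trade-off coefficient
) -> float:
    """Customization loss plus a generic functional-regularization penalty.

    The penalty is zero when the adapted model reproduces the base model's
    outputs on prior prompts, and grows as those outputs drift apart.
    """
    customization = mse(adapted_new, target_new)
    drift_penalty = mse(adapted_prior, base_prior)
    return customization + weight * drift_penalty
```

With `weight` set to zero this reduces to plain fine-tuning; larger values trade off fitting the new concept against staying close to the base model's behavior.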

Paper Structure

This paper contains 37 sections, 6 equations, 16 figures, 9 tables.

Figures (16)

  • Figure 1: Unintended consequences in diffusion model customization. Methods like DreamBooth lead to substantial drift in previously learned representations during the finetuning process, even when adapting to as few as five images: a) Appearance drift: columns demonstrate fine-grained class changes, complete object and scene shifts, and alterations in color (in both rows, images are generated from the same seed). b) Semantic drift: finetuning negatively impacts the zero-shot classification capabilities of the models.
  • Figure 2: Similarity (measured as cosine distance in CLIP-I embedding space) between models before and after adaptation. Each curve represents one of the 10 models from the Customized Model Set. a) Results with DreamBooth adaptation (includes prior regularization). b) Results with DreamBooth with Drift Correction. For more results, see the appendix.
  • Figure 3: Appearance drift as a consequence of DreamBooth customization. a) Chromaticity plot of the pixels of three realizations of the prompts ('photo of a car/cow'), generated with the same seed by different models, namely b) the base model, c) a model adapted to lighthouse, and d) a model adapted to bike.
  • Figure 4: Appearance drift as a consequence of customization, measured with (left) Color Drift Index (CDI) and (right) Kernel Inception Distance (KID). The orange and green lines represent the distance between the base model and the customized model. The blue line is a control, representing the distance between two sets of images generated from different seeds, both with the base model. Lines close to the origin are better.
  • Figure 5: Similarity (measured as cosine distance in CLIP-I embedding space) and perceptual metrics between models before and after adaptation. For each concept trained, we evaluate closely related concepts to measure the local drift. a) Results with DreamBooth adaptation (includes prior regularization). b) Results with DreamBooth with Drift Correction. c) Color Drift Index (CDI) and Kernel Inception Distance (KID). For more results, see the appendix.
  • ...and 11 more figures
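
The embedding-space comparison described in the captions of Figures 2 and 5 can be sketched as follows. This is an illustrative sketch, not the authors' evaluation code: it assumes image embeddings (for example from a CLIP image encoder) have already been computed for images generated by the base and the customized model from the same prompts and seeds. The function names and the plain-list representation are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_u * norm_v)


def mean_paired_similarity(
    base_embeddings: Sequence[Sequence[float]],
    custom_embeddings: Sequence[Sequence[float]],
) -> float:
    """Average similarity over image pairs generated from the same prompt and seed.

    Pair i compares the base model's image with the customized model's image;
    values near 1 indicate little drift, lower values indicate more drift.
    """
    if len(base_embeddings) != len(custom_embeddings) or not base_embeddings:
        raise ValueError("need the same non-zero number of embeddings per model")
    sims = [cosine_similarity(b, c) for b, c in zip(base_embeddings, custom_embeddings)]
    return sum(sims) / len(sims)
```

Pairing images by seed isolates the effect of the adaptation itself. A control run comparing two base-model image sets generated from different seeds, as in Figure 4, gives a reference level for how much variation is expected without any customization.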