Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Saurav Jha, Shiqi Yang, Masato Ishii, Mengjie Zhao, Christian Simon, Muhammad Jehanzeb Mirza, Dong Gong, Lina Yao, Shusuke Takahashi, Yuki Mitsufuji
TL;DR
This work tackles continual personalization of text-to-image diffusion models in the replay-free setting by using diffusion classifier (DC) scores as regularizers. It introduces two consolidation strategies: parameter-space consolidation, which applies Elastic Weight Consolidation (EWC) in the LoRA parameter space, and function-space consolidation, called Diffusion Scores Consolidation (DSC), which performs double distillation guided by DC scores and diffusion noise predictions. Across diverse datasets and long task sequences, the proposed DC-based methods outperform baselines such as C-LoRA and TI/CD while incurring zero additional storage and minimal inference-time overhead. The approach reduces forgetting of earlier concepts while preserving plasticity for new ones in sequential concept learning, and it is shown to be compatible with VeRA and multi-concept generation, making it a practical option for user-specific personalization of diffusion models.
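To make the parameter-space idea concrete, the following is a minimal sketch of the standard EWC quadratic penalty, (λ/2) Σ_i F_i (θ_i − θ*_i)², applied to a flat list of adapter weights. It is not the authors' implementation: the function name `ewc_penalty`, the flat-list representation of LoRA weights, and the example Fisher values are all assumptions made for illustration, and the sketch says nothing about how the paper derives importance weights from DC scores.

```python
def ewc_penalty(params, anchor_params, fisher, lam=1.0):
    """Standard EWC penalty: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2.

    params        -- current adapter weights (hypothetical flat list)
    anchor_params -- weights saved after the previous concept
    fisher        -- per-weight importance estimates (diagonal Fisher)
    """
    if not (len(params) == len(anchor_params) == len(fisher)):
        raise ValueError("params, anchor_params and fisher must have equal length")
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


# Illustrative values only: the second weight has zero importance,
# so moving it far from its anchor costs nothing.
anchor = [0.5, -0.2, 0.1]
importance = [4.0, 0.0, 1.0]
current = [0.6, 0.8, 0.1]
penalty = ewc_penalty(current, anchor, importance, lam=1.0)
```

In training, a term like this would be added to the new concept's personalization loss, so that weights judged important for earlier concepts stay close to their anchored values while unimportant ones remain free to change.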
Abstract
Personalized text-to-image diffusion models have grown popular for their ability to efficiently acquire a new concept from user-defined text descriptions and a few images. However, in the real world, a user may wish to personalize a model on multiple concepts, one at a time, with no access to the data from previous concepts due to storage or privacy concerns. When faced with this continual learning (CL) setup, most personalization methods fail to balance acquiring new concepts against retaining previous ones, a challenge that continual personalization (CP) aims to solve. Inspired by successful CL methods that rely on class-specific information for regularization, we turn to the inherent class-conditioned density estimates of diffusion models, also known as diffusion classifier (DC) scores, for continual personalization of text-to-image diffusion models. Specifically, we propose using DC scores to regularize the parameter space and the function space of text-to-image diffusion models to achieve continual personalization. Using several diverse evaluation setups, datasets, and metrics, we show that our proposed regularization-based CP methods outperform the state-of-the-art C-LoRA and other baselines. Finally, by operating in the replay-free CL setup and on low-rank adapters, our method incurs zero storage and parameter overhead, respectively, over the state-of-the-art. Project page: https://srvcodes.github.io/continual_personalization/
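As a rough illustration of what a DC score is, the sketch below follows the common diffusion classifier formulation: for each candidate concept, estimate the expected noise-prediction error by Monte Carlo, then take a softmax over the negative errors (assuming a uniform prior over concepts). Everything here is a toy stand-in rather than the paper's code: `dc_scores`, `make_toy_eps_model`, the two-dimensional "images", the concept prototypes, and the three-step noise schedule are hypothetical names and values chosen so that the example runs with the standard library alone.

```python
import math
import random


def make_toy_eps_model(prototypes, alphas_bar):
    """Hypothetical noise predictor: assumes x0 equals the concept prototype."""
    def eps_model(x_t, t, concept):
        a = alphas_bar[t]
        mu = prototypes[concept]
        # Invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps under the guess x0 = mu.
        return [(x - math.sqrt(a) * m) / math.sqrt(1.0 - a) for x, m in zip(x_t, mu)]
    return eps_model


def dc_scores(x0, concepts, eps_model, alphas_bar, n_samples=64, seed=0):
    """Softmax over negative Monte Carlo noise-prediction errors, one per concept."""
    rng = random.Random(seed)
    # Reuse the same (timestep, noise) draws for every concept to reduce variance.
    draws = [
        (rng.randrange(len(alphas_bar)), [rng.gauss(0.0, 1.0) for _ in x0])
        for _ in range(n_samples)
    ]
    errors = []
    for concept in concepts:
        total = 0.0
        for t, eps in draws:
            a = alphas_bar[t]
            x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
            pred = eps_model(x_t, t, concept)
            total += sum((e - p) ** 2 for e, p in zip(eps, pred))
        errors.append(total / n_samples)
    # Subtract the smallest error before exponentiating for numerical stability.
    lowest = min(errors)
    weights = [math.exp(-(err - lowest)) for err in errors]
    norm = sum(weights)
    return [w / norm for w in weights]


# Toy setup: two concepts with 2-D prototypes and a 3-step schedule (all assumed).
alphas_bar = [0.9, 0.5, 0.1]
prototypes = {"dog": [1.0, 0.0], "cat": [-1.0, 0.0]}
model = make_toy_eps_model(prototypes, alphas_bar)
scores = dc_scores([1.0, 0.0], ["dog", "cat"], model, alphas_bar)
```

Because the input coincides with the "dog" prototype, the toy predictor recovers the injected noise almost exactly under that concept and poorly under "cat", so the "dog" score dominates. In a real text-to-image model the predictor would be the conditioned denoising network and the concepts would be text prompts.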
