Multi-concept Model Immunization through Differentiable Model Merging

Amber Yijia Zheng, Raymond A. Yeh

TL;DR

This work addresses the risk of misuse in open-sourced generative models by extending the model immunization paradigm to multiple concepts. It introduces MIMA, a multi-concept immunization framework that uses a differentiable model merging layer to combine concept-specific adaptations into a single immunized model, optimized via a bi-level objective that backpropagates through the Merge operation. The approach demonstrates consistent improvement over single-concept baselines across re-learning and personalization tasks, using multiple adaptation techniques (e.g., DreamBooth, Textual Inversion, LoRA, CustomDiffusion) and evaluation metrics (MSGR, MRSGR). The results suggest that end-to-end differentiable merging can robustly immunize pre-trained diffusion models against multiple harmful concepts while preserving useful capabilities for non-target concepts, potentially increasing the safety of released models in practice.

Abstract

Model immunization is an emerging direction that aims to mitigate the risk of misuse associated with open-sourced models and advancing adaptation methods. The idea is to make the released model's weights difficult to fine-tune on certain harmful applications, hence the name "immunized". Recent work on model immunization focuses on the single-concept setting. However, in real-world situations models need to be immunized against multiple concepts. To address this gap, we propose an immunization algorithm that simultaneously learns a single "difficult initialization" for adaptation methods over a set of concepts. We achieve this by incorporating a differentiable merging layer that combines a set of model weights adapted over multiple concepts. In our experiments, we demonstrate the effectiveness of multi-concept immunization by generalizing prior work's experimental setup of re-learning and personalization adaptation to multiple concepts.
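To make the idea of learning a "difficult initialization" through a merging layer concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation. The quadratic `loss`, the single unrolled gradient step in `adapt`, and the element-wise averaging in `merge` are all illustrative assumptions standing in for the diffusion loss, the adaptation method, and the proposed Merge layer, whose definition is not given in this section. The gradient is written out by hand so that the flow through the merge step is visible.

```python
# Toy sketch of bi-level immunization with a differentiable merge.
# Assumptions (not from the paper): a quadratic per-concept loss, one unrolled
# gradient step as the adaptation method, and plain averaging as the merge.

def loss(theta, target):
    """Toy stand-in for the diffusion loss: 0.5 * ||theta - target||^2."""
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, target))

def adapt(theta, target, lr):
    """Lower level: one unrolled gradient-descent step on a copy of theta."""
    return [t - lr * (t - c) for t, c in zip(theta, target)]

def merge(adapted):
    """Hypothetical merge: element-wise mean of the adapted weight vectors."""
    n = len(adapted)
    return [sum(ws) / n for ws in zip(*adapted)]

def merged_loss(theta, targets, lr):
    """Sum of per-concept losses evaluated at the merged adapted weights."""
    merged = merge([adapt(theta, c, lr) for c in targets])
    return sum(loss(merged, c) for c in targets)

def upper_grad(theta, targets, lr):
    """Gradient of merged_loss w.r.t. theta, via the chain rule.

    Each adapt() step scales theta by (1 - lr) and the mean keeps that
    factor, so d(merged)/d(theta) = (1 - lr) * identity in this toy setup.
    """
    merged = merge([adapt(theta, c, lr) for c in targets])
    return [(1.0 - lr) * sum(m - c[j] for c in targets)
            for j, m in enumerate(merged)]

def immunize(theta, targets, inner_lr=0.1, outer_lr=0.05, steps=20):
    """Upper level: gradient ascent on the post-adaptation loss."""
    for _ in range(steps):
        g = upper_grad(theta, targets, inner_lr)
        theta = [t + outer_lr * gi for t, gi in zip(theta, g)]
    return theta

# Three toy "concepts", each represented by a target weight vector.
targets = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
theta0 = [0.5, -0.2]
theta_star = immunize(theta0, targets)

before = merged_loss(theta0, targets, 0.1)
after = merged_loss(theta_star, targets, 0.1)
print(f"post-adaptation loss before: {before:.3f}, after: {after:.3f}")
```

In this toy problem the ascent is unbounded and nothing preserves behavior on non-target concepts, so it only illustrates how the upper-level gradient passes through the merge into the shared initialization. With a real model, an automatic differentiation framework would replace the hand-written `upper_grad`.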

Paper Structure

This paper contains 13 sections, 20 equations, 17 figures, 6 tables, 1 algorithm.

Figures (17)

  • Figure 1: We propose MIMA, an immunization algorithm that protects a model against adaptation on harmful concepts. Here we show an experiment on immunization against the re-learning of multiple artistic styles and report the CLIP similarity between the generations and the target concept at different adaptation steps. A lower CLIP similarity indicates more effective immunization, as the images differ more semantically from the references. As can be seen, MIMA offers protection over all three concepts: Van Gogh, Monet, and Picasso. In comparison, IMMA [zheng2023imma], designed to immunize against a single concept, only offers protection against the re-learning of Van Gogh.
  • Figure 2: Method overview. Left: MIMA is formulated as a bi-level optimization program. For the lower-level, we unroll loss $L$ for the copied weights of each concept. Next, we combine the individual weights $\theta'_{[n]}$ via our proposed Merge layer defined in Eq. \ref{eq:my_merge}. For the upper-level, we maximize the diffusion loss $L$ with respect to the parameters $\theta$ by backpropagating through $\theta'$. Right: During generation, a model $\theta^\star$ immunized with MIMA fails to be adapted by ${\mathcal{A}}$ on all of the target concepts, i.e., the generations do not contain good quality images of castles, glasses, or cars.
  • Figure 3: Similarity vs. epochs for LoRA on styles. Each row shows one metric. Models with MIMA achieve lower similarity throughout LoRA's steps. This means that on the target concepts, MIMA generates images less similar to the references.
  • Figure 4: Qualitative result of MIMA against re-learning artistic styles. Both Erased and MIMA are adapted to all three concepts on a single model.
  • Figure 5: CLIP and DINO similarity on personalization concepts. The gaps between the dashed line and solid lines show MSGR$\uparrow$(%) of different methods. That is, a larger gap indicates stronger immunization.
  • ...and 12 more figures