KIND: Knowledge Integration and Diversion for Training Decomposable Models

Yucheng Xie, Fu Feng, Ruixiao Shi, Jing Wang, Yong Rui, Xin Geng

TL;DR

KIND rethinks pre-training by enforcing a decomposable weight structure through an SVD-based constraint, separating knowledge into class-agnostic learngenes and class-specific tailors via a class gate. This yields a flexible backbone that can be recombined to meet varying memory and compute constraints, while mitigating domain shifts by transferring only learngenes when needed. The approach is demonstrated on Diffusion Transformer backbones for class-conditioned image generation, achieving competitive performance with reduced resources and strong transfer efficiency across novel classes and large domain shifts. Overall, KIND introduces a principled objective for learnable, decomposable backbones and enables rapid, resource-aware deployment in diverse tasks and environments.

Abstract

Pre-trained models have become the preferred backbone due to the increasing complexity of model parameters. However, traditional pre-trained models often face deployment challenges due to their fixed sizes, and are prone to negative transfer when discrepancies arise between training tasks and target tasks. To address this, we propose KIND, a novel pre-training method designed to construct decomposable models. KIND integrates knowledge by incorporating Singular Value Decomposition (SVD) as a structural constraint, with each basic component represented as a combination of a column vector, singular value, and row vector from the matrices $U$, $\Sigma$, and $V^\top$. These components are categorized into learngenes for encapsulating class-agnostic knowledge and tailors for capturing class-specific knowledge, with knowledge diversion facilitated by a class gate mechanism during training. Extensive experiments demonstrate that models pre-trained with KIND can be decomposed into learngenes and tailors, which can be adaptively recombined for diverse resource-constrained deployments. Moreover, for tasks with large domain shifts, transferring only learngenes with task-agnostic knowledge, when combined with randomly initialized tailors, effectively mitigates domain shifts. Code will be made available at https://github.com/Te4P0t/KIND.
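
To make the "basic component" idea concrete, the following is a minimal NumPy sketch of a weight matrix written as a sum of rank-one components, each built from one column of $U$, one singular value, and one row of $V^\top$. It is an illustration only, not the authors' implementation: the helper `compose_weight`, the matrix sizes, and the split into two learngene components and three tailor components are all assumptions, and the factors are random rather than orthonormal as they would be in an exact SVD.

```python
import numpy as np

def compose_weight(U, s, Vt, keep):
    """Sum the rank-one components u_i * s_i * v_i^T for the indices in `keep`."""
    keep = np.asarray(keep, dtype=int)
    # Scaling the selected columns of U by s and multiplying by the selected
    # rows of Vt equals summing the individual outer products.
    return (U[:, keep] * s[keep]) @ Vt[keep, :]

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 6, 5            # hypothetical sizes
U = rng.standard_normal((d_out, rank))
s = rng.standard_normal(rank)
Vt = rng.standard_normal((rank, d_in))

# Hypothetical partition: the first two components act as learngenes,
# the remaining three as tailors.
learngene_idx = [0, 1]
tailor_idx = [2, 3, 4]

W_full = compose_weight(U, s, Vt, learngene_idx + tailor_idx)
W_learngene = compose_weight(U, s, Vt, learngene_idx)
W_tailor = compose_weight(U, s, Vt, tailor_idx)
```

Because the full weight is a plain sum over components, the learngene part and the tailor part add back up to the full matrix, which is what allows a subset of components to be kept or dropped independently.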

Paper Structure (28 sections, 8 equations, 15 figures, 11 tables, 1 algorithm)

This paper contains 28 sections, 8 equations, 15 figures, 11 tables, 1 algorithm.

Figures (15)

  • Figure 1: (a) Traditional pre-training prioritizes maximizing performance on training datasets, often producing fixed-size models and making them prone to negative transfer. In contrast, KIND redefines the training objective to pre-train models that are both structure- and knowledge-decomposable. (b) Consequently, KIND enables pre-trained models to be adaptively restructured, facilitating deployment in diverse resource-constrained scenarios. (c) Additionally, the task-agnostic knowledge encapsulated in learngenes can effectively mitigate domain shifts.
  • Figure 2: (a) For each weight matrix in DiTs, we integrate it into the product of matrices $U$, $\Sigma$ and $V^\top$, formally inspired by SVD. The components of these matrices are then explicitly partitioned into the learngenes and tailors, which encapsulate class-agnostic and class-specific knowledge, respectively. (b) Knowledge is diverted through a class gate ensuring each training image updates only the learngenes and their corresponding class-related tailors, so that the class-agnostic knowledge can be condensed into the learngenes, while knowledge specific to each class is diverted into corresponding tailors.
  • Figure 3: (a) For downstream tasks with pre-trained classes, the model can directly select the tailors corresponding to the target classes while discarding unrelated ones. (b) When encountering tasks with large domain shifts, only the learngenes are transferred and combined with randomly initialized tailors for class-specific fine-tuning.
  • Figure 4: Selected samples from tasks with novel classes, generated by KIND and other PEFT methods using the DiT-L/2 model, with a resolution of $256 \times 256$. All images are generated using a classifier-free guidance (cfg) scale of 3.0.
  • Figure 5: Selected samples from tasks with large domain shifts, generated by KIND and other PEFT methods using the DiT-L/2 model, with a resolution of $256 \times 256$. All images are generated using a classifier-free guidance (cfg) scale of 1.5.
  • ...and 10 more figures
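
The class gate in Figure 2(b) and the tailor selection in Figure 3(a) can be sketched as a binary mask over components. This is a hedged illustration, not the paper's code: the component layout (learngenes first, then a fixed number of tailors per class), the function names `class_gate`, `gated_weight`, and `select_for_deployment`, and all sizes are assumptions made for the example.

```python
import numpy as np

def class_gate(class_id, n_learngene, n_classes, tailors_per_class):
    """Mask over components: 1 for every learngene and for the tailors of `class_id`."""
    rank = n_learngene + n_classes * tailors_per_class
    gate = np.zeros(rank)
    gate[:n_learngene] = 1.0
    start = n_learngene + class_id * tailors_per_class
    gate[start:start + tailors_per_class] = 1.0
    return gate

def gated_weight(U, s, Vt, gate):
    """Weight seen by one class: components with gate 0 contribute nothing."""
    return (U * (s * gate)) @ Vt

def select_for_deployment(U, s, Vt, target_classes, n_learngene, tailors_per_class):
    """Keep the learngenes plus the tailors of the target classes; drop the rest."""
    keep = list(range(n_learngene))
    for c in target_classes:
        start = n_learngene + c * tailors_per_class
        keep.extend(range(start, start + tailors_per_class))
    keep = np.array(keep)
    return U[:, keep], s[keep], Vt[keep, :]

# Hypothetical configuration: 2 learngene components, 3 classes, 2 tailors each.
n_learngene, n_classes, tailors_per_class = 2, 3, 2
rank = n_learngene + n_classes * tailors_per_class
rng = np.random.default_rng(0)
U = rng.standard_normal((8, rank))
s = rng.standard_normal(rank)
Vt = rng.standard_normal((rank, 6))

gate = class_gate(1, n_learngene, n_classes, tailors_per_class)
W_class1 = gated_weight(U, s, Vt, gate)

# Deploying for class 1 only keeps 4 of the 8 components.
U_small, s_small, Vt_small = select_for_deployment(
    U, s, Vt, [1], n_learngene, tailors_per_class
)
W_small = (U_small * s_small) @ Vt_small
```

In this sketch, a component whose gate value is zero does not affect the output, so in an autograd framework it would also receive zero gradient for that sample. That is one simple way to realize the behaviour the caption describes, where each training image updates only the learngenes and the tailors of its own class. The deployed model for class 1 reproduces the gated weight while storing only half of the components.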