Optimize Incompatible Parameters through Compatibility-aware Knowledge Integration

Zheqi Lv, Keming Ye, Zishu Wei, Qi Tian, Shengyu Zhang, Wenqiao Zhang, Wenjie Wang, Kun Kuang, Tat-Seng Chua, Fei Wu

TL;DR

CKI introduces a compatibility-aware knowledge integration framework to directly optimize incompatible parameters without adding inference cost. It simultaneously evaluates local parameter uncertainty and global model information content to compute a per-parameter compatibility, then applies hard or soft splicing to fuse parameters from multiple pretrained models. The approach is validated on recommendation and language tasks, showing consistent improvements over pruning, averaging, and ensemble baselines and even enabling effective initialization with just one retraining epoch. CKI’s ability to leverage complementary strengths across models while preserving the original architecture makes it a practical and scalable solution for robust deployment under distribution shifts.

Abstract

Deep neural networks have become foundational to advancements in multiple domains, including recommendation systems and natural language processing. Despite their successes, these models often contain incompatible parameters that can be underutilized or detrimental to model performance, particularly when faced with specific, varying data distributions. Existing research excels at removing such parameters or merging the outputs of multiple different pretrained models. However, the former focuses on efficiency rather than performance, while the latter requires several times more computing and storage resources to support inference. In this paper, we aim to explicitly improve these incompatible parameters by leveraging the complementary strengths of different models, thereby directly enhancing the models without any additional parameters. Specifically, we propose Compatibility-aware Knowledge Integration (CKI), which consists of two components: Parameter Compatibility Assessment, which evaluates the knowledge content of multiple models, and Parameter Splicing, which integrates that knowledge into one model. The integrated model can be used directly for inference or for further fine-tuning. We conduct extensive experiments on various datasets for recommendation and language tasks, and the results show that Compatibility-aware Knowledge Integration can effectively optimize incompatible parameters under multiple tasks and settings to break through the training limit of the original model without increasing the inference cost.
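
As a rough illustration of the Parameter Splicing step, the following Python sketch contrasts hard and soft splicing for two models whose parameters have been flattened into lists. It assumes that per-parameter compatibility scores have already been computed and are non-negative. The function names, the tie-breaking rule, and the normalized-weight form of soft splicing are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List


def hard_splice(params_a: List[float], params_b: List[float],
                compat_a: List[float], compat_b: List[float]) -> List[float]:
    """Per parameter, keep the value from the model with the higher score.

    Ties go to model A (an arbitrary choice made for this sketch).
    """
    return [a if ca >= cb else b
            for a, b, ca, cb in zip(params_a, params_b, compat_a, compat_b)]


def soft_splice(params_a: List[float], params_b: List[float],
                compat_a: List[float], compat_b: List[float],
                eps: float = 1e-12) -> List[float]:
    """Per parameter, blend both values with weights proportional to the scores.

    Assumes non-negative scores; falls back to a plain average when both
    scores are (near) zero.
    """
    merged = []
    for a, b, ca, cb in zip(params_a, params_b, compat_a, compat_b):
        total = ca + cb
        weight_a = 0.5 if total < eps else ca / total
        merged.append(weight_a * a + (1.0 - weight_a) * b)
    return merged


# Hypothetical toy values: two parameters per model.
theta_a, theta_b = [1.0, 2.0], [3.0, 4.0]
score_a, score_b = [0.75, 0.25], [0.25, 0.75]

theta_hard = hard_splice(theta_a, theta_b, score_a, score_b)
theta_soft = soft_splice(theta_a, theta_b, score_a, score_b)
```

In both cases the result has the same shape as either input model, which matches the claim that integration adds no parameters and no inference cost.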

Paper Structure

This paper contains 31 sections, 18 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: (a) shows the Incompatible Parameter issue. (b) describes Model Pruning, which removes incompatible parameters from $M_A$. (c) presents output ensemble, which combines the inference results of $M_A$ and $M_B$ for a final result. (d) introduces CKI, which evaluates each parameter's compatibility in global and local views, then integrates the knowledge of $M_A$ and $M_B$ to get model $M_C$. (e) shows that CKI outperforms baselines in different scenarios.
  • Figure 2: Overview of the proposed CKI. Our CKI includes two parts: Parameter Compatibility Assessment and Parameter Splicing. (a) describes the Parameter Compatibility Assessment. It consists of 3 parts: (a1) Local-level Parameter Uncertainty Assessment, (a2) Global-level Model Information Content Assessment, and (a3) Dual-Perspective Parameter Compatibility Assessment. (b) describes the Parameter Splicing, which includes (b1) Hard Splicing and (b2) Soft Splicing. (c) describes the extension of CKI from 2 models to multiple models.
  • Figure 3: Performance comparison of the proposed method and baselines on the recommendation task when the pre-trained models are static models.
  • Figure 4: Performance comparison of the proposed method and baselines on the recommendation task when the pre-trained models include both static and dynamic models.
  • Figure 5: The impact of the number of integrated models.