
Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models

Yuqiao Tan, Shizhu He, Kang Liu, Jun Zhao

TL;DR

This work reframes knowledge transfer between cross-scale LLMs as Parametric Knowledge Transfer (PKT) and demonstrates that parametric alignment is a prerequisite for effective transfer. It introduces two paradigms, Post-Align PKT and Pre-Align PKT, and proposes LaTen (Locate-Then-Align), which aligns parametric spaces using neuron-level localization and a lightweight hypernetwork and requires minimal training. Across world knowledge, math, and code tasks, the results show that unaligned transfer harms performance and that LaTen achieves competitive gains. The analysis identifies Neural Incompatibility, the ethological and parametric differences between models of different scales, as a fundamental barrier. The findings suggest promising directions for efficient PKT and deeper study of LLM parametric architectures, including robust, data-efficient approaches beyond language-guided supervision.
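The locate-then-align idea can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: neurons in the larger model are scored with a hypothetical |activation × gradient| attribution, the top-k are kept, and the "hypernetwork" is reduced to a single fixed linear map (`hyper`) from the larger model's hidden size to the smaller model's. In LaTen the hypernetwork is trained for a few steps; its actual architecture is not described in this summary, and the helper names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def locate_neurons(activations: List[float], grads: List[float], k: int) -> List[int]:
    """Locate step (hypothetical scoring): keep the k neurons with the largest
    |activation * gradient| attribution, returned in ascending index order."""
    scores = [abs(a * g) for a, g in zip(activations, grads)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product for small nested-list matrices."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def align_neurons(large_weights: Matrix, idx: List[int], hyper: Matrix) -> Matrix:
    """Align step (assumed linear hypernetwork): project each located neuron's
    weight vector from d_large to d_small dimensions."""
    return [matvec(hyper, large_weights[i]) for i in idx]


# Toy larger model: 4 neurons with hidden size 3. Toy smaller model: hidden size 2.
large_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
activations = [0.5, -2.0, 0.1, 1.0]
grads = [1.0, 1.0, 4.0, 0.2]
hyper = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]]  # 2 x 3, hypothetical fixed weights

idx = locate_neurons(activations, grads, k=2)
aligned = align_neurons(large_weights, idx, hyper)
print(idx, aligned)
```

The point the sketch makes is structural: the larger model's neuron weights (dimension 3) cannot be copied directly into the smaller model (dimension 2), so some learned mapping must align them before injection.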

Abstract

Large Language Models (LLMs) offer a transparent brain with accessible parameters that encode extensive knowledge, which can be analyzed, located and transferred. Consequently, a key research challenge is to transcend traditional knowledge transfer paradigms rooted in symbolic language and achieve genuine Parametric Knowledge Transfer (PKT). In particular, developing effective methods for transferring knowledge through parameters across LLMs of different scales is an intriguing and valuable research direction. In this paper, we first demonstrate that Alignment in parametric space is the fundamental prerequisite for successful cross-scale PKT. We redefine previously explored knowledge transfer as Post-Align PKT (PostPKT), which uses extracted parameters for LoRA initialization and requires subsequent fine-tuning for alignment. To reduce the cost of this further fine-tuning, we introduce a novel Pre-Align PKT (PrePKT) paradigm and propose a solution called LaTen (Locate-Then-Align) that aligns the parametric spaces of LLMs across scales using only a few training steps, with no subsequent training. Comprehensive experiments on four benchmarks demonstrate that both PostPKT and PrePKT struggle to achieve consistently stable transfer. Through in-depth analysis, we identify Neural Incompatibility, the ethological and parametric structural differences between LLMs of varying scales, as a fundamental challenge to achieving effective PKT. These findings provide fresh insights into the parametric architectures of LLMs and highlight promising directions for future research on efficient PKT. Our code is available at https://github.com/Trae1ounG/Neural_Incompatibility.
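PostPKT initializes LoRA from extracted parameters and then fine-tunes. A minimal sketch of the initialization step follows, assuming the extracted knowledge is a weight delta `delta` already mapped to the target layer's shape, and that it is factorized by a truncated SVD into LoRA factors `B` and `A` with `B @ A ≈ delta`. To stay dependency-free the sketch keeps only rank 1 and computes it with power iteration; the function names and the rank-1 restriction are hypothetical simplifications, and the subsequent alignment fine-tuning is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def matmul(b: Matrix, a: Matrix) -> Matrix:
    return [[sum(b[i][k] * a[k][j] for k in range(len(a))) for j in range(len(a[0]))]
            for i in range(len(b))]


def rank1_lora_init(delta: Matrix, iters: int = 50) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper: return LoRA factors B (d_out x 1) and A (1 x d_in) such that
    B @ A approximates the best rank-1 approximation of delta, via power iteration
    on delta^T delta (exact after convergence when delta has a clear top singular value)."""
    d_out, d_in = len(delta), len(delta[0])
    delta_t = transpose(delta)
    v = [1.0] * d_in  # start vector; assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        v = matvec(delta_t, matvec(delta, v))
        n = norm(v)
        if n == 0.0:  # nothing to transfer: return an all-zero initialization
            return [[0.0] for _ in range(d_out)], [[0.0] * d_in]
        v = [x / n for x in v]
    u = matvec(delta, v)  # u = sigma * (top left singular vector)
    sigma = norm(u)
    s = math.sqrt(sigma)  # split the singular value evenly between B and A
    B = [[x / sigma * s] for x in u]
    A = [[x * s for x in v]]
    return B, A


# A rank-1 "extracted" delta: the outer product of [2, 1] and [1, 2].
delta = [[2.0, 4.0], [1.0, 2.0]]
B, A = rank1_lora_init(delta)
print(matmul(B, A))  # approximately reconstructs delta
```

In a real setting a higher rank and a library SVD would replace the power iteration; the sketch only shows why PostPKT still needs fine-tuning afterwards: the factors reproduce the extracted delta but do nothing to align it with the target model.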

Paper Structure

This paper contains 29 sections, 15 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Different paradigms of knowledge transfer between cross-scale LLMs. In contrast to human-like symbolic knowledge transfer through language (a), we aim for LLMs to achieve more efficient knowledge transfer by leveraging knowledgeable parameters (b).
  • Figure 2: Performance of different baseline methods on MMLU.
  • Figure 3: Representation Similarity Comparison Results between LLMs.
  • Figure 4: Parametric Similarity Comparison Results between LLMs in MLP Modules.
  • Figure 5: Interpretable neuron locations on the GSM8K task.
  • ...and 4 more figures
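Figure 3 compares representations of LLMs across scales. The summary does not name the similarity measure; linear CKA (centered kernel alignment) is one standard choice that can compare hidden states of different widths, so the sketch below assumes it. The inputs `small` and `large` are hypothetical matrices of hidden states for the same n inputs from two models (n × p and n × q).

```python
import math
from typing import List

Matrix = List[List[float]]


def center(m: Matrix) -> Matrix:
    """Subtract each column's mean (rows are inputs, columns are features)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[x - mu for x, mu in zip(row, means)] for row in m]


def cross_fro_sq(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a^T b for a (n x p) and b (n x q)."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered inputs.
    Undefined (division by zero) if either representation is constant."""
    x, y = center(x), center(y)
    return cross_fro_sq(x, y) / math.sqrt(cross_fro_sq(x, x) * cross_fro_sq(y, y))


# The same 3 inputs seen by a 1-unit and a 2-unit "model" (identical up to duplication).
small = [[1.0], [2.0], [3.0]]
large = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(linear_cka(small, large))  # 1.0
```

A value of 1 means the two representations match up to rotation and isotropic scaling, regardless of width; values near 0 indicate that the two models encode the same inputs in unrelated ways.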