SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning

Gaole Dai, Chun-Kai Fan, Yiming Tang, Zhi Zhang, Yuan Zhang, Yulu Gan, Qizhe Zhang, Cheng-Ching Tseng, Shanghang Zhang, Tiejun Huang

TL;DR

SAN is a biologically inspired, plug-and-play PEFT method that decomposes and explicitly propagates scaling from earlier layers to downstream weights, enabling richer adaptation without extra trainable parameters. Grounded in Neural Engram and LTP/D principles, it provides a principled mechanism to control parameter-space shifts and improve stability via implicit regularization. Across vision, language, and visual-language tasks, SAN consistently outperforms FFT and LoRA baselines, with notable gains on ViT/Swin/ConvNeXt backbones and LLaMA/LLaVA models. The work highlights the practical impact of cross-layer scaling propagation for scalable, parameter-efficient fine-tuning and calls for deeper exploration of Neural Engram-inspired mechanisms in large models.

Abstract

Advances in Parameter-Efficient Fine-Tuning (PEFT) have bridged the performance gap with Full Fine-Tuning (FFT) through sophisticated analysis of pre-trained parameter spaces. Drawing insights from Neural Engrams (NE) in Biological Neural Networks (BNNs), we establish a connection between the low-rank property observed during PEFT's parameter space shifting and neurobiological mechanisms. This observation leads to our proposed method, Synapse and Neuron (SAN), which decomposes and propagates scaling components from anterior feature-adjusting vectors towards posterior weight matrices. Our approach is theoretically grounded in Long-Term Potentiation/Depression (LTP/D) phenomena, which govern synapse development through neurotransmitter release modulation. Extensive experiments demonstrate its effectiveness: on vision tasks across VTAB, FGVC, and GIC (25 datasets) using ViT, SwinT, and ConvNeXt, SAN outperforms FFT by up to 8.7% and LoRA by 3.2%; on language tasks using Commonsense Reasoning (8 datasets) with LLaMA models (all generations), it surpasses ChatGPT by up to 8.5% and LoRA by 4.7%; on visual-language tasks using Mixed Visual Instruction (7 datasets) with LLaVA models, it exceeds FFT by up to 2.4% and LoRA by 1.9%. Our code and W&B log will be released.

Paper Structure (37 sections, 19 equations, 10 figures, 8 tables)

Figures (10)

  • Figure 1: Plug-and-Play SAN pipeline. The core design of SAN involves extracting the scaling component by the decomposition of adjusting vectors from preceding layers and reapplying it to subsequent layers. This approach simplifies learning objectives and enhances expressiveness by providing scaling priors without introducing additional trainable parameters. SAN is compatible with various PEFT techniques, such as LoRA.
  • Figure 2: Motivations and observations of decompose & propagate in SAN. Left panels: we illustrate the concepts of Neural Engram (NE) and Long-Term Potentiation/Depression (LTP/D). Right panel: we apply PEFT to a pre-trained ViT-B on two VTAB subsets, scaling the features with layer-wise trainable scaling vectors, akin to SSF. QKV and FFN-L1/L2 denote the attention layer and feed-forward network layers 1 and 2, respectively. The dotted and solid lines show the histograms of scaling vectors from different layers with and without SAN, respectively. Analysis of these vectors reveals significant intra-task similarities but marked inter-task differences, consistent with the principles of NE. Moreover, SAN effectively controls the variance ($\sigma^{2}$) of the scaling vectors, thereby allowing for more nuanced adjustments and mitigating limitations in expressiveness, which aligns with the mechanisms of LTP/D.
  • Figure 3: SAN's explicit propagation mechanism. Unlike traditional PEFT methods (e.g., SSF) that only model the shifting of the current layer (top), SAN leverages propagation to effectively adapt pre-trained parameters across layers (bottom) without introducing additional trainable parameters. The scaling vectors ($\gamma$) learned in the anterior layer are explicitly propagated to influence the pre-trained weights of posterior layers, enabling more comprehensive parameter adaptation while maintaining parameter efficiency. Gray blocks represent frozen pre-trained weights, colored blocks indicate fine-tuned trainable weights, and arrows show the adjusting direction (i.e., the reparameterization direction).
  • Figure 4: Performance comparison on FGVC with different backbones. Results show accuracy (%) for SAN and various baseline methods across different vision backbones.
  • Figure 5: Unstable performance when ignoring propagation order. Blue and red bars represent positive and negative effects, respectively. Color saturation indicates the relative gain intensity.
  • ...and 5 more figures
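
The propagation idea described in the captions of Figures 1 and 3 can be illustrated with a toy sketch. This is only one possible reading, not the paper's formulation: it assumes the adjusting vector of the anterior layer splits into a scale `gamma` and a shift `beta` (as in SSF), and that "propagating" `gamma` means multiplying each input column of the frozen posterior weight by it. The function names and the column-wise multiplication rule are hypothetical and chosen purely for illustration.

```python
# Toy sketch of "decompose and propagate" (assumed form, not the paper's equations).

def adjust_features(z, gamma, beta):
    """Anterior layer: SSF-style feature adjustment, h = gamma * z + beta."""
    return [g * zi + b for zi, g, b in zip(z, gamma, beta)]


def propagate_scaling(weight, gamma):
    """Posterior layer: rescale the frozen weight with the anterior scale only.

    Assumption: propagation is W' = W @ diag(gamma), i.e. column j of the
    posterior weight is multiplied by gamma[j]. The shift beta is not propagated.
    """
    return [[w * g for w, g in zip(row, gamma)] for row in weight]


def matvec(weight, x):
    """Plain matrix-vector product for the toy example."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# Trainable adjusting vector of the anterior layer, decomposed into scale and shift.
gamma = [2.0, 0.5, 1.0]
beta = [0.0, 1.0, -1.0]

# Frozen pre-trained posterior weight (2 outputs, 3 inputs).
w_posterior = [[1.0, 0.0, 1.0],
               [0.0, 2.0, 0.0]]

z = [1.0, 4.0, 3.0]                      # anterior-layer output before adjustment
h = adjust_features(z, gamma, beta)      # adjusted features fed to the posterior layer
w_adapted = propagate_scaling(w_posterior, gamma)
y = matvec(w_adapted, h)

# The only trainable quantities are gamma and beta; the posterior layer
# gains no parameters of its own.
num_trainable = len(gamma) + len(beta)
print(h, w_adapted, y, num_trainable)
```

In this sketch the posterior layer is adapted purely through `gamma`, which it shares with the anterior layer, so the trainable parameter count stays the same as in the SSF-style baseline.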