
Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning

Zhaozhi Wang, Conghu Li, Qixiang Ye, Tong Zhang

TL;DR

The paper addresses the limitation that low-rank priors in parameter-efficient fine-tuning (PEFT) struggle to capture complex visual representations, especially when the data contain high-rank or high-frequency components. It introduces MoPPA, a Mixture of Physical Priors Adapter that blends Heat, Wave, and Poisson priors, each implemented in the frequency domain via the Discrete Cosine Transform (DCT) and combined by an adaptive router; a route-regularization term prevents any single path from dominating prematurely. The approach inserts MoPPA units before self-attention in vision transformers, forming the predictor $Y=\alpha_1\,\text{Heat}(X)+\alpha_2\,\text{Wave}(X)+\alpha_3\,\text{Poisson}(X)$, where the mixing weights $\alpha_i$ are produced by the router, and it optimizes the learnable parameters $k$, $c$, and $t$ together with the Poisson source distribution. MoPPA achieves state-of-the-art PEFT results on VTAB-1K, FGVC, and COCO with a small parameter budget. The results demonstrate robust transfer across backbones and pre-training schemes, highlighting physics-informed priors as a practical route to more expressive, parameter-efficient adaptation of large vision models.
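To make the predictor above concrete, here is a minimal NumPy sketch of how the three priors can act in the DCT domain, assuming the patch tokens are arranged on an $H \times W$ grid with $D$ channels. It relies on the standard fact that the DCT diagonalizes the Laplacian (Neumann boundaries), so the heat equation $u_t = k\nabla^2 u$ becomes a per-frequency decay $e^{-k\omega^2 t}$ and the wave equation $u_{tt} = c^2\nabla^2 u$ (zero initial velocity) becomes a per-frequency factor $\cos(c\,\omega t)$. The function name `moppa_unit`, the frequency grid $\omega = \pi m / N$, the way $X$ and the source distribution enter the Poisson branch, and the screening term `eps` (a screened Poisson solve, swapped in so the zero-frequency mode is defined) are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix, so that mat @ mat.T == I (the inverse is mat.T)."""
    freq = np.arange(n)[:, None]   # frequency index (rows)
    pos = np.arange(n)[None, :]    # spatial index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * pos + 1) * freq / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)   # normalization of the DC row
    return mat


def dct2(x, ch, cw):
    # Forward 2-D DCT over the two spatial axes of an (H, W, D) token grid.
    return np.einsum("ah,bw,hwd->abd", ch, cw, x)


def idct2(xh, ch, cw):
    # Inverse 2-D DCT: transpose of the orthonormal matrices.
    return np.einsum("ah,bw,abd->hwd", ch, cw, xh)


def moppa_unit(x, alpha, k, c, t, source, eps=1.0):
    """Hypothetical MoPPA-style mixture on an (H, W, D) grid of patch tokens.

    alpha:   three mixing weights (e.g. router outputs) for Heat, Wave, Poisson.
    k, c, t: diffusivity, wave speed, and time; scalars or per-channel arrays of shape (D,).
    source:  learnable source distribution of shape (H, W, D) for the Poisson branch.
    eps:     screening term (assumption) so the zero-frequency Poisson mode is finite.
    """
    h, w, _ = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    wx = np.pi * np.arange(h) / h
    wy = np.pi * np.arange(w) / w
    omega2 = (wx[:, None] ** 2 + wy[None, :] ** 2)[..., None]  # (H, W, 1), broadcasts over D

    xh = dct2(x, ch, cw)
    heat = xh * np.exp(-k * omega2 * t)             # u_t = k * lap(u): high frequencies decay
    wave = xh * np.cos(c * np.sqrt(omega2) * t)     # u_tt = c^2 * lap(u): oscillation, no decay
    poisson = (xh + dct2(source, ch, cw)) / (omega2 + eps)  # (-lap + eps) u = x + source
    mixed = alpha[0] * heat + alpha[1] * wave + alpha[2] * poisson
    return idct2(mixed, ch, cw)
```

Every branch is a per-frequency multiplication, so in this sketch the prior-specific work is elementwise and the cost is dominated by the forward and inverse DCTs. Setting `t = 0` turns the heat branch into the identity, and for a constant input (pure DC content) all three branches pass the signal through unchanged when `eps = 1` and the source is zero.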

Abstract

Most parameter-efficient fine-tuning (PEFT) methods rely on low-rank representations to adapt models. However, these approaches often oversimplify representations, particularly when the underlying data has high-rank or high-frequency components. This limitation hinders the model's ability to capture complex data interactions effectively. In this paper, we propose a novel approach that models network weights by leveraging a combination of physical priors, enabling more accurate approximations. We use three foundational equations -- heat diffusion, wave propagation, and Poisson's steady-state equation -- each contributing distinctive modeling properties: heat diffusion enforces local smoothness, wave propagation facilitates long-range interactions, and Poisson's equation captures global equilibrium. To combine these priors effectively, we introduce the Mixture of Physical Priors Adapter (MoPPA), using an efficient Discrete Cosine Transform (DCT) implementation. To dynamically balance these priors, a route regularization mechanism is designed to adaptively tune their contributions. MoPPA serves as a lightweight, plug-and-play module that seamlessly integrates into transformer architectures, with adaptable complexity depending on the local context. Specifically, using MAE pre-trained ViT-B, MoPPA improves PEFT accuracy by up to 2.1% on VTAB-1K image classification with a comparable number of trainable parameters, and advantages are further validated through experiments across various vision backbones, showcasing MoPPA's effectiveness and adaptability. The code will be made publicly available.
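The exact form of the route regularization is not given in this summary, so the following NumPy sketch substitutes a common balance penalty. A hypothetical `route` function mean-pools each sample's tokens and maps them through a linear layer and a softmax to the three mixing weights $\alpha_1, \alpha_2, \alpha_3$; `route_balance_penalty` is a stand-in regularizer (the squared distance between the batch-averaged routing distribution and the uniform distribution) that grows as one prior starts to dominate. Both names and the penalty itself are assumptions, not the paper's formulation.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def route(tokens, w_router, b_router):
    """Hypothetical router: mean-pool each sample's tokens, then a linear map to 3 path logits.

    tokens: (B, N, D); w_router: (D, 3); b_router: (3,).
    Returns (B, 3) mixing weights for the Heat, Wave, and Poisson paths; each row sums to 1.
    """
    pooled = tokens.mean(axis=1)                   # (B, D)
    return softmax(pooled @ w_router + b_router)   # (B, 3)


def route_balance_penalty(alpha):
    """Stand-in route regularizer: squared distance of the batch-mean routing
    distribution from uniform. It is zero when the three paths receive equal
    average weight and grows as a single path dominates."""
    n_paths = alpha.shape[-1]
    mean_use = alpha.mean(axis=0)
    return float(np.sum((mean_use - 1.0 / n_paths) ** 2))
```

During training one would add it to the task loss, e.g. `loss = task_loss + lam * route_balance_penalty(alpha)`, where `lam` is a hypothetical weighting coefficient. With `lam = 0` nothing discourages the router from collapsing onto a single path early in training.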

Paper Structure

This paper contains 28 sections, 32 equations, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Left: Comparison of VPT, LoRA, and our proposed MoPPA. With lightweight operators incorporating physical priors, MoPPA enables parameter-efficient fine-tuning (PEFT) of pre-trained vision models from a fresh perspective. Right: Performance comparison on VTAB-1K using an MAE pre-trained ViT-B. MoPPA achieves leading performance with a comparable number of trainable parameters.
  • Figure 2: Visualization of the randomly generated Ground Truth (GT) and the absolute error between GT and regression results from LoRA / MoPPA. The results are averaged channel-wise. Please refer to the adaptation-analysis implementation section of the supplementary material for details.
  • Figure 3: Visualization of diffusion processes of three physical equations. Left: Heat conduction from a central source. Mid: Wave propagation from an initial disturbance. Right: Potential field generated by Poisson's equation with a Dirac delta source in a half-space. More intense colors indicate higher temperatures, higher wave amplitudes, and higher potential values, respectively.
  • Figure 4: Architecture of a Vision Transformer (ViT) block with an integrated trainable MoPPA unit during fine-tuning. The trainable SD denotes the trainable Source Distribution, which serves as the input to Poisson's Equation. MLP refers to the Multi-Layer Perceptron. The snowflake and fire icons represent frozen and trainable modules, respectively.
  • Figure 5: The detailed implementation of a MoPPA unit as described by the MoPPA unit equation. As shown in the lower right portion, the purple, blue, and orange arrows represent $\text{Heat}(\cdot)$, $\text{Wave}(\cdot)$, and $\text{Poisson}(\cdot)$, respectively.
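Figure 4 places the trainable unit inside an otherwise frozen ViT block, before self-attention. The NumPy sketch below shows one way such a block could be wired, assuming a residual connection around the adapter, pre-norm attention and MLP sublayers (layer norm without affine parameters), single-head attention, a ReLU MLP in place of GELU, and patch tokens only (no CLS token). The `adapter` argument is any callable on the $H \times W \times D$ token grid, for example a MoPPA-style unit; the function names and the exact residual placement are assumptions rather than the authors' code.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token over its channels (no learnable scale/shift in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv, wo):
    # Single-head attention over N tokens of width D, using frozen weights.
    q, k, v = x @ wq, x @ wk, x @ wv
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (N, N), rows sum to 1
    return (att @ v) @ wo


def mlp(x, w1, w2):
    return np.maximum(x @ w1, 0.0) @ w2  # ReLU stand-in for GELU


def vit_block_with_adapter(x, grid_hw, adapter, frozen):
    """x: (N, D) patch tokens; grid_hw = (H, W) with H * W == N.
    adapter: trainable callable on an (H, W, D) grid (hypothetical MoPPA-style unit).
    frozen:  {"attn": (wq, wk, wv, wo), "mlp": (w1, w2)} pre-trained weights, kept fixed."""
    h, w = grid_hw
    if h * w != x.shape[0]:
        raise ValueError("grid_hw must match the number of patch tokens")
    grid = x.reshape(h, w, -1)
    x = x + adapter(grid).reshape(x.shape)                   # trainable unit, before attention
    x = x + self_attention(layer_norm(x), *frozen["attn"])   # frozen attention sublayer
    x = x + mlp(layer_norm(x), *frozen["mlp"])               # frozen MLP sublayer
    return x
```

With `adapter=lambda g: np.zeros_like(g)` the block reduces exactly to the frozen pre-trained block, which is why residual adapters are commonly initialized so that their output starts near zero.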