Mixture of Physical Priors Adapter for Parameter-Efficient Fine-Tuning
Zhaozhi Wang, Conghu Li, Qixiang Ye, Tong Zhang
TL;DR
The paper addresses a limitation of low-rank priors in parameter-efficient fine-tuning (PEFT): they struggle to capture complex visual representations, especially when the data contain high-rank or high-frequency components. It introduces MoPPA, a Mixture of Physical Priors Adapter that blends Heat, Wave, and Poisson priors. The priors are implemented in the frequency domain via Discrete Cosine Transforms and combined by an adaptive router, with a route-regularization term that prevents any single path from dominating prematurely. MoPPA units are inserted before self-attention in vision transformers and form the predictor $Y=\alpha_1\text{Heat}(X)+\alpha_2\text{Wave}(X)+\alpha_3\text{Poisson}(X)$ with mixing weights $\alpha_i$; the learnable parameters are $k$, $c$, $t$, and the Poisson source distribution. The method achieves state-of-the-art PEFT results across VTAB-1K, FGVC, and COCO with a small parameter budget. The results show robust transfer across backbones and pre-training schemes, highlighting physics-informed priors as a practical route to more expressive, parameter-efficient adaptation of large vision models.
Abstract
Most parameter-efficient fine-tuning (PEFT) methods rely on low-rank representations to adapt models. However, these approaches often oversimplify representations, particularly when the underlying data has high-rank or high-frequency components. This limitation hinders the model's ability to capture complex data interactions effectively. In this paper, we propose a novel approach that models network weights by leveraging a combination of physical priors, enabling more accurate approximations. We use three foundational equations -- heat diffusion, wave propagation, and Poisson's steady-state equation -- each contributing distinctive modeling properties: heat diffusion enforces local smoothness, wave propagation facilitates long-range interactions, and Poisson's equation captures global equilibrium. To combine these priors effectively, we introduce the Mixture of Physical Priors Adapter (MoPPA), which uses an efficient Discrete Cosine Transform (DCT) implementation. To balance the priors dynamically, we design a route regularization mechanism that adaptively tunes their contributions. MoPPA serves as a lightweight, plug-and-play module that integrates seamlessly into transformer architectures, with adaptable complexity depending on the local context. Specifically, with an MAE pre-trained ViT-B, MoPPA improves PEFT accuracy by up to 2.1% on VTAB-1K image classification with a comparable number of trainable parameters. Its advantages are further validated through experiments across various vision backbones, showcasing MoPPA's effectiveness and adaptability. The code will be made publicly available.
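
To make the idea of mixing physical priors in the DCT domain concrete, the following is a minimal 1D sketch in plain Python. It is not the authors' implementation. It assumes the textbook spectral solutions of the three equations on a cosine basis with frequencies $\omega_m = \pi m / n$: a heat gain $e^{-k\omega^2 t}$, a wave gain $\cos(c\,\omega t)$ (zero initial velocity), and a Poisson gain $1/\omega^2$ with the constant mode dropped. The function names, the default parameter values, the use of the input itself as the Poisson source, and the fixed softmax logits standing in for the adaptive router are all hypothetical choices made for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1D list."""
    n = len(x)
    out = []
    for m in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * m / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for m in range(n):
            scale = math.sqrt(1.0 / n) if m == 0 else math.sqrt(2.0 / n)
            s += scale * coeffs[m] * math.cos(math.pi * (i + 0.5) * m / n)
        out.append(s)
    return out

def apply_filter(x, gain):
    """Scale each DCT coefficient by gain(omega), then transform back."""
    n = len(x)
    coeffs = dct(x)
    return idct([gain(math.pi * m / n) * coeffs[m] for m in range(n)])

def heat(x, k=1.0, t=1.0):
    # Heat diffusion damps high frequencies: local smoothing.
    return apply_filter(x, lambda w: math.exp(-k * w * w * t))

def wave(x, c=1.0, t=1.0):
    # Wave propagation keeps magnitudes bounded but oscillates them.
    return apply_filter(x, lambda w: math.cos(c * w * t))

def poisson(x):
    # Steady state of -u'' = x; the constant mode is set to zero (assumption).
    return apply_filter(x, lambda w: 0.0 if w == 0.0 else 1.0 / (w * w))

def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def mixture(x, logits=(0.0, 0.0, 0.0)):
    """Y = a1*Heat(X) + a2*Wave(X) + a3*Poisson(X) with softmax weights."""
    a1, a2, a3 = softmax(logits)
    hv, wv, pv = heat(x), wave(x), poisson(x)
    return [a1 * h + a2 * w + a3 * p for h, w, p in zip(hv, wv, pv)]

if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0]
    print(mixture(signal))
```

In this sketch every prior is a diagonal operator in the DCT basis, so each costs one forward and one inverse transform and adds only a few scalar parameters. That is one way to read the abstract's claim of a lightweight module, under the assumptions stated above.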
