Parameters as Experts: Adapting Vision Models with Dynamic Parameter Routing
Meng Lou, Stanley Yu, Yizhou Yu
TL;DR
AdaRoute addresses parameter-efficient fine-tuning (PEFT) for dense vision tasks by introducing a shared expert center and a lightweight dynamic router that assembles input-dependent projection weights. This MoE-inspired design enables low-rank, input-conditioned adaptation and promotes cross-layer feature interaction through a shared parameter pool. The method also incorporates dynamic multi-scale spatial mixing via depthwise convolutions and a spatial aggregation module, increasing representation capacity without excessive parameter growth. Across semantic segmentation, object detection and instance segmentation, panoptic segmentation, and image classification, AdaRoute achieves state-of-the-art performance among PEFT methods and, in several cases, matches or surpasses full fine-tuning while training only a small fraction of the parameters.
Abstract
Adapting pre-trained vision models with parameter-efficient fine-tuning (PEFT) remains challenging: the goal is to match the performance of full fine-tuning while training only a minimal number of parameters. On complex dense prediction tasks, existing methods show clear limitations, including input-agnostic modeling and redundant cross-layer representations. To address these limitations, we propose AdaRoute, a new adapter-style method built on a simple mixture-of-experts (MoE) architecture. Specifically, we introduce shared expert centers, in which each expert is a trainable parameter matrix. During a forward pass, each AdaRoute module dynamically generates weight matrices tailored to that module through a simple dynamic parameter routing mechanism, which selectively aggregates the parameter matrices in the corresponding expert center. These dynamic weight matrices enable low-rank adaptation in an input-dependent manner, producing more customized and expressive feature representations. Moreover, because AdaRoute modules in multiple network layers share the same expert center, they promote implicit cross-layer feature interaction and thereby improve feature diversity. Extensive experiments demonstrate the superiority of AdaRoute on diverse vision tasks, including semantic segmentation, object detection and instance segmentation, and panoptic segmentation. Code will be available at https://bit.ly/3NZcr0H.
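The routing mechanism is described here only at a high level, so the following pure-Python sketch illustrates one plausible reading of it and is not the authors' implementation. Several choices are assumptions not stated in the abstract: the router is global average pooling over tokens followed by a linear layer and a softmax over experts; each adapter is a routed down-projection, a ReLU, and a routed up-projection added back as a residual; the down- and up-projections draw on two separate expert centers. The class names `ExpertCenter`, `RoutedProjection`, and `AdapterSketch` are hypothetical. The multi-scale depthwise convolutions and spatial aggregation module are omitted.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    # W: list of rows (out_dim x in_dim), x: vector of length in_dim
    return [sum(w * v for w, v in zip(row, x)) for row in W]


class ExpertCenter:
    """Hypothetical shared pool of K trainable matrices, each out_dim x in_dim."""

    def __init__(self, num_experts, out_dim, in_dim, seed=0):
        rng = random.Random(seed)
        self.experts = [
            [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
            for _ in range(num_experts)
        ]


class RoutedProjection:
    """Per-module router (assumed: avg-pool -> linear -> softmax) over a shared center."""

    def __init__(self, center, in_dim, seed=0):
        rng = random.Random(seed)
        self.center = center  # shared object, not a copy
        k = len(center.experts)
        self.router = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(k)]

    def assemble(self, tokens):
        # Pool the tokens into one descriptor, then score each expert.
        pooled = [sum(col) / len(tokens) for col in zip(*tokens)]
        alphas = softmax(matvec(self.router, pooled))
        experts = self.center.experts
        out_dim, in_dim = len(experts[0]), len(experts[0][0])
        # Input-dependent weight: W = sum_k alpha_k * E_k
        W = [
            [sum(a * E[i][j] for a, E in zip(alphas, experts)) for j in range(in_dim)]
            for i in range(out_dim)
        ]
        return W, alphas

    def __call__(self, tokens):
        W, _ = self.assemble(tokens)
        return [matvec(W, t) for t in tokens]


class AdapterSketch:
    """Low-rank residual adapter whose projections are routed from shared centers."""

    def __init__(self, down_center, up_center, dim, rank, seed=0):
        self.down = RoutedProjection(down_center, dim, seed)      # dim -> rank
        self.up = RoutedProjection(up_center, rank, seed + 1)     # rank -> dim

    def __call__(self, tokens):
        h = self.down(tokens)
        h = [[max(0.0, v) for v in t] for t in h]  # assumed ReLU nonlinearity
        delta = self.up(h)
        return [[a + b for a, b in zip(t, d)] for t, d in zip(tokens, delta)]


# Usage: three layers share the same two expert centers.
dim, rank, k = 8, 2, 4
down_center = ExpertCenter(k, rank, dim, seed=1)
up_center = ExpertCenter(k, dim, rank, seed=2)
layers = [AdapterSketch(down_center, up_center, dim, rank, seed=10 * i) for i in range(3)]

rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
x = tokens
for layer in layers:
    x = layer(x)
```

Because every layer holds a reference to the same `ExpertCenter` objects, the expert matrices are stored once, and only each layer's small router is layer-specific; this is how the sketch models the parameter sharing across layers that the abstract credits with implicit cross-layer feature interaction.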
