KernelDNA: Dynamic Kernel Sharing via Decoupled Naive Adapters

Haiduo Huang, Yadong Zhang, Yinghui Xu, Pengju Ren

TL;DR

KernelDNA tackles the parameter and speed costs of dynamic convolutions by introducing cross-layer weight sharing with lightweight adapters. It decouples input-dependent dynamic routing from static kernel modulation, so that child layers can be derived from a shared parent kernel without increasing the base convolution’s computational burden. Channel, filter, and spatial attention are implemented in a decoupled, plug-in adapter, which lets KernelDNA reduce parameter counts substantially while preserving hardware-friendly inference. Empirical results on ImageNet-1K and MS-COCO demonstrate state-of-the-art accuracy-efficiency trade-offs across multiple architectures, establishing KernelDNA as a practical approach for adaptive CNNs in real-world vision tasks.

Abstract

Dynamic convolution enhances model capacity by adaptively combining multiple kernels, yet faces critical trade-offs: prior works either (1) incur significant parameter overhead by scaling kernel numbers linearly, (2) compromise inference speed through complex kernel interactions, or (3) struggle to jointly optimize dynamic attention and static kernels. We observe that pre-trained Convolutional Neural Networks (CNNs) exhibit inter-layer redundancy akin to that in Large Language Models (LLMs). Specifically, dense convolutional layers can be efficiently replaced by derived "child" layers generated from a shared "parent" convolutional kernel through an adapter. To address these limitations and implement the weight-sharing mechanism, we propose a lightweight convolution kernel plug-in, named KernelDNA. It decouples kernel adaptation into input-dependent dynamic routing and pre-trained static modulation, ensuring both parameter efficiency and hardware-friendly inference. Unlike existing dynamic convolutions that expand parameters via multi-kernel ensembles, our method leverages cross-layer weight sharing and adapter-based modulation, enabling dynamic kernel specialization without altering the standard convolution structure. This design preserves the native computational efficiency of standard convolutions while enhancing representation power through input-adaptive kernel adjustments. Experiments on image classification and dense prediction tasks demonstrate that KernelDNA achieves a state-of-the-art accuracy-efficiency balance among dynamic convolution variants.
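The parent/child idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `ChildLayer`, the per-layer spatial scale used for static modulation, and the pooled-input sigmoid gate used for dynamic routing are all assumptions chosen for illustration, since the exact adapter form is not given in this summary. The sketch only shows that several child layers can reference one parent kernel while owning a small set of adapter parameters.

```python
import numpy as np

def conv2d(x, w):
    """Valid cross-correlation; x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

class ChildLayer:
    """Hypothetical child layer: borrows the parent kernel, owns only adapter weights."""

    def __init__(self, parent_kernel, rng):
        self.parent = parent_kernel  # shared reference, not a copy
        c_out, c_in, k, _ = parent_kernel.shape
        # Static modulation (assumed form): input-independent k x k spatial scale.
        self.static_spatial = 1.0 + 0.1 * rng.standard_normal((k, k))
        # Dynamic routing (assumed form): linear map from pooled input to filter gates.
        self.route_w = 0.1 * rng.standard_normal((c_out, c_in))

    def num_own_params(self):
        return self.static_spatial.size + self.route_w.size

    def forward(self, x):
        pooled = x.mean(axis=(1, 2))                            # (C_in,)
        gate = 1.0 / (1.0 + np.exp(-(self.route_w @ pooled)))   # (C_out,)
        kernel = self.parent * self.static_spatial              # broadcast over k x k
        # Scaling each output channel equals scaling the corresponding filter.
        return conv2d(x, kernel) * gate[:, None, None]

rng = np.random.default_rng(0)
parent = rng.standard_normal((8, 4, 3, 3))   # one shared parent kernel
child_a = ChildLayer(parent, rng)
child_b = ChildLayer(parent, rng)
x = rng.standard_normal((4, 8, 8))
y_a = child_a.forward(x)
y_b = child_b.forward(x)
print(y_a.shape, child_a.num_own_params(), parent.size)
```

In this toy configuration each child owns 41 adapter parameters, while the shared parent kernel holds 288, so adding a child layer costs far less than adding another dense kernel. The convolution itself remains a single standard convolution per layer.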

Paper Structure

This paper contains 18 sections, 4 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Linear CKA across layers of different models, each trained with its original architecture, reveals a consistent grid pattern. Only the results for Conv3$\times$3 are presented. Except for the last two layers and a few intermediate layers, the similarity between different layers is notably high.
  • Figure 2: Prior dynamic convolution methods fall mainly into two categories, (b) and (c). (a) Standard convolution with fixed kernels. (b) Kernel Pool-based [Yang2019, Chen2020, Li2022, Li2024], which maintains a pool of convolutional kernels that are trained alongside the other model parameters and fixed after training. (c) Neural Generator-based [zhou2021decoupled, Li2021, he2023sd, Chen2025b], which employs a generator network or hybrid attention mechanisms to synthesize convolutional kernels directly from the input. (d) Our proposed KernelDNA approach with shared parent kernels and adapter-modulated child layers.
  • Figure 3: Comparison of dynamic convolution, where $*$ denotes the convolution operation.
  • Figure 4: Light-weight Input-dependent Adapter. Note that the channel attention is shared across all filters, and can be applied to the input feature map equivalently, which avoids the batch expansion of the kernel tensors.
  • Figure 5: Linear CKA between the original model and the model with KernelDNA. Only the results for Conv3$\times$3 are presented.
  • ...and 1 more figure
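The Figure 4 caption notes that channel attention shared across all filters can be applied to the input feature map instead of the kernel. A small NumPy check of that equivalence is sketched below. The helper names `conv2d_shared` and `conv2d_per_sample` and the tensor sizes are illustrative assumptions, not part of the paper. The point is that scaling the kernel per sample requires a kernel tensor with a batch dimension, whereas scaling the input keeps a single shared kernel.

```python
import numpy as np

def conv2d_shared(x, w):
    """x: (B, C_in, H, W); w: (C_out, C_in, k, k), one kernel for the whole batch."""
    b, _, h, wd = x.shape
    c_out, _, k, _ = w.shape
    out = np.zeros((b, c_out, h - k + 1, wd - k + 1))
    for i in range(h - k + 1):
        for j in range(wd - k + 1):
            patch = x[:, :, i:i + k, j:j + k]  # (B, C_in, k, k)
            out[:, :, i, j] = np.einsum('bckl,ockl->bo', patch, w)
    return out

def conv2d_per_sample(x, w_b):
    """x: (B, C_in, H, W); w_b: (B, C_out, C_in, k, k), a separate kernel per sample."""
    b, _, h, wd = x.shape
    _, c_out, _, k, _ = w_b.shape
    out = np.zeros((b, c_out, h - k + 1, wd - k + 1))
    for i in range(h - k + 1):
        for j in range(wd - k + 1):
            patch = x[:, :, i:i + k, j:j + k]
            out[:, :, i, j] = np.einsum('bckl,bockl->bo', patch, w_b)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8, 8))        # batch of 2, 4 input channels
w = rng.standard_normal((8, 4, 3, 3))        # shared kernel
a = rng.uniform(0.1, 1.0, size=(2, 4))       # per-sample channel attention

# Option 1: modulate the kernel, which expands it along the batch dimension.
w_expanded = w[None] * a[:, None, :, None, None]   # (B, C_out, C_in, k, k)
y_kernel = conv2d_per_sample(x, w_expanded)

# Option 2: modulate the input and reuse the single shared kernel.
y_input = conv2d_shared(x * a[:, :, None, None], w)

max_diff = np.abs(y_kernel - y_input).max()
print(w_expanded.shape, max_diff)
```

Both paths give the same output up to floating-point error, because convolution is linear in each input channel. The second path avoids materializing a per-sample kernel, which is why it maps onto a standard batched convolution.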