Hadaptive-Net: Efficient Vision Models via Adaptive Cross-Hadamard Synergy

Xuyang Zhang, Xi Zhang, Liang Chen, Hao Shi, Qingshan Guo

TL;DR

The work tackles the need for faster yet accurate vision backbones by introducing Adaptive Cross-Hadamard (ACH), a learnable cross-channel Hadamard product that expands channels from $m$ to $n$ and integrates into a lightweight Hadaptive-Net backbone. It couples ACH with differentiable channel selection via a Gumbel-Topk mechanism and a specialized Cross-Hadamard normalization (CrossHadaNorm) to maintain stable statistics when operating in high-dimensional Hadamard space. The authors provide theoretical and empirical evidence that ACH improves representational capacity more efficiently than traditional depthwise separable convolutions, and demonstrate competitive accuracy with lower latency across CIFAR-100 and ImageNet-1k benchmarks, particularly when CUDA acceleration is utilized. The result is a practical framework for deploying fast, high-performing vision systems that leverage Hadamard-based channel interactions and adaptive feature reuse.
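The TL;DR names Gumbel-Topk as the mechanism behind differentiable channel selection, but the summary does not give the authors' exact formulation. What follows is a minimal sketch of the generic Gumbel-top-k trick in plain Python. It assumes the evaluation network's per-channel scores act as logits, and the function name `gumbel_topk` is hypothetical. The gradient relaxation that makes the selection trainable is not shown. The sketch covers only the sampling step used in training and the plain top-k used at inference.

```python
import math
import random


def gumbel_topk(scores, k, training=True, rng=None):
    """Pick k channel indices from per-channel scores (treated as logits).

    training=True: Gumbel-top-k trick. Adding i.i.d. Gumbel(0, 1) noise to
    each logit and keeping the k largest draws k channels without
    replacement, each draw following softmax(scores) over the remaining ones.
    training=False: deterministic top-k on the raw scores.
    Returns the selected indices in ascending order.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be in [1, len(scores)]")
    rng = rng or random.Random()
    if training:
        # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
        # random() can return 0.0, so clamp u away from zero.
        keys = [s - math.log(-math.log(max(rng.random(), 1e-12))) for s in scores]
    else:
        keys = list(scores)
    ranked = sorted(range(len(scores)), key=lambda i: keys[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    scores = [0.1, 5.0, -1.0, 3.0]
    print(gumbel_topk(scores, 2, training=False))  # [1, 3]
    print(gumbel_topk(scores, 2, rng=random.Random(0)))  # a random 2-subset
```

During training, the noise occasionally lets lower-scoring channels through. This is what allows a selection policy to keep exploring. At inference the same k channels are always returned.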

Abstract

Recent studies have revealed the immense potential of the Hadamard product for enhancing network representational capacity and for dimensional compression. Despite this theoretical promise, however, the technique has not been systematically explored or effectively applied in practice, and its capabilities remain underdeveloped. In this work, we first analyze the advantages of the Hadamard product over standard convolutional operations for cross-channel interaction and channel expansion. Building on these insights, we propose a computationally efficient module, Adaptive Cross-Hadamard (ACH), which uses adaptive cross-channel Hadamard products for high-dimensional channel expansion. We further introduce Hadaptive-Net (Hadamard Adaptive Network), a lightweight backbone for visual tasks. Our experiments show that, through the proposed module, it achieves an unprecedented balance between inference speed and accuracy.
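The abstract contrasts the Hadamard product with convolution for channel expansion. The sketch below illustrates that point under simplifying assumptions and is not the ACH module itself. It grows m channels to n by appending element-wise products of channel pairs. It then compares the weights this expansion step needs against a bias-free 1×1 convolution. Using the first n - m pairs in lexicographic order is a hypothetical stand-in for ACH's learned selection. The names `cross_hadamard_expand` and `expansion_weights` are illustrative.

```python
from itertools import combinations


def cross_hadamard_expand(channels, n):
    """Expand m channels to n by appending element-wise (Hadamard) products
    of distinct channel pairs.

    channels: list of m feature maps, each flattened to a list of floats of
    equal length. The first n - m unordered pairs (i < j) in lexicographic
    order are used; a learned module would choose the pairs instead.
    """
    m = len(channels)
    extra = n - m
    max_pairs = m * (m - 1) // 2
    if not 0 <= extra <= max_pairs:
        raise ValueError(f"n must be in [{m}, {m + max_pairs}]")
    pairs = list(combinations(range(m), 2))[:extra]
    # Each new channel is the element-wise product of two existing ones.
    products = [[a * b for a, b in zip(channels[i], channels[j])] for i, j in pairs]
    return channels + products


def expansion_weights(m, n):
    """Learnable weights needed by the m -> n expansion step alone (no bias)."""
    return {"pointwise_conv": m * n, "cross_hadamard": 0}


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0]]
    print(cross_hadamard_expand(x, 5))
    # [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0], [3.0, 8.0], [0.5, -2.0]]
    print(expansion_weights(64, 256))
    # {'pointwise_conv': 16384, 'cross_hadamard': 0}
```

The product step itself is weight-free and yields up to m(m-1)/2 new channels. ACH as a whole still learns a linear transform and an evaluation network, so this comparison isolates only the expansion operation.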

Paper Structure

This paper contains 13 sections, 15 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Classification accuracy vs. computational complexity on CIFAR-100 (Krizhevsky, 2009). The plot compares accuracy across computational scales for Hadaptive-Net (ours), MobileNetV3 (Howard et al., 2019), and MobileNetV4 (Qin et al., 2024). Detailed experimental configurations, implementation specifics, and additional benchmark results against other models are provided in Section 4.2.
  • Figure 2: Flow diagram of channel expansion algorithms. (a) Standard Convolution; (b) Depthwise Separable Convolution; (c) Adaptive Cross-Hadamard Module. All algorithms are designed to expand the channel dimension from $m$ to $n$, followed by a $k\times k$ convolution operation.
  • Figure 3: Illustration of the ACH module. Input features $\mathbf{X}$ undergo a linear transformation and batch normalization. An evaluation network generates channel-wise scores, and Gumbel-Topk sampling (training) or top-k selection (inference) determines the active channels. The selected features $\mathbf{Z}$ undergo a cross-Hadamard product, are normalized using the preceding statistics, and are then concatenated with the original features.
  • Figure 4: Hadaptive-Net architecture overview. Hadaptive-Net adopts a hierarchical backbone comprising a stem followed by four stages. To accommodate both Ghost and ACH modules, we design the Adaptive Bottleneck (Ada.Bott.), whose expansion layer can be selected manually. The network begins with a linear convolutional layer as the stem, followed by two fixed conventional convolutional layers in Stage 1 for initial feature extraction. Stage 2 uses two fixed Adaptive Bottlenecks with Ghost modules as expansion layers, enabling rapid downsampling. Stages 3 and 4 use Ghost Ada.Bott. for the downsampling layers and Hadamard Ada.Bott. for the repeated residual blocks, with most parameters concentrated in Stage 3, following ConvNeXt's design philosophy. Kernel sizes increase progressively across stages: the non-downsampling layers in Stages 1 to 4 use $1\times1$, $3\times3$, $5\times5$, and $7\times7$ kernels, respectively.
  • Figure 5: Illustration of component-wise ablation variants. (1) and (2) remove the pointwise convolution and the evaluation network, respectively; (3) replaces learnable selection with fixed channel combinations; and (4) replaces cross-Hadamard normalization with standard batch normalization.
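The Figure 3 caption fixes the order of operations inside ACH but does not give the exact CrossHadaNorm formula. The sketch below chains the inference path together on per-channel value lists. It rests on two loudly labeled assumptions. First, CrossHadaNorm predicts each product's mean and variance from the preceding BN affine parameters, treating the two selected channels as independent. For independent $a, b$, $\mathrm{Var}(ab) = \sigma_a^2\sigma_b^2 + \mu_a^2\sigma_b^2 + \mu_b^2\sigma_a^2$. Second, the "original features" that get concatenated are taken to be the batch-normalized ones. The learned evaluation network is replaced by a given list of scores. The names `batch_norm`, `cross_hada_norm`, and `ach_forward` are hypothetical.

```python
import math
from itertools import combinations


def batch_norm(ch, gamma, beta, eps=1e-5):
    """Standardize one channel with its own batch statistics, then apply the
    affine parameters, so the output has mean ~beta and variance ~gamma**2."""
    mu = sum(ch) / len(ch)
    var = sum((v - mu) ** 2 for v in ch) / len(ch)
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in ch]


def cross_hada_norm(prod, gi, bi, gj, bj, eps=1e-5):
    """ASSUMED CrossHadaNorm: normalize z_i * z_j with moments predicted from
    the preceding BN parameters, treating z_i and z_j as independent:
      E[z_i z_j]   = b_i * b_j
      Var[z_i z_j] = g_i^2 g_j^2 + b_i^2 g_j^2 + b_j^2 g_i^2
    """
    mean = bi * bj
    var = gi**2 * gj**2 + bi**2 * gj**2 + bj**2 * gi**2
    return [(v - mean) / math.sqrt(var + eps) for v in prod]


def ach_forward(x, weight, gamma, beta, scores, k):
    """Inference-time ACH sketch following the Figure 3 order:
    linear transform -> BN -> top-k by score -> pairwise Hadamard products of
    the selected channels -> CrossHadaNorm -> concatenation.
    Output has len(weight) + k*(k-1)/2 channels."""
    # 1x1 linear transform: each output channel mixes all input channels.
    y = [[sum(w * x[c][p] for c, w in enumerate(row)) for p in range(len(x[0]))]
         for row in weight]
    z = [batch_norm(ch, g, b) for ch, g, b in zip(y, gamma, beta)]
    # Inference path: deterministic top-k on the (here, given) scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:k])
    crossed = [
        cross_hada_norm([a * b for a, b in zip(z[i], z[j])],
                        gamma[i], beta[i], gamma[j], beta[j])
        for i, j in combinations(top, 2)
    ]
    return z + crossed


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
    out = ach_forward(x, weight, [1.0] * 3, [0.0] * 3, [0.9, 0.1, 0.5], k=2)
    print(len(out))  # 4: three transformed channels plus one cross product
```

Deriving the product statistics analytically avoids a second batch-statistics pass over the expanded channels. Under this reading, that is the appeal of reusing the "preceding statistics". Whether the paper's CrossHadaNorm does exactly this is not stated in the caption.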