Designing Concise ConvNets with Columnar Stages

Ashish Kumar, Jaesik Park

TL;DR

CoSNet introduces a concise convolutional architecture built on Parallel Columnar Convolutions, Input Replication, and shallow/deep projections to achieve low depth, controlled parameter growth, and high computational density. By enforcing uniform kernel sizes and delaying fusion (Fuse-Once), CoSNet attains strong efficiency without relying on attention mechanisms. Across ImageNet and downstream tasks, CoSNet matches or surpasses many standard ConvNets and ViTs while using markedly fewer parameters and FLOPs, and with faster inference. This work highlights a practical path toward efficient CNNs that compete with transformer-based models in real-world deployment scenarios.

Abstract

In the era of vision Transformers, the recent success of VanillaNet shows the huge potential of simple and concise convolutional neural networks (ConvNets). While such models focus mainly on runtime, it is also crucial to consider other aspects, e.g., FLOPs and parameter count, to further strengthen their utility. To this end, we introduce a refreshing ConvNet macro design called Columnar Stage Network (CoSNet). CoSNet has a systematically developed, simple, and concise structure with smaller depth, low parameter count, low FLOPs, and attention-less operations, making it well suited for resource-constrained deployment. The key novelty of CoSNet is deploying parallel convolutions with fewer kernels fed by input replication, using columnar stacking of these convolutions, and minimizing the use of 1x1 convolution layers. Our comprehensive evaluations show that CoSNet rivals many renowned ConvNets and Transformer designs under resource-constrained scenarios. Code: https://github.com/ashishkumar822/CoSNet
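
To give a feel for why parallel columns with fewer kernels and few 1x1 layers can keep the parameter count low, the following is a minimal counting sketch, not the authors' implementation. The layout it assumes (one 1x1 projection in, replicated input feeding each column, 3x3 convolutions in each column, one 1x1 fusion at the end) and all names and widths (`columnar_stage_params`, `col_width`, 256 channels, 4 columns) are hypothetical choices made for illustration only.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a k x k convolution, ignoring bias."""
    return c_in * c_out * k * k


def dense_stage_params(channels: int, depth: int, k: int = 3) -> int:
    """Baseline: `depth` stacked k x k convolutions of full width."""
    return depth * conv_params(channels, channels, k)


def columnar_stage_params(channels: int, columns: int, col_width: int,
                          depth: int, k: int = 3) -> int:
    """Assumed columnar layout (illustrative, not taken from the paper):
    1x1 projection -> input replicated to `columns` parallel columns ->
    `depth` k x k convolutions of `col_width` kernels per column ->
    concatenation -> a single 1x1 fusion back to `channels`.
    """
    project_in = conv_params(channels, col_width, 1)
    # Replicating the input adds no weights; each column owns its kernels.
    per_column = depth * conv_params(col_width, col_width, k)
    fuse_once = conv_params(columns * col_width, channels, 1)
    return project_in + columns * per_column + fuse_once


if __name__ == "__main__":
    dense = dense_stage_params(channels=256, depth=3)
    columnar = columnar_stage_params(channels=256, columns=4,
                                     col_width=64, depth=3)
    print(f"dense stage:    {dense:,} weights")
    print(f"columnar stage: {columnar:,} weights")
    print(f"ratio:          {columnar / dense:.2f}")
```

Under these assumed widths the columnar stage needs roughly 30% of the weights of the dense stack at the same depth. The actual savings in CoSNet depend on the widths and depths the paper chooses, which this sketch does not reproduce.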

Paper Structure

This paper contains 24 sections, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Design of various representative architectures in the order of their development in the timeline from (a) to (e). Each graph represents a stage of a network operating at a particular resolution.
  • Figure 2: Design evolution flow of the CoSNet-unit. (a) A ResNet stage with three blocks. (b) Removing all $1\times1$ convolutions except the first of the first block and the last of the last block. (c) Detailed design of the CoSNet-unit, obtained by integrating our design ideas into (b). (d) Final optimized CoSNet-unit from an implementation viewpoint.
  • Figure 3: Illustration of Vanilla Frequent Fusion (left) (ShuffleNet v1, Figure \ref{fig:stages_groupconv}) and Pairwise Frequent Fusion (right).
  • Figure 4: Macro design of (a) existing networks, e.g., RepVGG, ResNet, ConvNeXt, and ResNeXt, and (b) CoSNet. CoSNet does not have blocks in its stages.
  • Figure 5: Comparing the proposed CoSNet with representative models. Models in ' ' and ' ' refer to CoSNet and existing models, respectively. CoSNet has fewer parameters and lower FLOPs, while its depth is not unnecessarily large. The size of each circle is proportional to the parameter count.
  • ...and 2 more figures