Designing Concise ConvNets with Columnar Stages

Ashish Kumar; Jaesik Park

TL;DR

CoSNet introduces a concise convolutional architecture built on Parallel Columnar Convolutions, Input Replication, and shallow/deep projections to achieve low depth, controlled parameter growth, and high computational density. By enforcing uniform kernel sizes and delaying fusion (Fuse-Once), CoSNet attains strong efficiency without relying on attention mechanisms. Across ImageNet and downstream tasks, CoSNet matches or surpasses many standard ConvNets and ViTs while using markedly fewer parameters and FLOPs, and with faster inference. This work highlights a practical path toward efficient CNNs that compete with transformer-based models in real-world deployment scenarios.

Abstract

In the era of vision Transformers, the recent success of VanillaNet shows the huge potential of simple and concise convolutional neural networks (ConvNets). While such models mainly focus on runtime, it is also crucial to consider other aspects at the same time, e.g., FLOPs and parameter count, to strengthen their utility further. To this end, we introduce a refreshing ConvNet macro design called Columnar Stage Network (CoSNet). CoSNet has a systematically developed, simple and concise structure with smaller depth, low parameter count, low FLOPs, and attention-less operations, making it well suited for resource-constrained deployment. The key novelty of CoSNet is deploying parallel convolutions with fewer kernels fed by input replication, stacking these convolutions in columns, and minimizing the use of $1\times 1$ convolution layers. Our comprehensive evaluations show that CoSNet rivals many renowned ConvNet and Transformer designs under resource-constrained scenarios. Code: https://github.com/ashishkumar822/CoSNet
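To make the "parallel convolutions with fewer kernels fed by input replication" idea more concrete, the following back-of-the-envelope sketch counts convolution weights for a dense stack versus a columnar stack. It is not taken from the paper or its repository: the function names, the channel width, depth, and column count are hypothetical, and modeling the columns as grouped convolutions (with the first layer seeing a replicated copy of the full input) is an assumption made only for illustration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias omitted)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


def dense_stack_params(channels, k, depth):
    """Baseline: `depth` ordinary k x k convolutions of constant width."""
    return depth * conv_params(channels, channels, k)


def columnar_stack_params(channels, k, depth, columns):
    """Hypothetical columnar stage with the same total width.

    Assumption: the input is replicated once per column, so the first layer
    of every column sees all `channels` inputs but holds only
    channels // columns kernels. Later layers stay inside their column,
    which is modeled here as a grouped convolution with groups=columns.
    """
    first = conv_params(channels * columns, channels, k, groups=columns)
    rest = (depth - 1) * conv_params(channels, channels, k, groups=columns)
    return first + rest


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the trend.
    dense = dense_stack_params(channels=64, k=3, depth=3)
    columnar = columnar_stack_params(channels=64, k=3, depth=3, columns=4)
    print(dense, columnar, columnar / dense)
```

Under these assumed sizes the columnar stack needs half the weights of the dense stack, and the saving grows with depth because only the first layer pays the full-input cost. The actual CoSNet widths and layer layout may differ.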

Paper Structure (24 sections, 7 figures, 9 tables)

This paper contains 24 sections, 7 figures, 9 tables.

Introduction
Related Work
Columnar Stage Network
Avoiding $1\times 1$ for Reducing Depth
Parallel Columnar Convolutions for Controlled Parameters.
Input Replication
Uniform Kernel Size for High Computational Density & Uniform Primitive Operations.
Batched Processing for Minimal Branching.
Fuse Once
Projections
CoSNet Instantiation
Experiments
Advanced ConvNets and Vision Transformers
Comparison with Standard ConvNets
Additional Experiments
...and 9 more sections

Figures (7)

Figure 1: Design of representative architectures in chronological order of their development, from (a) to (e). Each graph represents a stage of a network operating at a particular resolution.
Figure 2: Design evolution flow of the CoSNet-unit. (a) A ResNet stage with three blocks. (b) Removing all $1\times 1$ convolutions except the first of the first block and the last of the last block. (c) Detailed design of the CoSNet-unit obtained by integrating our design ideas into (b). (d) Final optimized CoSNet-unit from an implementation viewpoint.
Figure 3: Illustration of Vanilla Frequent Fusion (left) (ShuffleNet v1, Figure \ref{fig:stages_groupconv}) and Pairwise Frequent Fusion (right).
Figure 4: Macro design of (a) existing networks, e.g., RepVGG, ResNet, ConvNeXt, and ResNeXt, and (b) CoSNet. CoSNet does not have blocks in its stages.
Figure 5: Comparing the proposed CoSNet with representative models. Models in ' ' and ' ' refer to CoSNet and existing models, respectively. CoSNet has fewer parameters and lower FLOPs, while its depth is not unnecessarily large. The size of each circle is proportional to the parameter count.
...and 2 more figures
