TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition

Junbo Jacob Lian, Feng Xiong, Yujun Sun, Kaichen Ouyang, Mingyang Yu, Shengwei Fu, Zhong Rui, Zhang Yujun, Huiling Chen

TL;DR

This work introduces TwistNet-2D, a lightweight module that computes pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. It consistently surpasses both parameter-matched and substantially larger baselines.

Abstract

Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices capture global channel correlations but collapse spatial structure, while self-attention models spatial context through weighted aggregation rather than explicit pairwise feature interactions. We introduce TwistNet-2D, a lightweight module that computes local pairwise channel products under directional spatial displacement, jointly encoding where features co-occur and how they interact. The core component, Spiral-Twisted Channel Interaction (STCI), shifts one feature map along a prescribed direction before element-wise channel multiplication, thereby capturing the cross-position co-occurrence patterns characteristic of structured and periodic textures. Aggregating four directional heads with learned channel reweighting and injecting the result through a sigmoid-gated residual path, TwistNet-2D incurs only 3.5% additional parameters and 2% additional FLOPs over ResNet-18, yet consistently surpasses both parameter-matched and substantially larger baselines (including ConvNeXt, Swin Transformer, and hybrid CNN-Transformer architectures) across four texture and fine-grained recognition benchmarks.
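The shift-then-multiply idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`shift_zero_pad`, `l2_normalize`, `directional_pairwise_products`), the zero-filled boundary, the one-pixel offsets in `directions`, and the per-position channel normalization are all hypothetical choices, and the learned parts (channel reduction, reweighting, gating) are omitted.

```python
import numpy as np

def shift_zero_pad(z, dy, dx):
    """Shift a (C, H, W) array by (dy, dx) pixels, zero-filling vacated cells.

    Assumes |dy| < H and |dx| < W.
    """
    _, h, w = z.shape
    out = np.zeros_like(z)
    # Overlapping destination/source windows after the shift.
    dst_y, src_y = slice(max(dy, 0), h + min(dy, 0)), slice(max(-dy, 0), h + min(-dy, 0))
    dst_x, src_x = slice(max(dx, 0), w + min(dx, 0)), slice(max(-dx, 0), w + min(-dx, 0))
    out[:, dst_y, dst_x] = z[:, src_y, src_x]
    return out

def l2_normalize(z, eps=1e-6):
    """Normalize the channel vector at every spatial position (assumed placement)."""
    norm = np.sqrt((z ** 2).sum(axis=0, keepdims=True))
    return z / (norm + eps)

def directional_pairwise_products(z, dy, dx):
    """Return (C*C, H, W) maps with out[i*C + j] = z_i * shifted(z_j)."""
    zs = l2_normalize(shift_zero_pad(z, dy, dx))  # displaced copy
    zn = l2_normalize(z)                          # unshifted copy
    prod = zn[:, None, :, :] * zs[None, :, :, :]  # all channel pairs, per position
    return prod.reshape(-1, *z.shape[1:])

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 4, 5))                # toy (C, H, W) feature map
directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # hypothetical offsets, one per head
heads = [directional_pairwise_products(z, dy, dx) for dy, dx in directions]
features = np.concatenate(heads, axis=0)
print(features.shape)  # (36, 4, 5)
```

Unlike a Gram matrix, the output keeps its spatial dimensions: each of the `C*C` maps per direction records where a channel pair co-activates across the given displacement, which is the "where and how" encoding the abstract refers to.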

Paper Structure (50 sections, 13 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: TwistNet-2D architecture. (Top) TwistNet-2D-18 follows a ResNet-like structure with four stages; Stages 3-4 use TwistBlocks that inject second-order channel interactions. (Bottom) TwistBlock augments the standard residual block with a gated MH-STCI branch.
  • Figure 2: Why cross-position correlation? (a) Wood grain exhibits periodic stripe-brown alternation. (b)-(c) A CNN extracts a stripe detector $z_1$ and a brown detector $z_2$. (d) The same-position product $z_1 \times z_2$ yields a low response due to misaligned peaks. (e) Spiral Twist shifts $z_2$ by $\delta$; the cross-position product $z_1 \times \tilde{z}_2$ aligns the peaks, capturing periodicity.
  • Figure 3: Single STCI head. Channel reduction, directional spiral twist, $\ell_2$ normalization, pairwise products, and output concatenation.
  • Figure 4: Multi-Head STCI (MH-STCI). Four directional heads are concatenated, reweighted by AIS, normalized, and projected to output channels.
  • Figure 5: Accuracy vs. parameters on DTD. TwistNet-18 achieves the highest accuracy among all models. Larger models (Group 2, $\sim$28M) suffer severe degradation without pretraining, underscoring the advantage of parameter-efficient designs in data-limited regimes.
  • ...and 1 more figure
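The argument in the Figure 2 caption can be checked on a toy 1D signal. The sketch below is illustrative only: the indicator-style detector responses, the period of 8, and the helper names `shifted` and `co_response` are assumptions made for this example and do not come from the paper.

```python
def shifted(signal, delta):
    """Return s with s[x] = signal[x + delta], zero where x + delta is out of range."""
    n = len(signal)
    return [signal[x + delta] if 0 <= x + delta < n else 0.0 for x in range(n)]

def co_response(a, b):
    """Sum of position-wise products of two equally long signals."""
    return sum(p * q for p, q in zip(a, b))

n, period = 32, 8
# Two hypothetical detectors firing once per period, half a period apart.
z1 = [1.0 if x % period == 0 else 0.0 for x in range(n)]
z2 = [1.0 if x % period == period // 2 else 0.0 for x in range(n)]

# Score every candidate displacement of z2 against the unshifted z1.
scores = {d: co_response(z1, shifted(z2, d)) for d in range(period)}
best = max(scores, key=scores.get)
print(scores[0], best, scores[best])  # 0.0 4 4.0
```

With no displacement the peaks never coincide and the product is zero everywhere; shifting `z2` by half a period aligns all four peaks. This is the effect the caption attributes to the cross-position product $z_1 \times \tilde{z}_2$.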