Shift-Equivariant Complex-Valued Convolutional Neural Networks
Quentin Gabot, Teck-Yian Lim, Jérémy Fix, Joana Frontera-Pons, Chengfang Ren, Jean-Philippe Ovarlez
TL;DR
This work extends provably shift-equivariant convolutional networks to complex-valued data by introducing a learnable complex-to-real projection before the Gumbel Softmax, which enables end-to-end training of polyphase downsampling and upsampling in the complex domain. It establishes the theoretical extensions, with three key propositions for complex shift-equivariant design, and proposes both implicit and explicit projection strategies (e.g., PolyDec, MLP) that map $\boldsymbol{z}\in\mathbb{C}^N$ to real logits. Empirically, complex-valued neural networks (CVNNs) with Learnable Polyphase Sampling (LPS) outperform real-valued and non-equivariant baselines on classification, segmentation, and reconstruction tasks on PolSAR datasets, with PolyDec often offering a favorable trade-off between performance and compute. The results show that a properly designed complex shift-equivariant framework can exploit both the amplitude and the phase information in PolSAR data, yielding robust invariant and equivariant representations for remote sensing applications.
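To make the role of the complex-to-real projection concrete, here is a minimal 1D sketch of learnable polyphase downsampling on a complex signal, written in pure Python. It is an illustration under stated assumptions, not the authors' implementation: the scoring network (`complex_score`: circular complex convolution, CReLU, global mean), the real-linear projection `project_to_real` with parameters `a` and `b`, and all parameter values are hypothetical stand-ins, not the PolyDec or MLP projections of the paper. The point it shows is that each polyphase component yields a complex score that must be projected to a real logit before the Gumbel Softmax can turn the logits into selection probabilities.

```python
import math
import random


def polyphase_components(x, stride=2):
    """Split a 1D signal into its `stride` polyphase components x[p::stride]."""
    return [x[p::stride] for p in range(stride)]


def crelu(z):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


def complex_score(c, w):
    """Hypothetical complex scoring net: circular complex conv, CReLU, global mean.
    Global averaging makes the score invariant to circular shifts of c."""
    L, K = len(c), len(w)
    conv = [sum(w[k] * c[(n + k) % L] for k in range(K)) for n in range(L)]
    return sum(crelu(v) for v in conv) / L


def project_to_real(f, a, b):
    """Hypothetical learnable C -> R projection: real-linear map a*Re(f) + b*Im(f)."""
    return a * f.real + b * f.imag


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Perturb real logits with Gumbel(0, 1) noise and apply a tempered softmax."""
    noisy = []
    for logit in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        noisy.append((logit - math.log(-math.log(u))) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def lps_downsample(x, w, a, b, stride=2, train=False, tau=1.0, rng=random):
    """Learnable polyphase downsampling for a complex 1D signal.
    Assumes len(x) is a multiple of `stride` so all components have equal length."""
    comps = polyphase_components(x, stride)
    # Complex scores are mapped to real logits before the Gumbel softmax
    logits = [project_to_real(complex_score(c, w), a, b) for c in comps]
    if train:
        # Soft selection: convex combination of components weighted by Gumbel-softmax probabilities
        probs = gumbel_softmax(logits, tau, rng)
        return [sum(p * c[n] for p, c in zip(probs, comps)) for n in range(len(comps[0]))]
    # Inference: hard selection of the component with the highest logit
    best = max(range(stride), key=lambda p: logits[p])
    return comps[best]


x = [1 + 2j, 0.5 - 1j, 3 + 0j, -1j, 2 - 2j, 0.1 + 0.1j]
w, a, b = [1 + 0.5j, -0.3 + 1j], 1.0, -0.5
hard = lps_downsample(x, w, a, b)
soft = lps_downsample(x, w, a, b, train=True, rng=random.Random(0))
```

Because each component's logit is invariant to circular shifts of that component, a shift of the input only permutes the logits, so the hard selection follows the shift and the output is a circularly shifted copy of `hard`. The soft branch is a simplification for illustration; this sketch does not reproduce how the paper handles the discrete selection during training.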
Abstract
Convolutional neural networks have shown remarkable performance in recent years on various computer vision problems. However, the traditional convolutional neural network architecture lacks two critical properties, shift equivariance and shift invariance, which are broken by downsampling and upsampling operations. Although data augmentation techniques can help the model learn these properties empirically, a more consistent and systematic approach is to design downsampling and upsampling layers that guarantee them theoretically, by construction. Adaptive Polyphase Sampling (APS) laid the groundwork for shift invariance and was later extended to shift equivariance with Learnable Polyphase up/downsampling (LPS) for real-valued neural networks. In this paper, we extend LPS to complex-valued neural networks, both from a theoretical perspective and with a novel building block: a projection layer from $\mathbb{C}$ to $\mathbb{R}$ placed before the Gumbel Softmax. Finally, we evaluate this extension on several computer vision problems using polarimetric Synthetic Aperture Radar images, assessing the invariance property in classification tasks and the equivariance property in reconstruction and semantic segmentation problems.
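APS selects, among the polyphase components of a signal, the one with the largest norm. Since the norm of a complex vector is real, this selection rule carries over unchanged to complex-valued inputs. The pure-Python sketch below illustrates this on a 1D signal with stride 2; the function names and the example signal are hypothetical, chosen only for illustration. It checks that circularly shifting the input produces a circularly shifted copy of the same downsampled output, which a subsequent global pooling turns into shift invariance.

```python
def polyphase_components(x, stride=2):
    """Split a 1D signal into its `stride` polyphase components x[p::stride]."""
    return [x[p::stride] for p in range(stride)]


def l2_norm(v):
    """Euclidean norm of a complex vector; abs(z) is the modulus |z|."""
    return sum(abs(z) ** 2 for z in v) ** 0.5


def aps_downsample(x, stride=2):
    """APS-style downsampling: keep the polyphase component with the largest l2 norm.
    The norm is real-valued, so the rule applies directly to complex inputs."""
    comps = polyphase_components(x, stride)
    best = max(range(stride), key=lambda p: l2_norm(comps[p]))
    return comps[best]


def circular_shift(x, k):
    """Return y with y[n] = x[(n - k) mod len(x)]."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


def is_circular_shift_of(a, b):
    """True if b equals a circularly shifted by some amount."""
    return len(a) == len(b) and any(circular_shift(a, k) == b for k in range(len(a)))


x = [1 + 2j, 0.5 - 1j, 3 + 0j, -1j, 2 - 2j, 0.1 + 0.1j]
out = aps_downsample(x)
for k in range(len(x)):
    # Each shifted input yields a circularly shifted copy of the same output
    print(k, is_circular_shift_of(out, aps_downsample(circular_shift(x, k))))
```

The guarantee assumes no ties between component norms; with ties, the argmax can switch components under a shift, which is one motivation for replacing the fixed norm criterion with a learned selection as in LPS.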
