Learning Fourier shapes to probe the geometric world of deep neural networks
Jian Wang, Yixing Yong, Haixia Bi, Lijun He, Fan Li
TL;DR
The paper studies how deep neural networks encode geometry by introducing a differentiable framework that learns shape-only representations via a Fourier parameterization and a winding-number mapping from shapes to pixels. This enables three core capabilities: generating shapes that carry class-specific semantics, using shapes as high-fidelity, boundary-precise interpretability masks, and deploying shape-based adversarial attacks that generalize across detection and recognition tasks. Key findings show that shape alone can trigger high-confidence classifications across architectures; that shape masks isolate minimal salient regions with sharp boundaries, outperforming Grad-CAM; and that optimized shapes can significantly degrade downstream detectors such as YOLOv3, with attack strength scaling with shape complexity $K$ and transferring across models. Collectively, the work opens new avenues for probing, interpreting, and challenging machine perception through geometry, with potential extensions to data augmentation and to 3D shapes for a more robust understanding of vision systems.
Abstract
While both shape and texture are fundamental to visual recognition, research on deep neural networks (DNNs) has predominantly focused on the latter, leaving their geometric understanding poorly probed. Here, we show: first, that optimized shapes can act as potent semantic carriers, generating high-confidence classifications from inputs defined purely by their geometry; second, that they are high-fidelity interpretability tools that precisely isolate a model's salient regions; and third, that they constitute a new, generalizable adversarial paradigm capable of deceiving downstream visual tasks. This is achieved through an end-to-end differentiable framework that unifies a Fourier series parameterization of arbitrary shapes, a winding-number-based mapping that translates them into the pixel grid required by DNNs, and signal-energy constraints that enhance optimization efficiency while ensuring physically plausible shapes. Our work provides a versatile framework for probing the geometric world of DNNs and opens new frontiers for challenging and understanding machine perception.
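
To make the first two ingredients of the framework concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a complex-valued Fourier series with frequencies $-K, \dots, K$ as the shape parameterization and a hard (non-smoothed) winding number computed by summing signed angles over a sampled polygon. The function names `fourier_curve` and `winding_number_mask`, the grid size, and the coordinate extent are all hypothetical. The sketch covers only the forward mapping from coefficients to pixels; the differentiable relaxation and the signal-energy constraints mentioned in the abstract are not reproduced here.

```python
import numpy as np


def fourier_curve(coeffs, n_points=256):
    """Sample a closed curve z(t) = sum_k c_k * exp(i*k*t) for k = -K..K.

    coeffs is a complex array of length 2K+1 (an assumed layout).
    Returns n_points complex samples, where real = x and imag = y.
    """
    coeffs = np.asarray(coeffs, dtype=complex)
    K = (len(coeffs) - 1) // 2
    t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    freqs = np.arange(-K, K + 1)
    # Outer product over (sample, frequency), then sum over frequencies.
    basis = np.exp(1j * freqs[None, :] * t[:, None])
    return (coeffs[None, :] * basis).sum(axis=1)


def winding_number_mask(curve, size=64, extent=1.0):
    """Winding number of the sampled curve around every pixel centre.

    Pixels cover [-extent, extent]^2. For a simple counterclockwise
    curve, values are close to 1 inside and close to 0 outside.
    """
    xs = np.linspace(-extent, extent, size)
    grid = xs[None, :] + 1j * xs[:, None]        # (size, size)
    d = curve[None, None, :] - grid[:, :, None]  # pixel -> vertex vectors
    d_next = np.roll(d, -1, axis=-1)             # pixel -> next vertex
    # Signed angle swept between consecutive vertices, seen from the pixel.
    angles = np.angle(d_next * np.conj(d))
    return angles.sum(axis=-1) / (2.0 * np.pi)


# Usage: K = 2, with only the k = +1 term set, gives a circle of radius 0.5.
coeffs = np.zeros(5, dtype=complex)
coeffs[3] = 0.5                                  # index 3 is frequency +1
mask = winding_number_mask(fourier_curve(coeffs))
```

In this sketch, setting additional coefficients deforms the circle into more complex outlines, which is one way to read the shape complexity $K$ referred to in the TL;DR. Because the hard winding number is piecewise constant away from the boundary, a gradient-based optimizer would need some smoothed variant of this mapping.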
