The Geometry of Cortical Computation: Manifold Disentanglement and Predictive Dynamics in VCNet
Brennen A. Hill, Zhang Xinyu, Timothy Putra Prasetio
TL;DR
The paper addresses CNN data inefficiency and brittleness by proposing VCNet, a cortex-inspired architecture that encodes geometric priors at a macro scale. It casts vision through two cortical streams that disentangle identity and pose on separate manifolds, and uses predictive coding as a geodesic refinement mechanism reinforced by recurrent dynamics, attention, lateral interactions, and neuromodulatory gating. Empirically, VCNet achieves strong accuracy and high parameter efficiency on Spots-10 (92.1\%) and light-field classification (74.4\%), outperforming compact baselines and demonstrating the benefits of brain-inspired geometric priors for robust representation learning. The work argues that high-level nervous-system principles, interpreted geometrically, offer a promising path toward more data-efficient and robust artificial vision systems, with potential extensions to equivariant micro-design, topology-informed analysis, and spatio-temporal processing.
Abstract
Despite their success, modern convolutional neural networks (CNNs) exhibit fundamental limitations, including data inefficiency, poor out-of-distribution generalization, and vulnerability to adversarial perturbations. These shortcomings can be traced to a lack of inductive biases that reflect the inherent geometric structure of the visual world. The primate visual system, in contrast, demonstrates superior efficiency and robustness, suggesting that its architectural and computational principles, which evolved to internalize these structures, may offer a blueprint for more capable artificial vision. This paper introduces the Visual Cortex Network (VCNet), a novel neural network architecture whose design is informed by the macro-scale organization of the primate visual cortex. VCNet is framed as a geometric framework that emulates key biological mechanisms, including hierarchical processing across distinct cortical areas, dual-stream information segregation for learning disentangled representations, and top-down predictive feedback for representation refinement. We interpret these mechanisms through the lens of geometry and dynamical systems, positing that they guide the learning of structured, low-dimensional neural manifolds. We evaluate VCNet on two specialized benchmarks: the Spots-10 animal pattern dataset, which probes sensitivity to natural textures, and a light field image classification task, which requires processing higher-dimensional visual data. Our results show that VCNet achieves state-of-the-art accuracy of 92.1\% on Spots-10 and 74.4\% on the light field dataset, surpassing contemporary models of comparable size. This work demonstrates that integrating high-level neuroscientific principles, viewed through a geometric lens, can lead to more efficient and robust models, providing a promising direction for addressing long-standing challenges in machine learning.
