Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
Trinity Chung, Yuchen Shen, Nathan C. L. Kong, Aran Nayebi
TL;DR
This work bridges tactile neuroscience and embodied AI by training task-optimized temporal networks on biomechanically faithful tactile sequences from rodent whisker simulations. Using an Encoder-Attender-Decoder (EAD) framework, it shows that ConvRNN encoders, particularly IntersectionRNN, outperform feedforward and state-space baselines in both tactile categorization and neural alignment with rodent somatosensory cortex. Self-supervised tactile learning, especially SimCLR with tactile augmentations, achieves neural fits comparable to supervised training, making it an ethologically relevant, label-free proxy for the learning signal. The study quantifies the inductive biases required for brain-like tactile representations and highlights recurrent architectures and tactile-specific SSL as key to robust tactile perception in unstructured environments.
Abstract
Tactile sensing remains far less understood in neuroscience and less effective in artificial systems than more mature modalities such as vision and language. We bridge these gaps by introducing a novel Encoder-Attender-Decoder (EAD) framework to systematically explore the space of task-optimized temporal neural networks trained on realistic tactile input sequences from a customized rodent whisker-array simulator. We identify convolutional recurrent neural networks (ConvRNNs) as superior encoders to purely feedforward and state-space architectures for tactile categorization. Crucially, these ConvRNN-encoder-based EAD models achieve neural representations closely matching rodent somatosensory cortex, saturating the explainable neural variability and revealing a clear linear relationship between supervised categorization performance and neural alignment. Furthermore, contrastive self-supervised ConvRNN-encoder-based EADs, trained with tactile-specific augmentations, match supervised neural fits, serving as an ethologically relevant, label-free proxy. For neuroscience, our findings highlight nonlinear recurrent processing as important for general-purpose tactile representations in somatosensory cortex, providing the first quantitative characterization of the underlying inductive biases in this system. For embodied AI, our results emphasize the importance of recurrent EAD architectures for handling realistic tactile inputs, along with tailored self-supervised learning methods for achieving robust tactile perception with the same type of sensors that animals use to sense in unstructured environments.
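To make the contrastive self-supervised objective concrete, the following is a minimal sketch of the standard SimCLR (NT-Xent) loss applied to two augmented views of each input. It is not the authors' implementation: the function name `nt_xent_loss`, the temperature value, and the toy two-dimensional embeddings are assumptions chosen for illustration, and the tactile-specific augmentations that would produce the two views are not specified here, so they are left out.

```python
import math

def _unit(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent loss for N paired embeddings.

    z_a[i] and z_b[i] are embeddings of two augmented views of the same
    input (a positive pair); every other embedding in the batch is a negative.
    The temperature default is an illustrative assumption.
    """
    n = len(z_a)
    z = [_unit(v) for v in list(z_a) + list(z_b)]  # 2N unit vectors
    total = 0.0
    for i in range(2 * n):
        j_pos = (i + n) % (2 * n)  # index of the positive partner of row i
        # Temperature-scaled cosine similarity to every other embedding.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        }
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denom - sims[j_pos]  # negative log-softmax of the positive
    return total / (2 * n)

# Toy embeddings (hypothetical): views that agree versus views that are swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_views = [[0.9, 0.1], [0.1, 0.9]]
swapped_views = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = nt_xent_loss(anchors, aligned_views)
loss_swapped = nt_xent_loss(anchors, swapped_views)
```

In this toy case the loss is lower when the two views of each input point in similar directions than when they are swapped, which is the pressure that pushes an encoder toward augmentation-invariant representations.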
