
TopoNets: High Performing Vision and Language Models with Brain-Like Topography

Mayukh Deb, Mainak Deb, N. Apurva Ratan Murty

TL;DR

TopoNets address the lack of brain-like topography in artificial networks by introducing TopoLoss, a brain-inspired inductive bias that reshapes weights into cortical sheets and enforces locality through a differentiable blurring objective. The approach generalizes across vision and language architectures (CNNs and transformers) and yields high task performance while producing brain-like, low-dimensional representations. Empirically, TopoNets outperform prior topographic methods on ImageNet and BrainScore benchmarks, yield more parameter-efficient representations as measured by weight pruning and downsampling, and reproduce key topographic signatures of both the visual and language cortices. This work advances efficient, interpretable AI that more closely mirrors human cortical computation, with potential impact on scalable deployment and neuroscientific modeling.
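
The TL;DR describes TopoLoss only at a high level: place a layer's weights on a 2-D cortical sheet, blur the sheet, and penalize the mismatch. The sketch below illustrates that idea under assumptions of our own, not necessarily the paper's: each of the layer's height × width output units is placed row-major on a grid, the blur is a simple box filter over each unit's neighbourhood, and the penalty is τ·(1 − cosine similarity) between the sheet and its blurred copy. All function names (to_cortical_sheet, box_blur, topo_loss) are hypothetical. The sketch only computes the loss value; in training, the same linear blur and cosine similarity would be written in an autodiff framework so the term can be added to the task loss and differentiated.

```python
import math
from typing import List

# A cortical sheet: sheet[row][col] is one unit's incoming weight vector.
Sheet = List[List[List[float]]]


def to_cortical_sheet(weights: List[List[float]], height: int, width: int) -> Sheet:
    """Arrange an (n_units x n_inputs) weight matrix on a height x width grid (row-major)."""
    if len(weights) != height * width:
        raise ValueError("number of units must equal height * width")
    return [[weights[r * width + c] for c in range(width)] for r in range(height)]


def box_blur(sheet: Sheet, radius: int = 1) -> Sheet:
    """Replace each unit's weights by the mean over its (2*radius+1)^2 neighbourhood.

    Units on the border average over the neighbours that exist (no padding).
    """
    h, w, d = len(sheet), len(sheet[0]), len(sheet[0][0])
    blurred: Sheet = []
    for r in range(h):
        row = []
        for c in range(w):
            acc = [0.0] * d
            count = 0
            for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                    for k in range(d):
                        acc[k] += sheet[rr][cc][k]
                    count += 1
            row.append([v / count for v in acc])
        blurred.append(row)
    return blurred


def flatten(sheet: Sheet) -> List[float]:
    """Concatenate all unit weight vectors into one long vector."""
    return [v for row in sheet for unit in row for v in unit]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def topo_loss(weights: List[List[float]], height: int, width: int,
              tau: float = 1.0, radius: int = 1) -> float:
    """Hypothetical TopoLoss-style penalty: tau * (1 - cos(sheet, blur(sheet))).

    It is 0 when blurring leaves the sheet unchanged (e.g. all units share the
    same weights) and tends to grow as neighbouring units' weights disagree.
    """
    sheet = to_cortical_sheet(weights, height, width)
    blurred = box_blur(sheet, radius)
    return tau * (1.0 - cosine_similarity(flatten(sheet), flatten(blurred)))
```

For a 4 × 4 sheet where every unit has the same weight, topo_loss returns 0; for a 4 × 4 checkerboard of +1 and −1 weights, where every unit disagrees with its direct neighbours, it returns 0.5.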

Abstract

Neurons in the brain are organized such that nearby cells tend to share similar functions. AI models lack this organization, and past efforts to introduce topography have often led to trade-offs between topography and task performance. In this work, we present TopoLoss, a new loss function that promotes spatially organized topographic representations in AI models without significantly sacrificing task performance. TopoLoss is highly adaptable and can be seamlessly integrated into the training of leading model architectures. We validate our method on both vision (ResNet-18, ResNet-50, ViT) and language models (GPT-Neo-125M, NanoGPT), collectively termed TopoNets. TopoNets are the highest-performing supervised topographic models to date, exhibiting brain-like properties such as localized feature processing, lower dimensionality, and increased efficiency. TopoNets also predict responses in the brain and replicate the key topographic signatures observed in the brain's visual and language cortices. Together, this work establishes a robust and generalizable framework for integrating topography into leading model architectures, advancing the development of high-performing models that more closely emulate the computational strategies of the human brain.

Paper Structure

This paper contains 11 sections, 6 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Towards high-performing topographic vision and language models (TopoNets). Schematic shows the transformation from unstructured baseline models (left) to organized topographic representations (right) for vision (top) and language models (bottom). The stacked maps are 3 representative layers (early, mid and late) of the model.
  • Figure 2: TopoNets achieve higher model performance with comparable topography. A. Estimated model topography (smoothness, x-axis) versus model performance (y-axis) for vision models (ResNet-18, ResNet-50, ViTs). The black filled dots with the dashed gray crosshairs indicate prior models. The dashed black lines indicate the Pareto curves for ResNet-18 and ResNet-50 models. B. Same as A, but for language models (GPT-Neo-125M and NanoGPT). The y-axis here denotes the language model evaluation score on BLiMP. The dashed gray line indicates the reported topography from a prior study.
  • Figure 3: Topography explains reductions in model dimensionality. A. (Left) Model performance (ImageNet accuracy, x-axis) versus effective dimensionality for vision ResNets. (Right) Measured topography (smoothness) versus effective dimensionality for vision ResNets. B. Same as A, but for language transformers.
  • Figure 4: Measuring the efficiency of TopoNets against baseline models. A. Fraction of weights masked through L1 unstructured pruning (x-axis) versus the change in model performance (y-axis) for ResNet-18 (left), ResNet-50 (center), and GPT-Neo-125M (right) models. Colored circles represent TopoNets, while hollow black circles represent baseline models. The performance of the baseline models is shown by the gray line. B. Percentage of model weights after downsampling (x-axis) versus the drop in model performance (y-axis) for GPT-Neo-125M models.
  • Figure 5: TopoNets recapitulate topographic signatures observed in the visual and language cortex. A. Topographic signatures in vision TopoNets (ResNet-18). Colormaps show t-values corresponding to selectivities for faces, bodies, scenes, real-world size, and animacy. Bold connected lines indicate the same regions across different topographic maps. Maps for the same model layer for the untrained (U) and baseline (B) models are shown below for comparison. B. Topographic signatures in language TopoNets. Colormaps display the strength of estimated power-law (red), exponential (green), and sentence-yoked coefficients across layers (from left to right). These coefficients indicate fast, slow and sentence-yoked temporal integration windows.
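
Figure 3 relates topography to effective dimensionality, but the captions do not define the measure. Assuming the common participation-ratio definition, ED = (Σᵢ λᵢ)² / Σᵢ λᵢ², where λᵢ are the eigenvalues of the covariance C of a layer's unit activations across stimuli, ED can be computed without an eigendecomposition: Σᵢ λᵢ = tr(C), and because C is symmetric, Σᵢ λᵢ² equals the sum of the squared entries of C. The minimal sketch below follows that assumption; the function names are hypothetical.

```python
from typing import List


def covariance(activations: List[List[float]]) -> List[List[float]]:
    """Sample covariance of an (n_stimuli x n_units) activation matrix (needs n_stimuli >= 2)."""
    n = len(activations)
    d = len(activations[0])
    means = [sum(row[j] for row in activations) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in activations]
    return [
        [sum(centered[i][a] * centered[i][b] for i in range(n)) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def effective_dimensionality(activations: List[List[float]]) -> float:
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    Uses trace identities instead of an eigendecomposition:
    sum(lambda) = tr(C); for symmetric C, sum(lambda^2) = sum of squared entries of C.
    """
    c = covariance(activations)
    trace = sum(c[i][i] for i in range(len(c)))
    frob_sq = sum(v * v for row in c for v in row)
    return trace * trace / frob_sq if frob_sq > 0 else 0.0
```

Under this definition, a layer whose variance lies entirely along one unit has ED = 1, while two uncorrelated units with equal variance give ED = 2; lower values mean the activations occupy fewer effective dimensions.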