AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space

Huzheng Yang, James Gee, Jianbo Shi

TL;DR

AlignedCut introduces a brain-guided universal channel alignment that maps layer-wise deep-net features from multiple models into a shared space using brain voxel fMRI as supervision. By coupling a linear channel transform with brain prediction, the method discovers recurring visual concepts via spectral clustering, revealing figure-ground representations before category semantics and enabling cross-model concept tracing without a decoder. The approach shows concept emergence across layers, consistent brain ROI mappings, and interpretable layer-to-layer dynamics, aided by a scalable Nyström-like spectral approximation and eigen-constraints. This yields a principled framework to quantify how visual information flows through networks and how shared concepts organize across models, with potential impact on model interpretability and cross-domain alignment.

Abstract

We study the intriguing connection between visual data, deep networks, and the brain. Our method creates a universal channel alignment by using brain voxel fMRI response prediction as the training objective. We discover that deep networks, trained with different objectives, share common feature channels across various models. These channels can be clustered into recurring sets, corresponding to distinct brain regions, indicating the formation of visual concepts. Tracing the clusters of channel responses onto the images, we see semantically meaningful object segments emerge, even without any supervised decoder. Furthermore, the universal feature alignment and the clustering of channels produce a picture and quantification of how visual information is processed through the different network layers, which produces precise comparisons between the networks.
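The abstract's core mechanism, a linear transform that maps a model's hidden channels onto brain voxel responses, can be sketched as a ridge regression. This is a minimal illustration on synthetic data; the variable names, shapes, and the closed-form solver are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: n images, d deep-net channels, v brain voxels.
n, d, v = 200, 64, 32
features = rng.standard_normal((n, d))  # layer activations per image
voxels = features @ rng.standard_normal((d, v)) \
    + 0.1 * rng.standard_normal((n, v))  # simulated fMRI responses

# Linear channel transform W fit by ridge regression (closed form):
#   W = (X^T X + lam * I)^{-1} X^T Y
lam = 1.0
W = np.linalg.solve(features.T @ features + lam * np.eye(d),
                    features.T @ voxels)

# Rows of W align this model's channels to the shared voxel space; fitting
# the same regression for a second model puts both in one comparable space.
predicted = features @ W
```

Fitting one such transform per model (and per layer) is what makes the channel spaces of differently trained networks directly comparable.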

Paper Structure

This paper contains 32 sections, 10 equations, 22 figures, 2 tables.

Figures (22)

  • Figure 1: Transforming the hidden channel activations of deep nets into visual brain voxel responses.
  • Figure 2: From the 768-D feature on CLIP layer-6, we extract different levels of segmentation by restricting clustering to a subset of channels. Left: channel activation on example image patches; channels are ordered from early to late visual brain areas by their weights for brain voxels. Right: spectral clustering on the subset of channels selected by each brain ROI (V1, V4, EBA); image pixels are colored by 3D spectral-tSNE of the top 10 eigenvectors.
  • Figure 3: Cosine similarity of channel activation on the same image inputs.
  • Figure 4: Spectral clustering in the universal channel-aligned feature space. Image pixels are colored by our approach, AlignedCut: each pixel's RGB value is assigned by 3D spectral-tSNE of the top 20 eigenvectors. The coloring is consistent across all images, layers, and models.
  • Figure 5: Unsupervised segmentation scores from spectral clustering on each CLIP layer. On the ImageNet-segmentation dataset with binary figure-ground labels, the mIoU score plateaus from layer-4 to layer-10; on PASCAL VOC with 20 class labels, the mIoU score peaks at layer-9.
  • ...and 17 more figures
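Figures 2 and 4 color pixels by spectral clustering over patch features in the aligned space. The spectral-embedding step can be sketched as eigenvectors of a normalized graph Laplacian built from a cosine-similarity affinity; this is a numpy-only sketch with synthetic features, and the affinity construction and clipping are assumptions. The final 3D t-SNE that maps eigenvectors to RGB ("spectral-tSNE") is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic patch features in the aligned channel space (n patches, d channels).
n, d = 100, 16
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit norm for cosine affinity

# Cosine-similarity affinity, clipped to be non-negative.
A = np.clip(X @ X.T, 0.0, None)

# Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2} (Ncut-style).
deg = A.sum(axis=1)
d_inv_sqrt = 1.0 / np.sqrt(deg)
L_sym = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Eigenvectors with the smallest eigenvalues form the spectral embedding
# that the figures then reduce to 3D for RGB coloring.
eigvals, eigvecs = np.linalg.eigh(L_sym)
k = 10
embedding = eigvecs[:, :k]
```

Because the features live in one aligned space, the same embedding (and hence the same coloring) can be shared across images, layers, and models, which is what makes the cross-model comparisons in the figures possible.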