Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers

Lukas Boehm; Jonas Leo Mueller; Christoffer Loeffler; Leo Schwinn; Bjoern Eskofier; Dario Zanca

TL;DR

This work investigates how architectural differences shape perceptual invariances in artificial vision by generating metamers through ensemble-based optimization across CNNs and vision transformers. It introduces a multi-model metamer generation framework that optimizes activations across an ensemble using projected gradient descent and an inversion loss to produce metamers that are both natural-looking and cross-model recognizable. Evaluations across diverse model sets and image-quality metrics reveal that CNNs yield more recognizable and human-like metamers, while transformers produce metamers that look natural but transfer less across models, highlighting the impact of architectural biases on representational invariances. The findings underscore the value of ensemble approaches for improving cross-model consistency and offer insights for aligning machine-perceived visuals with human perception, with implications for interpretability and robustness across architectures.

Abstract

Understanding the perceptual invariances of artificial neural networks is essential for improving explainability and aligning models with human vision. Metamers, stimuli that are physically distinct yet produce identical neural activations, serve as a valuable tool for investigating these invariances. We introduce a novel approach to metamer generation that leverages ensembles of artificial neural networks to capture representational subspaces shared across diverse architectures, including convolutional neural networks and vision transformers. To characterize the generated metamers, we employ a suite of image-based metrics that assess factors such as semantic fidelity and naturalness. Our findings show that convolutional neural networks generate more recognizable and human-like metamers, while vision transformers produce realistic but less transferable metamers, highlighting the impact of architectural biases on representational invariances.
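
The ensemble optimization summarized above (matching activations across several models with projected gradient descent and an inversion loss) can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the "models" are toy tanh-linear feature maps standing in for CNN or vision transformer layers, the inversion loss is an equally weighted, normalized squared activation distance, and the projection is a simple clip to the pixel range [0, 1]. The names make_toy_model, inversion_loss_and_grad, and ensemble_metamer are hypothetical.

```python
import numpy as np


def make_toy_model(rng, in_dim, feat_dim):
    """Hypothetical stand-in for one network layer: f(x) = tanh(W x)."""
    return rng.normal(scale=1.0 / np.sqrt(in_dim), size=(feat_dim, in_dim))


def features(W, x):
    """Activations of the toy model for input x."""
    return np.tanh(W @ x)


def inversion_loss_and_grad(W, x, target):
    """Normalized squared activation distance and its gradient w.r.t. x."""
    a = features(W, x)
    diff = a - target
    scale = np.sum(target ** 2) + 1e-12
    loss = np.sum(diff ** 2) / scale
    # Chain rule: d tanh(Wx)/dx = diag(1 - a^2) W
    grad = W.T @ (2.0 * diff * (1.0 - a ** 2)) / scale
    return loss, grad


def ensemble_metamer(models, reference, x0, steps=500, step_size=0.01):
    """Move x0 toward the reference activations of every model at once.

    Each iteration averages the per-model losses and gradients, takes a
    normalized gradient step, and projects back onto the valid pixel range.
    """
    targets = [features(W, reference) for W in models]
    x = x0.copy()
    history = []
    for _ in range(steps):
        total_loss = 0.0
        total_grad = np.zeros_like(x)
        for W, target in zip(models, targets):
            loss, grad = inversion_loss_and_grad(W, x, target)
            total_loss += loss / len(models)
            total_grad += grad / len(models)
        history.append(total_loss)
        # Normalized descent step (assumed step rule).
        x = x - step_size * total_grad / (np.linalg.norm(total_grad) + 1e-12)
        # Projection step: keep the stimulus a valid image in [0, 1].
        x = np.clip(x, 0.0, 1.0)
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    models = [make_toy_model(rng, 64, 32) for _ in range(3)]
    reference = rng.uniform(size=64)  # flattened toy "image"
    start = rng.uniform(size=64)      # random initialization
    metamer, history = ensemble_metamer(models, reference, start)
    print(f"ensemble loss: {history[0]:.4f} -> {history[-1]:.4f}")
```

Averaging the loss over the ensemble is what pushes the result toward activations that all models agree on, rather than toward the invariances of a single architecture. In a real setting the toy feature maps would be replaced by intermediate activations of pretrained networks, and the gradient would come from automatic differentiation instead of the hand-written chain rule.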
