Stretching Beyond the Obvious: A Gradient-Free Framework to Unveil the Hidden Landscape of Visual Invariance
Lorenzo Tausani, Paolo Muratore, Morgan B. Talbot, Giacomo Amerio, Gabriel Kreiman, Davide Zoccolan
TL;DR
The paper asks how visual units encode invariances, going beyond traditional visualizations of a unit's most exciting images. It introduces Stretch-and-Squeeze (SnS), a gradient-free, model-agnostic framework that couples a generative model with CMA-ES to solve a bi-objective optimization problem in a latent space of dimension $n=4096$, searching for both invariant images and adversarial perturbations. SnS probes invariance across multiple processing stages ($\kappa$) and target levels ($\ell$) by maximizing representational dissimilarity (stretch) while preserving downstream activation (squeeze); for adversarial samples the two roles are reversed. Pareto-front selection guides the search. Applied to ResNet50, SnS reveals layer-specific invariant manifolds: stretching in pixel space mainly changes luminance and contrast, stretching mid-level representations alters texture and color, and stretching high-level representations shifts pose, with invariances and their dimensionality showing a nonlinear, hierarchical structure. Comparing $L_2$-robust with standard networks shows that robust invariances are more recognizable to humans and observer networks at all levels, yet become less interpretable when deep layers are stretched, whereas standard networks show the opposite trend; robustness thus shapes perceptual alignment and invariance. SnS is a gradient-free tool for neuroscience and AI that can probe black-box systems, guide the design of more human-aligned representations, and support analyses of invariance-manifold geometry and cross-architecture transferability. The authors provide code, data, and detailed supplementary materials to support replication and future methodological extensions.
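Pareto-front selection over two objectives can be illustrated with a small sketch. It assumes both objectives are expressed as quantities to maximize (for example, stretch as representational distance from the reference, and squeeze as the negative loss in unit activation), and keeps a candidate only if no other candidate is at least as good on both objectives and strictly better on one. The function name `pareto_front` and the scoring convention are illustrative assumptions, not the authors' code.

```python
import numpy as np


def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Return indices of non-dominated rows of `scores` (shape [n, 2]).

    Both columns are treated as objectives to maximize. Row i is dominated
    if some row j is >= on every objective and > on at least one.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        # Rows that are at least as good as row i on every objective ...
        geq = np.all(scores >= scores[i], axis=1)
        # ... and strictly better on at least one (row i itself never is).
        gt = np.any(scores > scores[i], axis=1)
        if np.any(geq & gt):
            keep[i] = False
    return np.flatnonzero(keep)


# Hypothetical scores: column 0 = stretch (distance, higher is better),
# column 1 = squeeze (negative activation loss, higher is better).
candidates = np.array([
    [0.9, -0.5],
    [0.6, -0.1],
    [0.3, -0.05],
    [0.5, -0.3],   # dominated by [0.6, -0.1]
    [0.2, -0.4],   # dominated by several rows
])
print(pareto_front(candidates))  # [0 1 2]
```

The quadratic loop is chosen for readability; it is adequate for the population sizes of a single evolutionary generation but not tuned for large candidate sets.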
Abstract
Uncovering which feature combinations are encoded by visual units is critical to understanding how images are transformed into representations that support recognition. While existing feature visualization approaches typically infer a unit's most exciting images, this is insufficient to reveal the manifold of transformations under which responses remain invariant, which is critical to generalization in vision. Here we introduce Stretch-and-Squeeze (SnS), a model-agnostic, gradient-free framework to systematically characterize a unit's maximally invariant stimuli, and its vulnerability to adversarial perturbations, in both biological and artificial visual systems. SnS frames these transformations as bi-objective optimization problems. To probe invariance, SnS seeks image perturbations that maximally alter (stretch) the representation of a reference stimulus in a given processing stage while preserving unit activation downstream (squeeze). To probe adversarial sensitivity, stretching and squeezing are reversed to maximally perturb unit activation while minimizing changes to the upstream representation. Applied to CNNs, SnS revealed invariant transformations that were farther from a reference image in pixel space than those produced by affine transformations, while more strongly preserving the target unit's response. The discovered invariant images differed depending on the stage of the image representation used for optimization: pixel-level changes primarily affected luminance and contrast, while stretching mid- and late-layer representations mainly altered texture and pose. By measuring how well the hierarchical invariant images obtained for $L_2$-robust networks were classified by humans and by observer networks, we discovered a substantial drop in their interpretability when the representation was stretched in deep layers, while the opposite trend was found for standard models.
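The bi-objective framing above can be sketched on a toy model. This sketch assumes a hypothetical two-stage network (a random ReLU layer standing in for the stretched processing stage and a linear readout standing in for the downstream target unit), Euclidean distance as the representational dissimilarity, and the absolute activation change as the measure of unit perturbation; none of these choices are taken from the paper. The hypothetical function `sns_objectives` returns a (stretch, squeeze) pair, both oriented for maximization, and switching `mode` swaps which quantity is stretched and which is squeezed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "network": the hidden ReLU layer plays the role of the
# stretched stage, and a linear readout plays the role of the target unit.
W1 = rng.standard_normal((16, 8))
w2 = rng.standard_normal(16)


def stage_repr(x: np.ndarray) -> np.ndarray:
    """Representation of input x at the stretched stage (hidden layer)."""
    return np.maximum(W1 @ x, 0.0)


def unit_act(x: np.ndarray) -> float:
    """Activation of the downstream target unit."""
    return float(w2 @ stage_repr(x))


def sns_objectives(x: np.ndarray, x_ref: np.ndarray, mode: str) -> tuple[float, float]:
    """Return (stretch, squeeze) scores for candidate x, both to be maximized.

    mode="invariance": stretch the stage representation, squeeze the
    change in unit activation. mode="adversarial": the roles are swapped.
    """
    repr_dist = float(np.linalg.norm(stage_repr(x) - stage_repr(x_ref)))
    act_change = abs(unit_act(x) - unit_act(x_ref))
    if mode == "invariance":
        return repr_dist, -act_change
    if mode == "adversarial":
        return act_change, -repr_dist
    raise ValueError(f"unknown mode: {mode!r}")


# Score one random perturbation of a reference input under both modes.
x_ref = rng.standard_normal(8)
x_new = x_ref + 0.5 * rng.standard_normal(8)
print(sns_objectives(x_new, x_ref, "invariance"))
print(sns_objectives(x_new, x_ref, "adversarial"))
```

In a full search, a gradient-free optimizer (the authors use CMA-ES over a generative model's latent space) would propose candidates and rank them with scores of this kind; here the input vector stands in for the image directly.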
