Adversarial Manipulation of Deep Representations
Sara Sabour, Yanshuai Cao, Fartash Faghri, David J. Fleet
TL;DR
The paper introduces feature adversaries: adversarial images that differ imperceptibly from a source image yet force a DNN’s intermediate representation to resemble that of a chosen guide image. It formalizes their construction as a constrained optimization that minimizes representation distance at a target layer, and validates the approach across multiple networks and datasets, showing that the adversarial encodings often resemble those of natural images and lie close to the guide in high-dimensional feature space. Through Euclidean and manifold-based analyses (PPCA tangent space and angular consistency) and comparisons with label-based adversaries, the authors demonstrate that the phenomenon reflects intrinsic properties of DNN representations rather than mere misclassification or linearity effects, and that architecture plays a substantial role, even in randomly weighted networks. These findings raise fundamental questions about the geometry of natural image manifolds in deep representations and have implications for model robustness and defense against sophisticated adversaries.
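The constrained optimization described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a single random tanh layer (`features`) stands in for a DNN's target layer, random vectors stand in for the source and guide images, the perceptual-similarity constraint is modeled as an L-infinity bound `delta` on the perturbation, and projected gradient descent is a simple solver chosen here for brevity, not taken from the paper.

```python
import numpy as np

def features(W, x):
    """Toy stand-in for a DNN layer (an assumption): phi(x) = tanh(W x)."""
    return np.tanh(W @ x)

def rep_distance(W, x, target):
    """Squared Euclidean distance between phi(x) and a target representation."""
    return float(np.sum((features(W, x) - target) ** 2))

def feature_adversary(W, source, guide, delta=0.05, steps=300, lr=0.05):
    """Projected gradient descent on ||phi(x) - phi(guide)||^2, keeping
    ||x - source||_inf <= delta and x in [0, 1]. Returns the best iterate."""
    target = features(W, guide)
    x = source.copy()
    best_x, best_d = x.copy(), rep_distance(W, x, target)
    for _ in range(steps):
        h = features(W, x)
        # Chain rule through tanh: d/dx = W^T [2 (h - target) * (1 - h^2)].
        grad = W.T @ (2.0 * (h - target) * (1.0 - h ** 2))
        x = x - lr * grad
        # Project into the L-infinity ball around the source,
        # then into the valid pixel range.
        x = np.clip(x, source - delta, source + delta)
        x = np.clip(x, 0.0, 1.0)
        d = rep_distance(W, x, target)
        if d < best_d:
            best_x, best_d = x.copy(), d
    return best_x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 64)) / 8.0   # hypothetical layer weights
    source = rng.uniform(size=64)         # hypothetical "source image"
    guide = rng.uniform(size=64)          # hypothetical "guide image"
    adv = feature_adversary(W, source, guide)
    target = features(W, guide)
    print("distance before:", rep_distance(W, source, target))
    print("distance after: ", rep_distance(W, adv, target))
    print("max perturbation:", np.max(np.abs(adv - source)))
```

Because the function returns the best iterate seen, the representation distance to the guide never exceeds that of the unperturbed source, while the perturbation stays within `delta` per coordinate. With a real network, `features` would be replaced by the activations of the chosen layer and the gradient obtained by automatic differentiation.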
Abstract
We show that the representation of an image in a deep neural network (DNN) can be manipulated to mimic that of another natural image, with only minor, imperceptible perturbations to the original image. Previous methods for generating adversarial images focused on image perturbations designed to produce erroneous class labels, while we concentrate on the internal layers of DNN representations. In this way our new class of adversarial images differs qualitatively from others. While the adversary is perceptually similar to one image, its internal representation appears remarkably similar to that of a different image, one from a different class and bearing little if any apparent similarity to the input. These adversarial representations appear generic and consistent with the space of natural images. This phenomenon raises questions about DNN representations, as well as the properties of natural images themselves.
