Inverting Neural Networks: New Methods to Generate Neural Network Inputs from Prescribed Outputs

Rebecca Pattichis, Sebastian Janampa, Constantinos S. Pattichis, Marios S. Pattichis

Abstract

Neural network systems describe complex mappings that can be very difficult to understand. In this paper, we study the inverse problem of determining the input images that get mapped to specific neural network classes. Ultimately, we expect that these images contain recognizable features that are associated with their corresponding classes. We introduce two general methods for solving the inverse problem. In our forward pass method, we develop an inverse method based on a root-finding algorithm and the Jacobian with respect to the input image. In our backward pass method, we iteratively invert each layer, starting at the top. During the inversion process, we add random vectors sampled from the null space of each linear layer. We demonstrate our new methods on both transformer architectures and sequential networks based on linear layers. Unlike previous methods, our new methods are able to produce random-like input images that yield near-perfect classification scores in all cases, revealing vulnerabilities in the underlying networks. Hence, we conclude that the proposed methods provide more comprehensive coverage of the input image spaces that solve the inverse mapping problem.
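
The backward pass idea of inverting a linear layer and then adding a random null-space vector can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the helper names `null_space_basis` and `invert_linear_layer`, the 784-to-10 layer shape, the random weights, and the way `std` scales the null-space coefficients are all assumptions made for illustration, and nonlinear activations between layers are not handled here.

```python
import numpy as np

def null_space_basis(W, rtol=1e-10):
    """Orthonormal basis for the null space of W, computed from the SVD."""
    _, s, vt = np.linalg.svd(W, full_matrices=True)
    rank = int(np.sum(s > rtol * s.max())) if s.size else 0
    return vt[rank:].T  # shape: (n_inputs, n_inputs - rank)

def invert_linear_layer(W, b, y, std=0.1, rng=None):
    """Return one input x with W @ x + b == y (up to round-off).

    The minimum-norm solution comes from the pseudoinverse; a random
    null-space component is then added, which leaves W @ x unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_min = np.linalg.pinv(W) @ (y - b)
    N = null_space_basis(W)
    z = rng.normal(scale=std, size=N.shape[1])  # assumed noise model
    return x_min + N @ z

# Hypothetical 784 -> 10 layer (MNIST-sized input) with random weights.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))
b = rng.normal(size=10)
y_target = np.eye(10)[3] * 5.0  # prescribed layer output favoring class 3

x_a = invert_linear_layer(W, b, y_target, rng=rng)
x_b = invert_linear_layer(W, b, y_target, rng=rng)
```

Under these assumptions, `x_a` and `x_b` are different inputs that map to the same prescribed output, which is the sense in which null-space sampling widens the set of recovered inputs.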

Paper Structure

This paper contains 11 sections, 4 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Neural Network inversion using forward pass.
  • Figure 2: Neural Network inversion using backward pass.
  • Figure 3: Generated ideal input images using different algorithms. train-data row: reference images from the MNIST training data for each class. pattichis2024understanding-1layer row: ideal images generated in pattichis2024understanding. 1layerNN, 2layerNN, and 6layerNN rows: ideal images generated by the BackPassInv algorithm on FCNNs, with null-space noise added ($\texttt{Std}=0.1$). ViT-tiny and DINOV3 rows: images generated using ForwardPassInv(.). All images generated by the new methods produce a probability score $\geq 0.9$ for the target digit class and $\leq 0.1$ for every other class.
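
The forward pass approach named in Figures 1 and 3, root finding driven by the Jacobian with respect to the input, can be sketched on a toy network. The following is an illustrative damped Newton-type iteration only: the two-layer tanh network, its random weights, the choice to match target logits instead of probabilities, the backtracking rule, and the names `forward_pass_invert`, `logits`, and `jacobian` are assumptions, not the paper's ForwardPassInv(.) algorithm. Convergence is not guaranteed in general.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_classes = 64, 128, 10

# Hypothetical toy network: logits = W2 @ tanh(W1 @ x + b1) + b2
W1 = rng.normal(size=(n_hidden, n_in)) / np.sqrt(n_in)
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_classes, n_hidden)) / np.sqrt(n_hidden)
b2 = np.zeros(n_classes)

def logits(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

def jacobian(x):
    """Analytic Jacobian of the logits with respect to the input x."""
    h = np.tanh(W1 @ x + b1)
    return (W2 * (1.0 - h**2)) @ W1  # shape: (n_classes, n_in)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward_pass_invert(x0, target_logits, n_iter=50, tol=1e-8):
    """Damped Newton-type root finding for logits(x) = target_logits."""
    x = x0.copy()
    for _ in range(n_iter):
        r = logits(x) - target_logits
        norm_r = np.linalg.norm(r)
        if norm_r < tol:
            break
        # Minimum-norm Newton step for the underdetermined system.
        step = np.linalg.pinv(jacobian(x)) @ r
        t = 1.0
        # Backtracking: halve the step until the residual norm decreases.
        while t > 1e-6 and np.linalg.norm(
            logits(x - t * step) - target_logits
        ) >= norm_r:
            t *= 0.5
        if t <= 1e-6:
            break  # no decreasing step found; stop early
        x = x - t * step
    return x

target = np.zeros(n_classes)
target[3] = 5.0  # softmax of these logits puts about 0.94 on class 3
x0 = 0.1 * rng.normal(size=n_in)
x_inv = forward_pass_invert(x0, target)
probs = softmax(logits(x_inv))
```

If the iteration converges, `probs` approaches `softmax(target)`, which would satisfy the $\geq 0.9$ criterion quoted in the Figure 3 caption; on a real model the Jacobian would typically come from automatic differentiation instead of a closed form.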