Network Inversion of Binarised Neural Nets

Pirzada Suhail; Supratik Chakraborty; Amit Sethi

TL;DR

This work tackles interpretability and input-space trust in binarised neural networks (BNNs) by encoding a trained BNN into a CNF formula $BNN(X,H,Y)$ that captures the full computation across the input, hidden, and output layers. Inference and inversion then become SAT problems once the constraint $I(X=x)$ or $O(Y=y)$, respectively, is added, with CMSGen providing diverse, near-uniform samples from the constrained solution space. Experiments on MNIST-like architectures show that inverted inputs can be classified as a given label even when they do not resemble the training data, which raises potential safety concerns; some label inversions are unsatisfiable, which exposes limitations of the model. The authors propose iterative retraining with a dedicated 'garbage' class to progressively sculpt the model's input space and improve the reliability of BNN deployments in safety-critical settings, leveraging the exactness and controllability of CNF-based inversion for model refinement.
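
The two queries can be written side by side. This is only a reading of the summary above, assuming that $I(X=x)$ and $O(Y=y)$ are conjunctions of unit clauses that fix the input and output variables; the paper's exact clause-level encoding is not reproduced here.

```latex
\begin{align*}
\text{Inference:} \quad & BNN(X,H,Y) \,\land\, I(X=x) \\
\text{Inversion:} \quad & BNN(X,H,Y) \,\land\, O(Y=y)
\end{align*}
```

Under a faithful encoding, fixing $X$ forces the values of $H$ and $Y$, so the inference query simply reads off the network's output. The inversion query may instead have many satisfying assignments to $X$, or none at all, which corresponds to the unsatisfiable label inversions mentioned above.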

Abstract

Although neural networks deliver impressive results and are deployed in a growing range of applications, interpreting and understanding them remains a critical challenge. Network inversion, a technique that aims to reconstruct the input space from the model's learned internal representations, plays a pivotal role in opening up the black-box mapping from inputs to outputs. In safety-critical scenarios, where model outputs may influence pivotal decisions, the integrity of the corresponding input space is paramount: extraneous "garbage" inputs must be eliminated for the network to be trustworthy. Binarised Neural Networks (BNNs), characterised by binary weights and activations, offer computational efficiency and reduced memory requirements, making them suitable for resource-constrained environments. This paper introduces a novel approach to inverting a trained BNN by encoding it into a CNF formula that captures the network's structure, allowing for both inference and inversion.
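
To make the distinction between inference and inversion concrete, the following is a minimal sketch on a toy BNN. It is not the paper's method: exhaustive enumeration over all binary inputs stands in for the CNF encoding and the SAT solver or sampler, and the network size, the weights, and the names `forward`, `infer`, and `invert` are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy BNN: 4 binary inputs, 3 hidden units, 2 outputs.
# All weights and activations take values in {-1, +1}.
W1 = [[1, 1, 1, 1], [1, -1, 1, -1], [-1, -1, 1, 1]]
W2 = [[1, 1, 1], [-1, -1, -1]]


def sign(v):
    """Binarised activation: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def forward(x):
    """Compute hidden and output activations (H, Y) for an input X."""
    h = tuple(sign(sum(w * xi for w, xi in zip(row, x))) for row in W1)
    y = tuple(sign(sum(w * hi for w, hi in zip(row, h))) for row in W2)
    return h, y


def infer(x):
    """Inference: fix X = x and read off Y."""
    return forward(x)[1]


def invert(y_target, n_inputs=4):
    """Inversion: fix Y = y and collect every X consistent with it.

    Brute force is only feasible for toy sizes; the CNF-based approach
    described in the text replaces this loop with SAT solving/sampling.
    """
    return [x for x in product((-1, 1), repeat=n_inputs)
            if infer(x) == y_target]
```

In this toy network the two outputs always disagree, so `invert((1, 1))` returns an empty list, a small analogue of an unsatisfiable inversion query. Every input returned by `invert(y)` is classified as `y` by construction, regardless of whether it resembles any plausible training example.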

Paper Structure (5 sections, 15 equations, 2 figures)


Introduction
Methodology & Implementation
Experiments & Results
Conclusion & Future Work
Finer Implementation Details

Figures (2)

Figure 1: Training Images
Figure 2: Inverted Images for Class 2
