Data Augmentation Techniques to Reverse-Engineer Neural Network Weights from Input-Output Queries

Alexander Beiser, Flavio Martinelli, Wulfram Gerstner, Johanni Brea

TL;DR

This work addresses reverse-engineering neural network weights from input-output queries in a teacher-student setting. It shows that standard querying (e.g., using only the teacher's training data) fails when the teacher has more parameters than there are training samples. It introduces two data-augmentation strategies, biased-noise augmentation and Grid-Composition, that are tailored to elicit informative activations in the teacher's hidden layers, enabling weight recovery beyond previous limits. Using an Expand-and-Cluster pipeline with overparameterized students, the authors demonstrate weight recovery for teachers with up to $512$ hidden neurons and up to about $100\times$ more parameters than training samples. The augmentations achieve this by expanding the effective dataset size and promoting variability along teacher weight directions. The findings highlight the importance of targeting pre-activation variability for identifiability and have implications for the security and interpretability of neural models, as well as for future scaling to higher-dimensional architectures.

Abstract

Network weights can be reverse-engineered given enough informative samples of a network's input-output function. In a teacher-student setup, this translates into collecting a dataset of the teacher mapping (querying the teacher) and fitting a student to imitate this mapping. A sensible choice of queries is the dataset the teacher was trained on. But current methods fail when the teacher parameters are more numerous than the training data, because the student overfits to the queries instead of aligning its parameters to the teacher. In this work, we explore augmentation techniques to best sample the input-output mapping of a teacher network, with the goal of eliciting a rich set of representations from the teacher's hidden layers. We discover that standard augmentations such as rotation, flipping, and adding noise bring little to no improvement to the identification problem. We design new data augmentation techniques tailored to better sample the representational space of the network's hidden layers. With our augmentations we extend the state-of-the-art range of recoverable network sizes. To test their scalability, we show that we can recover networks with up to 100 times more parameters than training data points.
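The querying step described in the abstract can be illustrated with a minimal sketch. It assumes a randomly initialised one-hidden-layer tanh teacher and plain additive Gaussian noise as the augmentation; the function names (`make_teacher`, `build_queries`), the layer sizes, and the noise scale are all hypothetical. Additive noise is one of the standard augmentations the abstract reports as bringing little improvement, so this shows only the generic "augment, then label with the teacher" loop, not the paper's biased-noise or Grid-Composition techniques.

```python
import math
import random


def make_teacher(n_in, n_hidden, rng):
    """Random one-hidden-layer tanh network (hypothetical stand-in for a trained teacher)."""
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
    a = [rng.gauss(0.0, 1.0 / math.sqrt(n_hidden)) for _ in range(n_hidden)]

    def teacher(x):
        # Pre-activation of each hidden neuron: w_j . x + b_j
        pre = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j
               for w, b_j in zip(W, b)]
        return sum(a_j * math.tanh(p) for a_j, p in zip(a, pre))

    return teacher


def build_queries(teacher, base_inputs, copies, sigma, rng):
    """Label each base input and `copies` noisy variants of it with the teacher's output."""
    queries = []
    for x in base_inputs:
        queries.append((x, teacher(x)))
        for _ in range(copies):
            # Assumption: isotropic Gaussian noise, not the paper's biased noise.
            x_aug = [x_k + rng.gauss(0.0, sigma) for x_k in x]
            queries.append((x_aug, teacher(x_aug)))
    return queries


rng = random.Random(0)
teacher = make_teacher(n_in=4, n_hidden=8, rng=rng)
base = [[rng.random() for _ in range(4)] for _ in range(10)]
queries = build_queries(teacher, base, copies=3, sigma=0.1, rng=rng)
print(len(queries))  # 10 base inputs * (1 + 3 copies) = 40
```

A student would then be fitted to the `(input, output)` pairs in `queries`. The point of the sketch is that the query set can be made much larger than the original dataset, because the teacher supplies a label for any input it is given.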

Paper Structure

This paper contains 13 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Results of augmentation techniques on reconstruction. $\langle d(w_i, w_i^*) \rangle$ and $\max_i d(w_i, w_i^*)$ denote the average and maximum cosine distance between teacher and reconstructed student neurons, respectively. $Q$ is the dataset size.
  • Figure 2: (Left) Pre-activation variability increases for different augmentation techniques, compared to the variability of MNIST. Error bars indicate the standard error of the mean. (Right) Average cosine distances for teacher sizes $2^2$ to $2^9$ trained on 5k MNIST data points. Every student is trained only on queries from the same 5k MNIST subset plus augmentations.
  • Figure 3: (A): Data-points projected along teacher weight vector: low variability ensures good fit (red points) while too high variability samples the asymptotic part of the activation function leading to a bad fit (green). (B): ReLU example where variability alone is not enough. Data variability for both teacher vectors (red), but a single student vector fits the data perfectly (green).
  • Figure 4: Visualization of the Grid-Comp. data augmentation technique.
  • Figure 5: (a), (b), (c): Log-loss plots for augmentation techniques on the teacher with 512 hidden neurons, which was trained on the MNIST dataset. (d): Reconstruction of data augmentation techniques on the 5k MNIST fragment.
  • ...and 3 more figures
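
The reconstruction metrics named in the Figure 1 caption, the average and maximum cosine distance between teacher and student neurons, can be sketched as follows. This assumes that cosine distance means one minus cosine similarity and that student neurons have already been matched to teacher neurons by index. The helper names and the toy weight vectors are hypothetical, and the paper's own matching procedure may differ.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def reconstruction_errors(teacher_w, student_w):
    """Average and maximum cosine distance over index-matched neuron pairs."""
    d = [cosine_distance(w, w_star) for w, w_star in zip(student_w, teacher_w)]
    return sum(d) / len(d), max(d)


# Toy example: the first student neuron is a rescaled copy of the teacher's
# (distance 0); the second is rotated by 45 degrees.
teacher_w = [[1.0, 0.0], [0.0, 1.0]]
student_w = [[2.0, 0.0], [1.0, 1.0]]
avg_d, max_d = reconstruction_errors(teacher_w, student_w)
print(f"{avg_d:.4f} {max_d:.4f}")  # 0.1464 0.2929
```

Because cosine distance ignores vector length, a student neuron that differs from its teacher neuron only by a positive rescaling has distance zero. Reporting the maximum alongside the average exposes a single badly recovered neuron that the average alone would hide.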