Manipulating Predictions over Discrete Inputs in Machine Teaching

Xiaodong Wu; Yufei Han; Hayssam Dahrouj; Jianbing Ni; Zhenwen Liang; Xiangliang Zhang

TL;DR

The paper addresses prediction manipulation in discrete-input settings. It formulates discrete machine teaching as a combinatorial optimization problem and introduces Discrete Machine Teaching (DMT), an iterative framework that grows a minimal teacher dataset. DMT selects influential base samples using $k$-nearest neighbors under the Jaccard distance, then constructs perturbations with the Gradient Guided Greedy Method (GGGM), which is guided by the gradient-based scores $g_{dist}$ and $g_{align}$, all within a perturbation budget $\epsilon$. Through iterative updates and incremental data insertion, DMT can both correct erroneous predictions and tamper with predictions, outperforming adapted continuous-domain baselines on three discrete datasets (MALWARE, IPS, EHR) in efficiency and change success rate (CSR). The work demonstrates high manipulation effectiveness without access to test data, highlighting the potential security risks of discrete-domain learning systems and suggesting future work on guarantees and defenses.
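To make the base-sample selection step concrete, the following is a minimal sketch of $k$-nearest-neighbor retrieval under the Jaccard distance, assuming each discrete sample is represented as the set of its active feature indices. The function names `jaccard_distance` and `k_nearest_by_jaccard`, the tie-breaking rule, and the toy data are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set, Tuple


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_nearest_by_jaccard(
    target: Set[int], pool: List[Set[int]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k pool samples closest to target.

    Ties are broken by pool index (an assumption made for determinism).
    """
    scored = [(i, jaccard_distance(target, s)) for i, s in enumerate(pool)]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Toy data (hypothetical): each set lists the active binary features of a sample.
target = {0, 2, 5}
pool = [{0, 2, 5}, {0, 2}, {1, 3}, {0, 2, 5, 7}]
neighbors = k_nearest_by_jaccard(target, pool, k=2)
print(neighbors)  # [(0, 0.0), (3, 0.25)]
```

The set representation suits sparse binary features such as those typical of malware or health-record data; a dense binary-vector variant would compute the same quantity from element-wise AND and OR counts.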

Abstract

Machine teaching often involves creating an optimal (typically minimal) dataset that helps a model (referred to as the 'student') achieve specific goals set by a teacher. While such studies are abundant in the continuous domain, work on the effectiveness of machine teaching in the discrete domain is relatively limited. This paper focuses on machine teaching in the discrete domain, specifically on efficiently manipulating student models' predictions toward the teacher's goals by changing the training data. We formulate this task as a combinatorial optimization problem and solve it with an iterative search algorithm. Our algorithm shows significant numerical merit in scenarios where a teacher attempts either to correct erroneous predictions and thereby improve the student model, or to maliciously manipulate the model into misclassifying specific samples as a target class that serves the teacher's own interests. Experimental results show that the proposed algorithm manipulates model predictions effectively and efficiently, surpassing conventional baselines.
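As a rough illustration of what a budgeted greedy search over discrete inputs can look like, the sketch below flips at most `budget` bits of a binary sample, each time choosing the flip that most increases a linear score $w \cdot x$. This is not the paper's algorithm: the linear score is a stand-in assumption for a gradient-based guidance signal, and `greedy_flip`, `w`, and the toy values are hypothetical.

```python
from typing import List


def greedy_flip(x: List[int], w: List[float], budget: int) -> List[int]:
    """Greedily flip up to `budget` bits of x to increase the score sum(w_i * x_i).

    Flipping bit i changes the score by w_i * (1 - 2 * x_i), so each step
    picks the unflipped position with the largest positive gain.
    """
    x = list(x)  # work on a copy so the caller's sample is unchanged
    flipped = set()
    for _ in range(budget):
        best_i, best_gain = None, 0.0
        for i, (xi, wi) in enumerate(zip(x, w)):
            if i in flipped:
                continue
            gain = wi * (1 - 2 * xi)
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_i is None:  # no remaining flip improves the score
            break
        x[best_i] = 1 - x[best_i]
        flipped.add(best_i)
    return x


# Toy example (hypothetical values): budget of 2 flips.
x0 = [1, 0, 0, 1]
w = [-2.0, 0.5, 3.0, 1.0]
x1 = greedy_flip(x0, w, budget=2)
print(x1)  # [0, 0, 1, 1]
```

The budget caps the Hamming distance between the original and perturbed sample, which is one common way to express a perturbation constraint for binary features; whether the paper's $\epsilon$ is defined this way is not stated here.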

Paper Structure (16 sections, 4 equations, 2 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Machine Teaching
Prediction Manipulation
Methodology
Problem Formulation
Teacher's Capability
DMT Method
Experiments
Dataset
Implementation Details
Evaluation Tasks and Metrics
Baseline Methods
Prediction Improvement Results
Prediction Tampering Results
...and 1 more sections

Figures (2)

Figure 1: The CSR of Different Methods on the Performance Improvement Task When Varying the Percentages of Samples Allowed to Change.
Figure 2: The CSR of Different Methods in Prediction Tampering Tasks with Varying Sample Percentage in MALWARE, EHR, and IPS.
