Trustworthy Machine Learning under Social and Adversarial Data Sources

Han Shao

TL;DR

This work develops a theoretical foundation for trustworthy machine learning when data come from social and adversarial sources. It provides a spectrum of results across strategic classification, federated incentives, active learning, multi-objective decision making, and robust learning under clean-label attacks, linking online and PAC learnability to manipulation power and information structure. Core contributions include logarithmic vs linear bounds for strategic learning under ball manipulations, the design of incentive-aware and stable-policy mechanisms in federated and collaborative contexts, and efficient algorithms for unknown-manipulation-graph and multi-objective learning with comparative feedback. The findings illuminate when learnability transfers under strategic behavior, propose practical algorithms (e.g., Strategic Halving, ADA-GD, stable-policy oracles), and establish fundamental limits and open problems for trustworthy ML in societally impactful data settings.

Abstract

Machine learning has witnessed remarkable breakthroughs in recent years. As machine learning permeates various aspects of daily life, individuals and organizations increasingly interact with these systems, exhibiting a wide range of social and adversarial behaviors. These behaviors may have a notable impact on the behavior and performance of machine learning systems. Specifically, during these interactions, data may be generated by strategic individuals, collected by self-interested data collectors, possibly poisoned by adversarial attackers, and used to create predictors, models, and policies satisfying multiple objectives. As a result, the outputs of machine learning systems might degrade, as illustrated by the susceptibility of deep neural networks to adversarial examples (Shafahi et al., 2018; Szegedy et al., 2013) and the diminished performance of classic algorithms in the presence of strategic individuals (Ahmadi et al., 2021). Addressing these challenges is imperative for the success of machine learning in societal settings.

Paper Structure (381 sections, 209 theorems, 633 equations, 24 figures, 10 tables, 32 algorithms)

Introduction
Strategic Individuals
Self-Interested Data Collectors
Multi-Objective Users
Adversarial Attackers
Overview of Thesis Contributions and Structure
Learning with Strategic Individuals
Incentives in Collaborative Learning
Learning within Games
Multi-Objective Learning
Robust Learning under Clean-Label Attack
Learning under Transformation Invariances and Data Augmentation
Bibliographical Remarks
Strategic Classification for Finite Hypothesis Class
Introduction
...and 366 more sections

Key Result

Theorem 2.1

For any feature-ball manipulation set space $\mathcal{Q}$ and hypothesis class $\mathcal{H}$, Strategic Halving achieves mistake bound $\mathrm{MB}_{x,\Delta} \leq \log(\left|\mathcal{H}\right|)$.
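
For comparison, the sketch below implements the classic (non-strategic) Halving algorithm, which in the realizable setting makes at most $\log_2(\left|\mathcal{H}\right|)$ mistakes. It is not the Strategic Halving algorithm from the thesis, whose handling of manipulated features is not reproduced here. The threshold hypothesis class, the `halving` function, and the example stream are all hypothetical choices made for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Hypothesis = Callable[[int], int]


def halving(
    hypotheses: Sequence[Hypothesis],
    stream: Iterable[Tuple[int, int]],
) -> Tuple[int, List[Hypothesis]]:
    """Classic Halving: predict by majority vote over the version space.

    Assumes realizability: some hypothesis labels every example correctly.
    Returns the number of mistakes and the surviving hypotheses.
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        # Majority vote of the hypotheses still consistent with the past.
        votes_for_one = sum(h(x) for h in version_space)
        prediction = 1 if 2 * votes_for_one >= len(version_space) else 0
        if prediction != y:
            # A wrong majority means at least half the version space is wrong.
            mistakes += 1
        # Keep only the hypotheses that agree with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes, version_space


def make_threshold(t: int) -> Hypothesis:
    """Hypothetical hypothesis: label 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0


# Illustrative class of 8 thresholds; the target is the threshold at 5.
hypotheses = [make_threshold(t) for t in range(8)]
target = make_threshold(5)
stream = [(x, target(x)) for x in [3, 6, 4, 5, 0, 7, 2, 1]]

mistakes, survivors = halving(hypotheses, stream)
print(mistakes, len(survivors))
```

Each mistake removes at least half of the remaining hypotheses, which is where the logarithmic bound comes from. With 8 hypotheses the number of mistakes can never exceed 3.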

Figures (24)

Figure 1: Each line here represents the average of 100 non-federated runs of a distribution used in this experiment. Note that the less difficult distributions reach the threshold quickly, whereas the more difficult distributions take nearly three times as long.
Figure 2: Comparing the likelihood that a single defector will reach their accuracy threshold at various contributions for federated averaging and MW-FED after 10 epochs. The result shows that MW-FED results in allocations that are closer to an equilibrium compared to FedAvg.
Figure 3: Objectives of the server vs. the agents.
Figure 4: Impact of defections on both average and population accuracy metrics when using federated averaging with local update steps $K=5$ and step size $\eta = 0.1$. The CIFAR-10 dataset (Krizhevsky, 2009) is processed to achieve a heterogeneity level of $q = 0.9$ (refer to the appendix for more details). Agents use a two-layer fully connected neural network with a softmax activation function and aim for a precision/loss threshold of $\varepsilon = 0.2$. Dashed lines mark the iterations when an agent defects. Each defection adversely affects the model's accuracy. For example, the peak average accuracy drops from approximately 46% prior to any defections to around 22% after 500 iterations. A similar decline is observed in population accuracy.
Figure 5: Minimum, maximum, and mean device accuracy across the $M$ devices for the final FedAvg model, as a function of the required precision $\varepsilon$ (on the $x$-axis). As $\varepsilon$ increases, the likelihood of each device defecting increases, so all the curves almost always decrease. The task is multi-class classification on CIFAR-10, and we simulate data heterogeneity by over-representing different classes on different agents (see the appendix). All experiments report accuracy averaged across 10 runs, along with error bars for the $95\%$ confidence level.
...and 19 more figures

Theorems & Definitions (271)

Definition 2.1
Definition 2.2
Theorem 2.1
Theorem 2.2
Example 2.1
Theorem 2.3
Theorem 2.4
Theorem 2.5
Theorem 2.6
Lemma 2.1
...and 261 more
