The Role of Learning Algorithms in Collective Action

Omri Ben-Dov, Jake Fawkes, Samira Samadi, Amartya Sanyal

TL;DR

The paper addresses how learning algorithms influence the success of collective action in ML by moving beyond Bayes-optimal analyses to algorithm-dependent phenomena. It introduces a formal framework for planting signals within mixture distributions and analyzes two main algorithm families: Distributionally Robust Optimisation (DRO) and gradient-descent-based methods with simplicity bias (e.g., SGD). The authors prove and validate that the effective collective size $\alpha_{\text{eff}}$ and the overall success $S(\alpha)$ depend critically on algorithmic properties, showing that DRO can amplify impact for small collectives while iterative re-weighting and validation strategies can dramatically alter outcomes. They also demonstrate how algorithmic biases can be exploited to design more effective signals, with substantial empirical evidence from synthetic data, CIFAR-10, Waterbirds, and MNIST-CIFAR-inspired setups. Overall, the work highlights the necessity of considering learning algorithms when evaluating and designing collective action in ML systems, and outlines future directions for broader algorithmic classes and safety considerations.
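The signal-planting setup described above can be made concrete with a small sketch. It assumes the usual reading of the framework: each training point is drawn from the collective with probability $\alpha$, collective members replace their point $(x, y)$ with $(g(x), y^{*})$, and success is the fraction of base inputs classified as $y^{*}$ once the signal is applied. The names `plant_signal`, `success`, `toy_signal`, and `toy_classifier` are hypothetical, and the classifier is a hand-written stand-in rather than a trained model.

```python
import random


def plant_signal(dataset, alpha, signal, target_label, rng):
    """Sample from the mixture: each (x, y) pair is replaced by
    (signal(x), target_label) with probability alpha, else kept as-is."""
    planted = []
    for x, y in dataset:
        if rng.random() < alpha:
            planted.append((signal(x), target_label))
        else:
            planted.append((x, y))
    return planted


def success(classifier, base_inputs, signal, target_label):
    """Fraction of base inputs classified as target_label after the
    signal is applied at test time."""
    if not base_inputs:
        raise ValueError("base_inputs must be non-empty")
    hits = sum(classifier(signal(x)) == target_label for x in base_inputs)
    return hits / len(base_inputs)


# Toy illustration (assumption, not the paper's data): 2D points in
# [-1, 1]^2 with label 0; the signal moves the second coordinate to a
# region the base data never occupies.
def toy_signal(x):
    return (x[0], 5.0)


def toy_classifier(x):
    # Stand-in for a trained model that has fully picked up the signal.
    return 1 if x[1] > 4.0 else 0


rng = random.Random(0)
base = [((rng.uniform(-1, 1), rng.uniform(-1, 1)), 0) for _ in range(1000)]
train = plant_signal(base, alpha=0.1, signal=toy_signal, target_label=1, rng=rng)
toy_success = success(toy_classifier, [x for x, _ in base], toy_signal, 1)
```

In a real experiment `toy_classifier` would be replaced by a model fitted on `train` with the learning algorithm under study, which is exactly where the algorithm dependence enters.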

Abstract

Collective action in machine learning is the study of the control that a coordinated group can have over machine learning algorithms. While previous research has concentrated on assessing the impact of collectives against Bayes (sub-)optimal classifiers, this perspective is limited in that it does not account for the choice of learning algorithm. Since classifiers seldom behave like Bayes classifiers and are influenced by the choice of learning algorithms along with their inherent biases, in this work we initiate the study of how the choice of the learning algorithm plays a role in the success of a collective in practical settings. Specifically, we focus on distributionally robust optimization (DRO), popular for improving a worst group error, and on the ubiquitous stochastic gradient descent (SGD), due to its inductive bias for "simpler" functions. Our empirical results, supported by a theoretical foundation, show that the effective size and success of the collective are highly dependent on properties of the learning algorithm. This highlights the necessity of taking the learning algorithm into account when studying the impact of collective action in machine learning.

Paper Structure (30 sections, 13 theorems, 60 equations, 8 figures, 3 algorithms)


Key Result

Theorem 1

Given the mixture distribution ${\mathcal{P}}_{\alpha}$ and the feature-label strategy for planting a signal, the success is lower bounded by $S(\alpha) \geq 1 - \frac{1-\alpha}{\alpha}\,\Delta\cdot\xi - \frac{\epsilon}{1-\epsilon}$, where $\alpha$, $\xi$, $\Delta$, and $\epsilon$ are, respectively, the size, uniqueness, sub-optimality gap for $y^{*}$ in the base distribution, and sub-optimality of the learned classifier on ${\mathcal{P}}_{\alpha}$.
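To get a feel for how the four quantities trade off, the sketch below evaluates a bound of the form $1 - \frac{1-\alpha}{\alpha}\,\Delta\cdot\xi - \frac{\epsilon}{1-\epsilon}$, which is the form given for this theorem in Hardt et al. (2023). Treat that form, the function name `success_lower_bound`, and the example values as assumptions for illustration only.

```python
def success_lower_bound(alpha, xi, delta, epsilon=0.0):
    """Evaluate 1 - ((1 - alpha) / alpha) * delta * xi - epsilon / (1 - epsilon).

    alpha   -- collective size, in (0, 1]
    xi      -- uniqueness of the signal
    delta   -- sub-optimality gap of the target label in the base distribution
    epsilon -- sub-optimality of the learned classifier, in [0, 1)

    Success is a probability, so a negative value is vacuous and is
    clipped to 0.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must lie in [0, 1)")
    bound = 1.0 - (1.0 - alpha) / alpha * delta * xi - epsilon / (1.0 - epsilon)
    return max(0.0, bound)


# Hypothetical values: the bound tightens as the collective grows.
for a in (0.001, 0.01, 0.1, 0.5):
    print(a, success_lower_bound(a, xi=0.1, delta=0.2))
```

The bound depends on the learning algorithm only through $\epsilon$, which is part of why it says little about how a specific algorithm such as DRO or SGD treats a small collective.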

Figures (8)

  • Figure 1: Success with DRO algorithms. (a) An example of the 2D dataset. The color of each point represents its label, and the grey rectangle is the codomain of the collective signal. (b-c) The success of collectives of different sizes $\alpha$ when trained with ERM (blue circles), JTT (orange squares) and LfF (green triangles) on a synthetic 2D dataset and CIFAR-10.
  • Figure 2: Image transformation used by the collective. The effect of the signal is exaggerated for visualisation purposes and in practice it is invisible to the human eye.
  • Figure 3: Success sensitivity to the stopping condition when using CVaR-DRO on the Waterbirds dataset with a collective of size $\alpha=0.3$. (a) The success of the collective after every epoch of training. Each shade represents a differently initialised training run. The rapid and sharp oscillations show how strongly the outcome depends on when training stops. (b) The success achieved by different $\alpha$ under two stopping conditions. The baseline (ERM) is shown in blue circles. Stopping at maximum accuracy on a validation set with no collective action is shown in green triangles. Stopping at maximum success on the validation set is shown in orange triangles. When the firm is trying to maximize general accuracy, the collective has no success.
  • Figure 4: Each graph shows the success for different levels of collective action in the validation set, $\alpha_\text{val}$, when using ERM or CVaR-DRO on CIFAR-10 and Waterbirds. The blue circles are for a training set with $\alpha_\text{train}=0.001$ and the orange squares are for $\alpha_\text{train}=0.1$. ERM is almost unaffected by $\alpha_\text{val}$, but for CVaR-DRO $\alpha_\text{val}$ is crucial.
  • Figure 5: Results on LMS-$k$. (a) An example of LMS-$6$. The color of each point represents its label, and the grey rectangle is the codomain of the collective signal. (b) Success over LMS-$k$. Larger complexity ($k$) increases the success.
  • ...and 3 more figures
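
Several of the figures above use CVaR-DRO. As a rough illustration of what that objective does, the sketch below computes one common empirical form of the CVaR loss: the mean of the worst `level` fraction of per-sample losses. This is an assumption about the general technique, not the paper's implementation; `cvar_loss` and `level` are hypothetical names, and `level` is unrelated to the collective size $\alpha$.

```python
import math


def cvar_loss(losses, level):
    """Mean of the worst ceil(level * n) per-sample losses.

    level = 1 recovers the ordinary average (ERM); smaller levels
    concentrate the objective on the hardest samples.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    k = max(1, math.ceil(level * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


# Hypothetical per-sample losses: a few hard points dominate at low levels.
example_losses = [0.1, 0.1, 0.2, 0.1, 2.5, 3.0]
erm_objective = cvar_loss(example_losses, level=1.0)
cvar_objective = cvar_loss(example_losses, level=1 / 3)
```

Because a small group of high-loss points can dominate this objective, a small collective whose planted samples are initially misclassified may receive far more weight than under ERM, which is consistent with the summary's claim that DRO can amplify small collectives.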

Theorems & Definitions (26)

  • Theorem 1: Theorem 1 in Hardt et al. (2023)
  • Definition 1
  • Proposition 3.0
  • Proposition 3.0
  • Proposition 3.0
  • Theorem 2
  • Remark 4.1
  • Definition 2
  • Definition 3
  • Definition 4
  • ...and 16 more