Computationally Efficient Replicable Learning of Parities

Moshe Noivirt, Jessica Sorrell, Eliad Tsfadia

TL;DR

The paper investigates the computational relationship between replicability and privacy in learning, and gives the first polynomial-time replicable algorithm for realizable learning of parity functions over arbitrary distributions. The main technical contribution is RepLinearSpan, a $\rho$-replicable subspace-identification subroutine that, given $m$ vectors in $\mathbb{F}_2^d$, outputs a subspace of their span containing at least a $1-\varepsilon$ fraction of the vectors, in time $O(m^2 d^3)$. Solving a linear system within the learned subspace then yields a replicable, realizable $(\varepsilon,\delta)$-PAC learner for parity functions with sample complexity $\mathrm{poly}(d,1/\rho,1/\varepsilon,\log(1/\delta))$. Together, the results show that efficient replicable learning over general distributions extends beyond SQ-learning and comes closer to the power of efficient differentially private learning, indicating a closer computational alignment between replicability and privacy than previously known.

Abstract

We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ-model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.

Paper Structure (22 sections, 11 theorems, 18 equations, 4 algorithms)

Key Result

Theorem 1

There exists a polynomial-time $\rho$-replicable learning algorithm that (realizably) $(\varepsilon,\delta)$-PAC learns the class of parity functions over $\{0,1\}^d$ with sample complexity $\mathrm{poly}(d,1/\rho,1/\varepsilon,\log(1/\delta))$.
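
To make the learning task in Theorem 1 concrete, the sketch below shows the classical, non-replicable way to learn a parity in the realizable setting: Gaussian elimination over $\mathbb{F}_2$ on the labeled samples. It is not the paper's algorithm and gives no replicability guarantee, since the hypothesis it returns depends on the particular sample whenever the examples do not span the whole space. The function names `parity` and `solve_parity`, and the bitmask encoding of vectors, are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple


def parity(s: int, x: int) -> int:
    """Inner product <s, x> mod 2, with vectors encoded as integer bitmasks."""
    return bin(s & x).count("1") & 1


def solve_parity(samples: List[Tuple[int, int]], d: int) -> Optional[int]:
    """Return a bitmask s with parity(s, x) == y for every sample (x, y).

    Returns None if the samples are inconsistent with every parity.
    Free coordinates are set to 0, so the output depends on the sample:
    this is the non-replicable baseline, not the paper's learner.
    """
    # pivots[b] holds a reduced row (mask, label) whose highest set bit is b.
    pivots: Dict[int, Tuple[int, int]] = {}
    for x, y in samples:
        for bit in reversed(range(d)):
            if not (x >> bit) & 1:
                continue
            if bit in pivots:
                px, py = pivots[bit]
                x ^= px  # eliminate the leading bit
                y ^= py
            else:
                pivots[bit] = (x, y)  # new independent equation
                break
        else:
            # x was reduced to the zero vector; its label must be 0 as well.
            if y:
                return None

    # Back-substitution from the lowest pivot upward.
    s = 0
    for bit in sorted(pivots):
        px, py = pivots[bit]
        lower = px & ~(1 << bit)
        if py ^ parity(s, lower):
            s |= 1 << bit
    return s


# Usage: four unit vectors labeled by the secret 0b1011 determine it exactly.
secret = 0b1011
data = [(1 << i, parity(secret, 1 << i)) for i in range(4)]
print(bin(solve_parity(data, 4)))  # 0b1011
```

When the samples do not span $\mathbb{F}_2^d$, many parities are consistent with them, and two independent samples can lead this procedure to different outputs. Per the abstract, the paper addresses this by first replicably selecting a subspace of the span that covers most of the vectors.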

Theorems & Definitions (31)

  • Theorem 1: Replicable Learning of Parities
  • Theorem 2: Replicable Linear Span
  • Corollary 1
  • Definition 2.1: (Realizable) PAC Learnability, see e.g., [shalev2014understanding]
  • Definition 2.2
  • Definition 2.3: Replicability [impagliazzo2022reproducibility]
  • Definition 2.4
  • Proposition 1
  • Theorem 3: McDiarmid's Inequality [mcdiarmid1989method]
  • Theorem 4: Stable Partition, Algorithm 1 in [kaplan2025differentially]
  • ...and 21 more
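
The listing above gives only the titles of the statements. For orientation, the two notions that Theorem 1 combines are usually stated as follows; these are the standard textbook forms, written in notation assumed here ($f_s$, $A$, $S$, $S'$, $r$, $D$), and the paper's own definitions may differ in detail. A parity function indexed by $s \in \{0,1\}^d$ is

$$f_s(x) = \langle s, x \rangle \bmod 2 = \bigoplus_{i \,:\, s_i = 1} x_i, \qquad x \in \{0,1\}^d .$$

A randomized algorithm $A$ taking $m$ samples is $\rho$-replicable if, for every distribution $D$, two runs on independent samples with shared internal randomness $r$ agree with high probability:

$$\Pr_{S, S' \sim D^m,\; r}\big[\, A(S; r) = A(S'; r) \,\big] \;\ge\; 1 - \rho .$$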