Computationally Efficient Replicable Learning of Parities
Moshe Noivirt, Jessica Sorrell, Eliad Tsfadia
TL;DR
The paper investigates the computational relationship between replicability and privacy in learning, and gives the first polynomial-time replicable algorithm for realizable learning of parity functions over arbitrary distributions. The main technical contribution is RepLinearSpan, a replicable subspace-identification subroutine that, given $m$ vectors in $\mathbb{F}_2^d$, outputs a subspace of their span containing at least a $1-\varepsilon$ fraction of the vectors; it is $\rho$-replicable and runs in time $O(m^2 d^3)$. A replicable parity learner then solves a linear system within the learned subspace, yielding a realizable $(\varepsilon,\delta)$-PAC learner for parity functions with sample complexity $\mathrm{poly}(d,1/\rho,1/\varepsilon,\log(1/\delta))$. Together, the results show that efficient replicable learning over general distributions extends beyond efficient SQ-learning and comes closer in power to efficient differentially private learning, indicating a closer computational alignment between replicability and privacy than previously known.
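To make the "solve a linear system" step concrete, here is a minimal sketch of the textbook Gaussian-elimination parity learner over $\mathbb{F}_2$. It is not RepLinearSpan and it is not replicable: two independent samples can lead it to different consistent hypotheses. The names `parity` and `solve_gf2`, and the choice to encode vectors as Python integer bitmasks, are assumptions made for illustration only.

```python
import random

def parity(v):
    """Parity (mod 2) of the number of set bits in the integer v."""
    return bin(v).count("1") & 1

def solve_gf2(xs, ys, d):
    """Find w in F_2^d (as a d-bit int) with <w, x> = y mod 2 for all samples.

    Returns None if the labelled sample is inconsistent with every parity.
    Hypothetical helper: plain Gaussian elimination, not the paper's method.
    """
    mask = (1 << d) - 1
    pivots = {}  # leading x-bit -> equation; bit d of an equation stores y
    for x, y in zip(xs, ys):
        eq = x | (y << d)
        # Eliminate known pivot bits, highest first.
        for bit in sorted(pivots, reverse=True):
            if (eq >> bit) & 1:
                eq ^= pivots[bit]
        x_part = eq & mask
        if x_part == 0:
            if eq >> d:          # reduced to 0 = 1: inconsistent
                return None
            continue             # redundant equation
        pivots[x_part.bit_length() - 1] = eq
    # Back-substitute from the lowest pivot up; free variables are set to 0.
    w = 0
    for bit in sorted(pivots):
        eq = pivots[bit]
        lower = eq & ((1 << bit) - 1)
        if ((eq >> d) & 1) ^ parity(lower & w):
            w |= 1 << bit
    return w

if __name__ == "__main__":
    rng = random.Random(0)
    d, secret = 8, 0b10110010            # hypothetical target parity
    xs = [rng.getrandbits(d) for _ in range(40)]
    ys = [parity(x & secret) for x in xs]
    w = solve_gf2(xs, ys, d)
    print(all(parity(w & x) == y for x, y in zip(xs, ys)))
```

The returned `w` agrees with every training label but need not equal the hidden parity when the samples do not span $\mathbb{F}_2^d$. That freedom is one reason the plain learner is unstable across samples.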
Abstract
We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PAC learning and its connections to differential privacy (Dwork et al. [TCC 2006]) and to the statistical query (SQ) model (Kearns [JACM '98]). Statistically, it was known that differentially private learning and replicable learning are equivalent and strictly more powerful than SQ-learning. Yet, computationally, all previously known efficient (i.e., polynomial-time) replicable learning algorithms were confined to SQ-learnable tasks or restricted distributions, in contrast to differentially private learning. Our main contribution is the first computationally efficient replicable algorithm for realizable learning of parities over arbitrary distributions, a task that is known to be hard in the SQ model, but possible under differential privacy. This result provides the first evidence that efficient replicable learning over general distributions strictly extends efficient SQ-learning, and is closer in power to efficient differentially private learning, despite computational separations between replicability and privacy. Our main building block is a new, efficient, and replicable algorithm that, given a set of vectors, outputs a subspace of their linear span that covers most of them.
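As a small illustration of what "covers most of them" can mean, the sketch below computes the fraction of input vectors that lie in a candidate subspace of $\mathbb{F}_2^d$. Reading coverage as this empirical fraction is an assumption about the precise guarantee. The helper names `echelon_basis`, `in_span`, and `coverage`, and the integer-bitmask encoding of vectors, are hypothetical. The code only checks coverage; it does not choose the subspace, and it does not address replicability, which asks that the same subspace be output with high probability on two independent samples when the internal randomness is shared.

```python
def echelon_basis(vectors):
    """Echelon basis of the F_2-span, as a dict: leading bit -> basis vector."""
    basis = {}
    for v in vectors:
        for bit in sorted(basis, reverse=True):
            if (v >> bit) & 1:
                v ^= basis[bit]
        if v:
            basis[v.bit_length() - 1] = v
    return basis

def in_span(basis, v):
    """True if v reduces to zero against the echelon basis."""
    for bit in sorted(basis, reverse=True):
        if (v >> bit) & 1:
            v ^= basis[bit]
    return v == 0

def coverage(generators, vectors):
    """Fraction of `vectors` lying in the F_2-span of `generators`."""
    basis = echelon_basis(generators)
    return sum(in_span(basis, v) for v in vectors) / len(vectors)

if __name__ == "__main__":
    vecs = [0b001, 0b010, 0b011, 0b100]
    # The span of {001, 010} contains three of the four vectors.
    print(coverage([0b001, 0b010], vecs))  # 0.75
```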
