The Space Complexity of Approximating Logistic Loss
Gregory Dexter, Petros Drineas, Rajiv Khanna
TL;DR
The paper addresses the space efficiency of data structures that approximate the logistic loss in logistic regression, introducing the dataset-dependent measure $μ_{\boldsymbol{y}}(\mathbf{X})$ to capture compressibility. It proves strong lower bounds: a $\tilde{Ω}\left(\frac{d}{ε^2}\right)$ space requirement when $μ_{\boldsymbol{y}}(\mathbf{X})=Θ(1)$ and a general $\tilde{Ω}\left(d\cdot μ_{\boldsymbol{y}}(\mathbf{X})\right)$ bound for constant $ε$, demonstrating that the dependence on $μ_{\boldsymbol{y}}(\mathbf{X})$ is intrinsic rather than an artifact of specific coreset constructions. The work also provides a polynomial-time linear-programming approach to compute $μ_{\boldsymbol{y}}(\mathbf{X})$, refutes prior conjectures about its hardness, and includes empirical comparisons to prior methods. Overall, the results imply that existing coreset bounds are near-optimal in the typical $μ$-bounded regime while revealing fundamental limits to compressing logistic regression data beyond these bounds.
Abstract
We provide space complexity lower bounds for data structures that approximate logistic loss up to $ε$-relative error on a logistic regression problem with data $\mathbf{X} \in \mathbb{R}^{n \times d}$ and labels $\mathbf{y} \in \{-1,1\}^n$. The space complexity of existing coreset constructions depends on a natural complexity measure $μ_\mathbf{y}(\mathbf{X})$, first defined in (Munteanu, 2018). We give an $\tilde{Ω}(\frac{d}{ε^2})$ space complexity lower bound in the regime $μ_\mathbf{y}(\mathbf{X}) = O(1)$ that shows existing coresets are optimal in this regime up to lower order factors. We also prove a general $\tilde{Ω}(d\cdot μ_\mathbf{y}(\mathbf{X}))$ space lower bound when $ε$ is constant, showing that the dependency on $μ_\mathbf{y}(\mathbf{X})$ is not an artifact of mergeable coresets. Finally, we refute a prior conjecture that $μ_\mathbf{y}(\mathbf{X})$ is hard to compute by providing an efficient linear programming formulation, and we empirically compare our algorithm to prior approximate methods.
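To make the notion of $ε$-relative error concrete, the following minimal Python sketch evaluates the standard logistic loss $\sum_i \log(1 + e^{-y_i \mathbf{x}_i^\top β})$ on a toy dataset and compares it with the loss of a reweighted subset of rows. The function names `logistic_loss` and `within_relative_error`, the toy data, and the hand-picked subset are all illustrative assumptions; the subset is not one of the coreset constructions discussed in the paper, and the check is performed at a single $β$, whereas an approximation guarantee of the kind studied here has to hold for every $β$.

```python
import math

def logistic_loss(X, y, beta, weights=None):
    """Weighted sum of log(1 + exp(-y_i * <x_i, beta>)) over the rows of X."""
    if weights is None:
        weights = [1.0] * len(X)
    total = 0.0
    for x_i, y_i, w_i in zip(X, y, weights):
        margin = y_i * sum(a * b for a, b in zip(x_i, beta))
        # Numerically stable form of log(1 + exp(-margin)).
        total += w_i * (max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return total

def within_relative_error(approx, exact, eps):
    """True if |approx - exact| <= eps * exact (relative error at one beta)."""
    return abs(approx - exact) <= eps * exact

# Toy data (hypothetical): n = 4 rows, d = 2 features, labels in {-1, +1}.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
y = [1, -1, 1, -1]

# Hand-picked subset: keep rows 0 and 2, each reweighted by n / 2 = 2.
X_sub, y_sub, w_sub = [X[0], X[2]], [y[0], y[2]], [2.0, 2.0]

beta = [0.5, -0.25]
exact = logistic_loss(X, y, beta)
approx = logistic_loss(X_sub, y_sub, beta, w_sub)
print("relative error at this beta:", abs(approx - exact) / exact)
print("within eps = 0.1:", within_relative_error(approx, exact, 0.1))
```

At $β = 0$ every row contributes $\log 2$, so any reweighting whose weights sum to $n$ reproduces the loss exactly there; the interesting behavior appears only away from the origin, which is why a single-point check like this one is illustrative rather than a certificate.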
