Can SGD Select Good Fishermen? Local Convergence under Self-Selection Biases and Beyond

Alkis Kalavasis, Anay Mehrotra, Felix Zhou

TL;DR

The main result is a $\operatorname{poly}(d,k,1/\varepsilon) + {k}^{O(k)}$ time algorithm for estimating $k$ linear regressors with self-selection bias in $d$ dimensions under the maximum selection criterion, improving on the running time of the algorithms of [CDIZ23] and [GM24, arXiv].

Abstract

We revisit the problem of estimating $k$ linear regressors with self-selection bias in $d$ dimensions with the maximum selection criterion, as introduced by Cherapanamjeri, Daskalakis, Ilyas, and Zampetakis [CDIZ23, STOC'23]. Our main result is a $\operatorname{poly}(d,k,1/\varepsilon) + {k}^{O(k)}$ time algorithm for this problem, which yields an improvement in the running time of the algorithms of [CDIZ23] and [GM24, arXiv]. We achieve this by providing the first local convergence algorithm for self-selection, thus resolving the main open question of [CDIZ23]. To obtain this algorithm, we reduce self-selection to a seemingly unrelated statistical problem called coarsening. Coarsening occurs when one does not observe the exact value of the sample but only some set (a subset of the sample space) that contains the exact value. Inference from coarse samples arises in various real-world applications due to rounding by humans and algorithms, limited precision of instruments, and lag in multi-agent systems. Our reduction to coarsening is intuitive and relies on the geometry of the self-selection problem, which enables us to bypass the limitations of previous analytic approaches. To demonstrate its applicability, we provide a local convergence algorithm for linear regression under another self-selection criterion, which is related to second-price auction data. Further, we give the first polynomial time local convergence algorithm for coarse Gaussian mean estimation given samples generated from a convex partition. Previously, only a sample-efficient algorithm was known due to Fotakis, Kalavasis, Kontonis, and Tzamos [FKKT21, COLT'21].
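To make the setting concrete, the following is a minimal sketch of how data under the maximum selection criterion might be generated, and of the coarse set that a single observation corresponds to. It is an illustration only, not the paper's algorithm: the Gaussian covariates, the Gaussian noise, and the names `sample_max_self_selection`, `in_coarse_set`, and `noise_std` are all assumptions introduced here.

```python
import random


def sample_max_self_selection(W, n, noise_std=1.0, seed=0):
    """Draw n observations (x, y_max) from a hypothetical max self-selection model.

    W is a list of k weight vectors, each of length d (the k regressors).
    Assumption: x ~ N(0, I_d) with independent Gaussian noise per regressor.
    """
    rng = random.Random(seed)
    d = len(W[0])
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Latent responses, one per regressor; these are never observed directly.
        y = [
            sum(w_j * x_j for w_j, x_j in zip(w, x)) + rng.gauss(0.0, noise_std)
            for w in W
        ]
        # Only the maximum is revealed, without the index that attained it.
        samples.append((x, max(y)))
    return samples


def in_coarse_set(y, y_max, tol=1e-9):
    """Check whether a latent vector y lies in P_{y_max} = {y : max_i y_i = y_max}."""
    return abs(max(y) - y_max) <= tol


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]  # k = 2 regressors in d = 2 dimensions
    for x, y_max in sample_max_self_selection(W, n=3):
        print(x, y_max)
```

Viewed this way, each observation $y_{\max}$ reveals only that the latent response vector lies in the set $P_{y_{\max}}$, which is the sense in which self-selection can be read as a coarsening problem.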

Paper Structure

This paper contains 123 sections, 45 theorems, 356 equations, 10 figures, 1 algorithm.

Key Result

Lemma 1.1

Consider a partition $\mathdutchcal{P}$ of $\mathbb{R}^d$ that is $\alpha$-information preserving at radius $R$ with respect to $\euscr{N}{\left(\mu^\star,{\Sigma^\star}\right)}$. Then, for $(\mu, \Sigma)$ that are $R$-close to $(\mu^\star, {\Sigma^\star})$ in $\ell_2$-norm.

Figures (10)

  • Figure 1: The figure illustrates different partitions $\mathdutchcal{P}$ of $\mathbb{R}^2$. The figure on the left corresponds to a partition that is not identifiable: given any $\mu^\star \in \mathbb{R}^2$, the vector $\mu_t = \mu^\star + t e_1$ (for any $t\in \mathbb{R}$) induces the same coarse Gaussian distribution, so $\euscr{N}_{\mathdutchcal{P}}\left(\mu^\star,I\right)$ and $\euscr{N}_{\mathdutchcal{P}}\left(\mu_t,I\right)$ are identical. The middle and right figures are identifiable and correspond to convex partitions of the space.
  • Figure 2: This figure illustrates a function $f$ that is not convex but does satisfy a quadratic local growth condition. Indeed, $f$ is lower bounded by a quadratic function (shown by the dotted line).
  • Figure 3: The left figure is an approximate illustration of the partition over $\mathbb{R}^2$ that the self-selection mechanism (Definition 1) induces over the dependent variable space for $k=2$. Each set $P_{y_{\max}}$ corresponds to some green $L$-shape set. The true partition covers the entire space with the $2$-dimensional $L$-shapes. The right figure is an example of an observation from the self-selection model in the dependent variable. See Figure 5 for an illustration of the partitions with $k=3$.
  • Figure 4: The left figure is an illustration of moving $W^\star$ to $W \in \mathbb{R}^{d \times k}$. In general, it is unclear how the assigned mass on the set $P_{y_{\max}}$ changes. The depicted direction of change is along an 'easy' direction where the new point $W^\top x \in \mathbb{R}^{k \times 1}$ assigns less mass than $(W^\star)^\top x$. This is because any point $p$ on the $L$-shape is further from $(W^\star)^\top x$ than from $W^\top x$. The right figure gives an example where the $L$-shape behaves like a convex set: conditioned on the good event $\mathscr{E}$, the random variable $x^\top W^\star$ is "deep inside" the $L$-shape, which effectively enables us to ignore all but one part of the $L$-shape.
  • Figure 5: This figure illustrates one set $P$ in the partition $\mathdutchcal{P}$ arising in the self-selection problem with $k=3$, namely the set corresponding to the observation $y_{\max}=1$. $P$ is defined as the set of points $(y_1,y_2,y_3)$ with $\max\left\{y_1,y_2,y_3\right\}=1$ and $y_1,y_2,y_3\leq 1$. In other words, $P = \{y_1= 1, y_2\leq 1, y_3\leq 1\}\cup \{y_1\leq 1, y_2= 1, y_3\leq 1\}\cup \{y_1\leq 1, y_2\leq 1, y_3= 1\}$. See Figure 3 for an illustration of the entire partition $\mathdutchcal{P}$ with $k=2$.
  • ...and 5 more figures
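
The non-identifiability described in the Figure 1 caption can be checked numerically. The sketch below assumes one particular partition of $\mathbb{R}^2$ that is invariant under shifts along $e_1$: strips of the form $\{y : a \le y_2 < b\}$. The actual partition drawn in the figure may differ, and the strip boundaries and the names `strip_mass` and `coarse_distribution` are hypothetical. Under this assumption, the mass that $\euscr{N}(\mu, I)$ assigns to each strip depends only on the second coordinate of $\mu$, so every $\mu_t = \mu^\star + t e_1$ yields the same coarse distribution.

```python
import math


def std_normal_cdf(z):
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def strip_mass(mu, lo, hi):
    """Mass that N(mu, I) on R^2 assigns to the strip {y : lo <= y_2 < hi}.

    The strip is unbounded along e_1, so only mu[1] enters the formula.
    """
    return std_normal_cdf(hi - mu[1]) - std_normal_cdf(lo - mu[1])


def coarse_distribution(mu, edges):
    """Probabilities of consecutive strips whose boundaries are given by edges."""
    return [strip_mass(mu, lo, hi) for lo, hi in zip(edges, edges[1:])]


if __name__ == "__main__":
    mu_star = (0.5, -0.2)
    edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]  # hypothetical strip boundaries
    for t in (0.0, 1.0, -3.5):
        mu_t = (mu_star[0] + t, mu_star[1])  # shift the mean along e_1 only
        print(t, coarse_distribution(mu_t, edges))
```

Every value of `t` prints the same list of strip probabilities, which is why no estimator could recover the first coordinate of $\mu^\star$ from coarse samples under such a partition.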

Theorems & Definitions (88)

  • Definition 1: Max Self-Selection [CDIZ23]
  • Definition 2: Information Preservation
  • Lemma 1.1: Information Preservation Implies Quadratic Growth
  • Definition 3
  • Definition 4: Coarse Mean Estimation [FKKT21]
  • Remark 1.2: $k$-th Price Auctions and Machine Autopsy
  • Definition 5: Information Preserving Distortion Mechanism
  • Lemma 3.1
  • proof
  • Theorem 3.2
  • ...and 78 more