Worst-case low-rank approximations

Anya Fries, Markus Reichstein, David Blei, Jonas Peters

Abstract

Real-world data in health, economics, and environmental sciences are often collected across heterogeneous domains (such as hospitals, regions, or time periods). In such settings, distributional shifts can make standard PCA unreliable: for example, the leading principal components may explain substantially less variance in unseen domains than in the training domains. Existing approaches (such as FairPCA) have proposed to consider worst-case (rather than average) performance across multiple domains. This work develops a unified framework, called wcPCA, applies it to other objectives (resulting in novel estimators such as norm-minPCA and norm-maxRegret, which are better suited for applications with heterogeneous total variance), and analyzes their relationship. We prove that for all objectives, the estimators are worst-case optimal not only over the observed source domains but also over all target domains whose covariance lies in the convex hull of the (possibly normalized) source covariances. We establish consistency and asymptotic worst-case guarantees of empirical estimators. We extend our methodology to matrix completion, another problem that makes use of low-rank approximations, and prove approximate worst-case optimality for inductive matrix completion. Simulations and two real-world applications on ecosystem-atmosphere fluxes demonstrate marked improvements in worst-case performance, with only minor losses in average performance.
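The gap between pooled and worst-case performance described in the abstract can be illustrated with a small toy computation. The sketch below is not taken from the paper: the two covariance matrices, the helper names `top_k_subspace` and `explained_variance`, and the hand-picked alternative direction are all assumptions made for illustration. It only shows that the leading direction of the pooled covariance can explain no variance at all in one domain, while another direction does better in the worst case.

```python
import numpy as np

def top_k_subspace(cov, k):
    """Return the k leading eigenvectors of a symmetric matrix as columns."""
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues come back in ascending order
    return eigvecs[:, ::-1][:, :k]

def explained_variance(cov, V):
    """Variance captured by the orthonormal columns of V: tr(V^T cov V)."""
    return float(np.trace(V.T @ cov @ V))

# Two hypothetical source covariances (assumed for this example only).
sigma_1 = np.diag([4.0, 1.0, 0.0])
sigma_2 = np.diag([0.0, 1.0, 2.0])
covs = [sigma_1, sigma_2]

# Pooled PCA: leading direction of the averaged covariance, diag(2, 1, 1).
pooled = sum(covs) / len(covs)
V_pool = top_k_subspace(pooled, k=1)
per_domain_pool = [explained_variance(c, V_pool) for c in covs]
worst_pool = min(per_domain_pool)

# A hand-picked unit direction that balances both domains.
V_alt = np.array([[np.sqrt(1.0 / 3.0)], [0.0], [np.sqrt(2.0 / 3.0)]])
per_domain_alt = [explained_variance(c, V_alt) for c in covs]
worst_alt = min(per_domain_alt)

print("pooled direction, per-domain explained variance:", per_domain_pool)
print("alternative direction, per-domain explained variance:", per_domain_alt)
```

In this toy setting the pooled direction is the first coordinate axis, which lies outside the support of the second domain, whereas the alternative direction explains the same amount of variance in both domains.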

Paper Structure (84 sections, 11 theorems, 93 equations, 15 figures, 2 tables)

Key Result

Proposition 4

In general, the subspace obtained by sequentially selecting directions to optimize the worst-case explained variance (resp. worst-case proportion of explained variance) does not coincide with the solution of the joint $\mathtt{minPCA}$ (resp. $\mathtt{norm}\textrm{-}\mathtt{minPCA}$) problem.

Figures (15)

  • Figure 1: Proportion of explained variance by $\mathtt{poolPCA}$ and $\mathtt{norm}\textrm{-}\mathtt{maxRegret}$ in the FLUXNET example on source (left) and target (right) regions (for one specific split). Unlike $\mathtt{poolPCA}$, which optimizes a pooled explained variance, $\mathtt{norm}\textrm{-}\mathtt{maxRegret}$ optimizes a worst-case criterion over the source domains, which, here, yields a worst-case improvement of 7.8%. Section \ref{sec:robustness} proves that this choice comes with a worst-case improvement over a larger class of target domains, too. Indeed, in this example, the worst-case explained variance over the target domains improves by 25.8%. (Over 20 different splits, the median relative increases in worst-case performance equal 25.6% and 16.6% for source and target domains, respectively; the median decrease in average performance for source domains equals 7.5%; see Section \ref{sec:appl:fluxnet} for more details.)
  • Figure 2: Visualization of possible solutions to the rank-1 approximation problem in a specific example. The figure shows contour plots of two distributions $\mathcal{N}(0, \Sigma_1)$ (green) and $\mathcal{N}(0, \Sigma_2)$ (orange), together with their supports (shaded planes). The methods $\mathtt{poolPCA}$, $\mathtt{sepPCA}$, $\mathtt{minPCA}$, and $\mathtt{maxRegret}$ describe different ways to aggregate over different source domains in the objective function (see Section \ref{sec:variants} for details); their solutions are denoted by $V^{\textrm{pool}}$, $V^{\textrm{sep}}$, $V^{\textrm{minPCA}}$, and $V^{\textrm{maxRegret}}$, respectively (the values in parentheses indicate the pooled and worst-case explained variance). For example, $V^{\textrm{pool}}$ and $V^{\textrm{sep}}$ explain $0\%$ of variance in the worst-case domain, as they are orthogonal to the support of that domain. In contrast, $V^{\textrm{minPCA}}$ maximizes the worst-case explained variance (resulting in 36%) and $V^{\textrm{maxRegret}}$ explains at least 24% of variance in each domain. Details are provided in Example \ref{example1}.
  • Figure 3: Illustration of Theorem \ref{thm:maxrcs-convex-hull}(i); see Section \ref{sec:sims-illustrate}. Reconstruction errors for the source domains and 50 target domains are shown for $\mathtt{poolPCA}$ and $\mathtt{maxRCS}$ in a population setting. The blue line marks $m^*_\mathrm{maxRCS}$, the maximum reconstruction error over the source domains. As expected from Theorem \ref{thm:maxrcs-convex-hull}(i), all $\mathtt{maxRCS}$ errors lie below this bound, whereas $\mathtt{poolPCA}$ exceeds it in several cases.
  • Figure 4: Average vs. worst-case reconstruction error of $\mathtt{maxRCS}$ and $\mathtt{poolPCA}$; see Section \ref{sec:sims-avg-vs-wc}. The differences in errors are shown relative to $\mathtt{poolPCA}$'s average performance. Values below zero indicate that $\mathtt{maxRCS}$ outperforms $\mathtt{poolPCA}$. $\mathtt{maxRCS}$ improves worst-case performance, while incurring only small losses on average.
  • Figure 5: Finite-sample performance of $\mathtt{maxRCS}$; see Section \ref{sec:sims-finite-sample}. Left: Difference in worst-case loss (reconstruction error) over the convex hull of the population covariance matrices between empirical and population $\mathtt{maxRCS}$. Right: Analogous difference between empirical $\mathtt{maxRCS}$ and empirical $\mathtt{poolPCA}$. Negative values indicate that $\mathtt{maxRCS}$ attains lower worst-case error than $\mathtt{poolPCA}$. Across sample sizes, $\mathtt{maxRCS}$ outperforms $\mathtt{poolPCA}$.
  • ...and 10 more figures
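
The bound shown in the Figure 3 caption can be checked numerically in a simplified form. The sketch below assumes, as an illustration rather than the paper's definition, that the reconstruction error of a zero-mean domain with covariance $\Sigma$ under an orthonormal $V$ is $\mathrm{tr}(\Sigma) - \mathrm{tr}(V^\top \Sigma V)$. This quantity is linear in $\Sigma$, so for any fixed $V$ the error on a convex combination of source covariances cannot exceed the largest source error. The random covariances, the arbitrary (non-optimized) $V$, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n_sources = 5, 2, 3

def random_cov(rng, p):
    """Draw a random positive semidefinite matrix (illustrative only)."""
    A = rng.standard_normal((p, p))
    return A @ A.T

def reconstruction_error(cov, V):
    """Assumed error: tr(cov) - tr(V^T cov V), for orthonormal columns of V."""
    return float(np.trace(cov) - np.trace(V.T @ cov @ V))

sources = [random_cov(rng, p) for _ in range(n_sources)]

# Any fixed orthonormal basis works here; this one is NOT a maxRCS solution.
V, _ = np.linalg.qr(rng.standard_normal((p, k)))

# Largest error over the source domains (plays the role of the blue line).
source_bound = max(reconstruction_error(c, V) for c in sources)

# 50 target covariances drawn from the convex hull of the sources.
weights = rng.dirichlet(np.ones(n_sources), size=50)
target_errors = [
    reconstruction_error(sum(w_i * c for w_i, c in zip(w, sources)), V)
    for w in weights
]
worst_target = max(target_errors)

print("max source error:", source_bound)
print("max target error:", worst_target)
```

Because the check holds for any fixed $V$, it illustrates only why a source-domain worst case extends to the convex hull; it says nothing about which $V$ minimizes that worst case.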

Theorems & Definitions (19)

  • Example 1
  • Definition 1: minPCA
  • Definition 2: maxRCS
  • Definition 3: maxRegret
  • Proposition 4: Sequential vs. joint optimization, informal
  • Theorem 5: Relations between wcPCA variants
  • Theorem 6: Robustness of wcPCA
  • Theorem 7: Robustness of wcPCA, normalized
  • Remark 8
  • Proposition 9: Consistency of the estimators
  • ...and 9 more