Entry-Specific Matrix Estimation under Arbitrary Sampling Patterns through the Lens of Network Flows

Yudong Chen, Xumei Xi, Christina Lee Yu

TL;DR

The paper develops a graph-based, network-flow framework for entry-specific matrix estimation under arbitrary sampling, enabling precise, entrywise guarantees that depend on the observation graph's connectivity. For additive matrices, the electrical-flow estimator aligns with least squares and yields error bounds proportional to the effective resistance $R(u_i,v_j)$, with UMVUE properties under Gaussian noise and identifiability tied to graph connectivity. For rank-1 matrices, a path-based, ratio-constructed estimator achieves minimax rates when the pattern is sufficiently dense, with bounds governed by the number and length of disjoint paths and minimum cuts. The approach provides practical tools for panel data causal inference (via TWFE and DiD connections) and offers a fine-grained understanding of how sampling patterns shape estimation difficulty, supported by synthetic simulations. Overall, the work introduces a new family of estimators parametrized by network flows that quantify the intrinsic complexity of matrix completion at the entry level.
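
To make the "ratio-constructed" idea concrete, here is a small worked identity for a single path in the noiseless case. It assumes the rank-1 model is written as $M_{ij} = a_i^\ast b_j^\ast$ with nonzero factors, and it uses the three-entry path through $(1,2)$, $(2,2)$, $(2,1)$ as an illustrative choice; how the paper combines several paths and handles noise is not shown here.

```latex
% Noiseless rank-1 sketch: entries on the path u_1 - v_2 - u_2 - v_1
% are alternately multiplied and divided, so intermediate factors cancel.
\[
  \frac{M_{12}\, M_{21}}{M_{22}}
  = \frac{(a_1^\ast b_2^\ast)\,(a_2^\ast b_1^\ast)}{a_2^\ast b_2^\ast}
  = a_1^\ast b_1^\ast
  = M_{11},
  \qquad \text{provided } a_2^\ast b_2^\ast \neq 0 .
\]
```

This is the multiplicative analogue of alternately adding and subtracting observations along a path in the additive model.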

Abstract

Matrix completion tackles the task of predicting missing values in a low-rank matrix based on a sparse set of observed entries. It is often assumed that the observation pattern is generated uniformly at random or has a very specific structure tuned to a given algorithm. There is still a gap in our understanding when it comes to arbitrary sampling patterns. Given an arbitrary sampling pattern, we introduce a matrix completion algorithm based on network flows in the bipartite graph induced by the observation pattern. For additive matrices, the particular flow we use is the electrical flow, and we establish error upper bounds customized to each entry as a function of the observation set, along with matching minimax lower bounds. Our results show that the minimax squared error for recovery of a particular entry in the matrix is proportional to the effective resistance of the corresponding edge in the graph. Furthermore, we show that our estimator is equivalent to the least squares estimator. We apply our estimator to the two-way fixed effects model and show that it enables us to accurately infer individual causal effects and the unit-specific and time-specific confounders. For rank-$1$ matrices, we use edge-disjoint paths to form an estimator that achieves minimax optimal estimation when the sampling is sufficiently dense. Our discovery introduces a new family of estimators parametrized by network flows, which provide a fine-grained and intuitive understanding of the impact of the given sampling pattern on the relative difficulty of estimation at an entry-specific level. This graph-based approach allows us to quantify the inherent complexity of matrix completion for individual entries, rather than relying solely on global measures of performance.
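
The link between effective resistance and least squares can be checked numerically on a small example. The sketch below is not the authors' implementation: the function names, the unit-resistance convention (one unit resistor per observed entry), and the toy factor values are assumptions made for illustration. It builds the bipartite observation graph for the additive model $a_i + b_j$, computes the effective resistance between $u_i$ and $v_j$ from the Laplacian pseudoinverse, and compares it with the variance factor of the minimum-norm least-squares estimate of the same entry.

```python
import numpy as np

def effective_resistance(n_rows, n_cols, obs, i, j):
    """Effective resistance between u_i and v_j, one unit resistor per observed entry."""
    n = n_rows + n_cols
    lap = np.zeros((n, n))
    for r, c in obs:
        u, v = r, n_rows + c  # row vertex u_r and column vertex v_c
        lap[u, u] += 1.0
        lap[v, v] += 1.0
        lap[u, v] -= 1.0
        lap[v, u] -= 1.0
    e = np.zeros(n)
    e[i], e[n_rows + j] = 1.0, -1.0
    return float(e @ np.linalg.pinv(lap, rcond=1e-10) @ e)

def least_squares_weights(n_rows, n_cols, obs, i, j):
    """Weights w such that w @ y is the min-norm least-squares estimate of a_i + b_j."""
    design = np.zeros((len(obs), n_rows + n_cols))
    for k, (r, c) in enumerate(obs):
        design[k, r] = 1.0            # coefficient of a_r
        design[k, n_rows + c] = 1.0   # coefficient of b_c
    target = np.zeros(n_rows + n_cols)
    target[i] = target[n_rows + j] = 1.0
    return np.linalg.pinv(design, rcond=1e-10).T @ target

# Toy pattern (0-indexed): three length-3 paths joining u_0 and v_0.
obs = [(0, 1), (1, 1), (1, 0), (0, 2), (2, 2), (2, 0), (0, 3), (3, 3), (3, 0)]
R = effective_resistance(4, 4, obs, 0, 0)
w = least_squares_weights(4, 4, obs, 0, 0)

# Noiseless check with hypothetical factors: the estimate recovers a_0 + b_0.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
y = np.array([a[r] + b[c] for r, c in obs])
estimate = float(w @ y)
```

On this pattern the three parallel paths of three unit resistors give `R` equal to 1, and `w @ w` (the factor multiplying the noise variance under independent, equal-variance noise) agrees with it. The unobserved entry $(0,0)$ is recovered exactly in the noiseless case.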

Paper Structure

This paper contains 40 sections, 15 theorems, 79 equations, 9 figures, 3 algorithms.

Key Result

Lemma 1

For $(\hat{a}, \hat{b}) \in \mathbb{R}^{n_v}$ satisfying the least-squares condition (eq:lse_factor_defn) with minimum Euclidean norm, we have [...], where $\operatorname{diag}(\cdot)$ denotes the vector formed by the diagonal of a matrix.

Figures (9)

  • Figure 1: We depict two bipartite graphs constructed from two given observation patterns. To estimate the entry $(1,1)$, we use data along the paths connecting $u_1$ and $v_1$ in the graph. In (a), there is a path of length $\ell=5$ from $u_{1}$ to $v_{1}$. We construct an estimate by alternately adding and subtracting observations along the path, where blue edges indicate addition and red edges indicate subtraction. Due to the alternating signs, the latent factors corresponding to the intermediate vertices cancel out, resulting in an expected estimate of $a_1^\ast + b_1^\ast$. In (b), we have three short paths of length $3$ connecting $u_1$ and $v_1$. The orange path (path $1$) corresponds to observations in entries $(1,2), (2,2)$ and $(2,1)$; the green path (path $2$) corresponds to observations in entries $(1,3), (3,3)$ and $(3,1)$; the magenta path (path $3$) corresponds to observations in entries $(1,4), (4,4)$ and $(4,1)$.
  • Figure 2: Examples depicting electrical flow over graphs that contain overlapping paths and paths of varying lengths. (a) Electrical flow puts higher weight on the edge that overlaps multiple paths. (b) Electrical flow puts higher weight on shorter paths relative to longer paths.
  • Figure 3: Electrical network constructed from a bipartite graph.
  • Figure 4: Equivalence of DiD estimators to flow estimators constrained to length 3 paths.
  • Figure 5: Treatment pattern corresponding to a staggered exposure model, where units are partitioned into $G$ groups and each group is exposed to the treatment for a fixed length of time beginning at staggered times. The red arrows show an example path from row $1$ to column $T$.
  • ...and 4 more figures
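
The alternating-sign construction described in the Figure 1 caption can be sketched in a few lines. This is an illustrative sketch rather than the paper's algorithm: the helper names `path_estimate` and `averaged_path_estimate`, the toy factor values, and the uniform averaging over paths are assumptions. Uniform averaging is a natural choice for the three equal-length, edge-disjoint paths of panel (b), whereas the electrical flow generally assigns unequal weights when paths overlap or differ in length (Figure 2).

```python
def path_estimate(values, path):
    """Alternately add and subtract the observations along one u_i -> v_j path."""
    total = 0.0
    for k, entry in enumerate(path):
        sign = 1.0 if k % 2 == 0 else -1.0  # +, -, +, ... along the path
        total += sign * values[entry]
    return total

def averaged_path_estimate(values, paths):
    """Uniform average of the single-path estimates (an assumed weighting)."""
    return sum(path_estimate(values, p) for p in paths) / len(paths)

# Hypothetical latent factors, indexed from 1 as in the caption.
a = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
b = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}

# The three length-3 paths of Figure 1(b), each a list of observed entries.
paths = [
    [(1, 2), (2, 2), (2, 1)],
    [(1, 3), (3, 3), (3, 1)],
    [(1, 4), (4, 4), (4, 1)],
]

# Noiseless observations of the additive matrix on those entries.
values = {(i, j): a[i] + b[j] for p in paths for (i, j) in p}

single = path_estimate(values, paths[0])
averaged = averaged_path_estimate(values, paths)
```

In this noiseless example every path returns $a_1 + b_1$ because the intermediate factors cancel. With independent noise of equal variance on each observation, averaging the three disjoint paths reduces the variance of a single-path estimate by a factor of three.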

Theorems & Definitions (30)

  • Definition 1: Additivity
  • Definition 2
  • Definition 3: Rank-1
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • ...and 20 more