Covariates-Adjusted Mixed-Membership Estimation: A Novel Network Model with Optimal Guarantees

Jianqing Fan, Jiawei Ge, Jikai Hou

TL;DR

This work develops CAMM, a covariates-adjusted network model that merges the DCMM latent structure with node covariates in a logistic edge model. It introduces a constrained regularized MLE with a nuclear-norm penalty on the low-rank component and proves optimal estimation guarantees for $H$, $\Gamma$, and the mixed-membership $\Pi$, even when the low-rank structure is inferred via nonconvex optimization. A novel analytical framework leverages a nonconvex reformulation, debiasing, and a bridge argument to transfer guarantees to the convex estimator, enabling generalization beyond mean-squared error losses. Through simulations and a real S&P 500 dataset, the method demonstrates accurate parameter recovery, substantial explanatory power of covariates, and coherent sector-aligned membership recovery, highlighting practical applicability for covariate-rich network analysis.
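
To make the ingredients named above concrete, here is a minimal sketch of a logistic edge model with a covariate term and a low-rank DCMM-style term, together with a nuclear-norm-penalized negative log-likelihood. The exact parameterization is an assumption made for illustration (logit of the edge probability taken as $z_i^\top H z_j + \Gamma_{ij}$, with $\Gamma = \Theta \Pi B \Pi^\top \Theta$), the function names `simulate_network` and `penalized_nll` are hypothetical, and the constraints of the paper's estimator are omitted. It is not the authors' implementation.

```python
import numpy as np

def simulate_network(n=60, r=2, p=3, seed=0):
    """Draw a symmetric 0/1 adjacency matrix from an ASSUMED logistic edge model."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, p)) / np.sqrt(p)   # node covariates (illustrative)
    H = 0.5 * np.eye(p)                        # symmetric covariate-effect matrix
    Pi = rng.dirichlet(np.ones(r), size=n)     # mixed memberships, rows sum to 1
    theta = rng.uniform(0.5, 1.0, size=n)      # degree-heterogeneity parameters
    B = 0.2 + 0.8 * np.eye(r)                  # community connectivity (illustrative)
    M = theta[:, None] * Pi
    Gamma = M @ B @ M.T                        # low-rank component, rank <= r
    logits = Z @ H @ Z.T + Gamma               # assumed form of the edge logit
    P = 1.0 / (1.0 + np.exp(-logits))
    upper = np.triu(rng.random((n, n)) < P, k=1)   # sample each pair i < j once
    A = (upper | upper.T).astype(float)            # symmetrize, no self-loops
    return A, Z, H, Gamma

def penalized_nll(A, Z, H, Gamma, lam):
    """Logistic negative log-likelihood over pairs i < j plus lam * ||Gamma||_*."""
    logits = Z @ H @ Z.T + Gamma
    iu = np.triu_indices_from(A, k=1)
    x, a = logits[iu], A[iu]
    nll = np.sum(np.logaddexp(0.0, x) - a * x)     # stable log(1 + e^x) - a*x
    return nll + lam * np.linalg.norm(Gamma, ord="nuc")

n = 60
A, Z, H, Gamma = simulate_network(n=n)
objective = penalized_nll(A, Z, H, Gamma, lam=np.sqrt(n))
```

The choice `lam=np.sqrt(n)` mirrors the $\lambda=\sqrt{n}$ setting quoted in the Figure 1 caption; every other numerical value is a placeholder.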

Abstract

This paper addresses the problem of mixed-membership estimation in networks, where the goal is to efficiently estimate the latent mixed-membership structure from the observed network. Recognizing the widespread availability of node covariates and the valuable information they carry, we propose a novel network model that incorporates both community information, as represented by the Degree-Corrected Mixed Membership (DCMM) model, and node covariate similarities to determine connections. We investigate the regularized maximum likelihood estimation (MLE) for this model and demonstrate that our approach achieves optimal estimation accuracy for both the similarity matrix and the mixed membership, in terms of both the Frobenius norm and the entrywise loss. Since directly analyzing the original convex optimization problem is intractable, we employ nonconvex optimization to facilitate the analysis. A key contribution of our work is identifying a crucial assumption that bridges the gap between convex and nonconvex solutions, enabling the transfer of statistical guarantees from the nonconvex approach to its convex counterpart. Importantly, our analysis extends beyond the MLE loss and the mean squared error (MSE) used in matrix completion problems, generalizing to all convex loss functions. Consequently, our analysis techniques extend to a broader set of applications, including ranking problems based on pairwise comparisons. Finally, simulation experiments validate our theoretical findings, and real-world data analyses confirm the practical relevance of our model.

Paper Structure

This paper contains 44 sections, 35 theorems, 473 equations, 2 figures, 3 tables, 3 algorithms.

Key Result

Proposition 3.1

Assumption assumption:r+1 holds for the stochastic block model with two communities. More specifically, Assumption assumption:r+1 holds when $H^*=0$ and where $\mathbf{1}\in\mathbb{R}^{\frac{n}{2}\times 1}$ is an all-ones vector and $p>q$.

Figures (2)

  • Figure 1: Log–log plot of the estimation error of $\hat{H}, \hat{\Gamma}$ measured by $\|\cdot\|_{F}$ and $\|\cdot\|_{\infty}$ vs. the number of nodes $n$. The results are reported for $r=2, p=3, \lambda=\sqrt{n}$ and are averaged over $100$ independent trials.
  • Figure 2: Sector plot for $\hat{\Pi}$. Companies from financials/real estate/consumer staples/industrials sectors are marked in black in the top left/top right/bottom left/bottom right plots. The bottom right plot is rotated to better show the industrials sector.
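
A log-log plot such as Figure 1 is usually read through its slope: if the error behaves like $c\,n^{\alpha}$, then $\log(\text{error})$ is linear in $\log n$ with slope $\alpha$. The sketch below fits that slope by least squares. The helper name `loglog_slope` is hypothetical, and the numbers fed to it are a made-up power law for illustration, not values from the paper.

```python
import numpy as np

def loglog_slope(ns, errors):
    """Least-squares fit of log(error) = slope * log(n) + intercept."""
    slope, intercept = np.polyfit(np.log(ns), np.log(errors), deg=1)
    return slope, intercept

# Toy input only: an exact power law with exponent -0.5 (NOT results from the paper).
ns = np.array([200.0, 400.0, 800.0, 1600.0])
toy_errors = 3.0 * ns ** -0.5
slope, intercept = loglog_slope(ns, toy_errors)
```

With real simulation output, `toy_errors` would be replaced by the trial-averaged errors at each $n$, and the fitted slope compared against the rate predicted by the theory.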

Theorems & Definitions (80)

  • Proposition 3.1
  • Theorem 3.2
  • Remark 3.3
  • Proposition 3.4
  • Definition 3.5: Efficient Vertex Hunting
  • Remark 3.6: Example of an Efficient VH Algorithm: Successive Projection
  • Theorem 3.7
  • Remark 3.8
  • Proposition A.1
  • Proof of Proposition \ref{prop:eigenvalues}
  • ...and 70 more
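
Remark 3.6 names successive projection as an example of a vertex hunting algorithm. As a rough illustration of the idea, the sketch below implements the textbook successive projection algorithm in a noise-free setting: each row of `X` is assumed to be a convex combination of `r` linearly independent vertices, and at least one pure row per vertex is assumed to be present. This is a generic version under those assumptions, not necessarily the variant analyzed in the paper, and the function names are hypothetical.

```python
import numpy as np

def successive_projection(X, r):
    """Return indices of r rows of X estimated to be simplex vertices."""
    R = np.array(X, dtype=float)                      # working copy of the residuals
    picked = []
    for _ in range(r):
        j = int(np.argmax(np.sum(R ** 2, axis=1)))    # row with the largest norm
        picked.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)                    # project out the chosen direction
    return picked

def memberships_from_vertices(X, V):
    """Solve W V = X by least squares, then clip and renormalize the rows of W."""
    W, *_ = np.linalg.lstsq(V.T, X.T, rcond=None)
    W = np.clip(W.T, 0.0, None)
    return W / W.sum(axis=1, keepdims=True)

# Synthetic noise-free example: the first r rows are pure nodes (the vertices).
rng = np.random.default_rng(0)
r, d, n = 3, 3, 200
V_true = rng.normal(size=(r, d))
W_true = np.vstack([np.eye(r), rng.dirichlet(np.ones(r), size=n - r)])
X = W_true @ V_true

idx = successive_projection(X, r)
W_hat = memberships_from_vertices(X, X[idx])
```

The recovered membership columns follow the order in which vertices were picked, so `W_hat` matches `W_true` only up to a column permutation given by `idx`.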