A Sparse Beta Regression Model for Network Analysis

Stefan Stein, Rui Feng, Chenlei Leng

Abstract

For statistical analysis of network data, the $\beta$-model has emerged as a useful tool, thanks to its flexibility in incorporating nodewise heterogeneity and theoretical tractability. To generalize the $\beta$-model, this paper proposes the Sparse $\beta$-Regression Model (S$\beta$RM) that unites two research themes developed recently in modelling homophily and sparsity. In particular, we employ differential heterogeneity that assigns weights only to important nodes and propose penalized likelihood with an $\ell_1$ penalty for parameter estimation. While our estimation method is closely related to the LASSO method for logistic regression, we develop new theory emphasizing the use of our model for dealing with a parameter regime that can handle sparse networks usually seen in practice. More interestingly, the resulting inference on the homophily parameter demands no debiasing normally employed in LASSO-type estimation. We provide extensive simulation and data analysis to illustrate the use of the model. As a special case of our model, we extend the Erdős-Rényi model by including covariates and develop the associated statistical inference for sparse networks, which may be of independent interest.
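
As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates an $\ell_1$-penalized logistic negative log-likelihood for an undirected network. It is a minimal sketch under stated assumptions, not the authors' implementation: the logistic link with log-odds $\beta_i + \beta_j + \mu + \gamma^T Z_{ij}$, the normalization by the number of node pairs, the choice to penalize only $\beta$, and all names (`penalized_neg_loglik`, `A`, `Z`, `lam`) are assumptions made here for illustration.

```python
import numpy as np

def penalized_neg_loglik(A, Z, beta, mu, gamma, lam):
    """L1-penalized negative log-likelihood (illustrative sketch only).

    ASSUMPTIONS (not taken from the text): each edge (i, j), i < j, occurs
    independently with log-odds beta[i] + beta[j] + mu + Z[i, j] @ gamma;
    the likelihood is averaged over node pairs; only beta is penalized.

    A     : (n, n) symmetric 0/1 adjacency matrix
    Z     : (n, n, p) array of edge covariates
    beta  : (n,) node heterogeneity parameters
    mu    : scalar global density parameter
    gamma : (p,) covariate coefficients
    lam   : non-negative penalty level
    """
    n = A.shape[0]
    iu, ju = np.triu_indices(n, k=1)                    # all pairs i < j
    eta = beta[iu] + beta[ju] + mu + Z[iu, ju] @ gamma  # log-odds per pair
    a = A[iu, ju]
    # Bernoulli negative log-likelihood: log(1 + exp(eta)) - a * eta
    nll = np.sum(np.logaddexp(0.0, eta) - a * eta)
    n_pairs = n * (n - 1) / 2
    return nll / n_pairs + lam * np.sum(np.abs(beta))

# Toy usage: 3 nodes, one covariate, all parameters at zero.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
Z = np.zeros((3, 3, 1))
value = penalized_neg_loglik(A, Z, np.zeros(3), 0.0, np.zeros(1), lam=0.1)
```

Because the objective is convex in the parameters under these assumptions, a generic proximal-gradient or coordinate-descent routine for $\ell_1$-penalized logistic regression could in principle be used to minimize it; which solver the paper actually uses is not stated in the text above.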

Paper Structure

This paper contains 41 sections, 27 theorems, 434 equations, 8 figures, 6 tables.

Key Result

Lemma 1

Assume that $0 < d_+ < \binom{n}{2}$. Then, for any $0 < \lambda < \infty$, there exists a minimizer of the penalized likelihood problem with covariates, and any solution $\hat{\theta} = (\hat{\beta}^T, \hat{\mu}, \hat{\gamma}^T)^T$ of that problem must satisfy ...

Figures (8)

  • Figure 1: Errors for estimating the true parameter $\theta_0$ in Model 1 across various network sizes and 1000 repetitions. Comparison between model selection via BIC and a heuristic approach. The results when model selection is done with BIC are displayed in red (left boxes), those for the pre-determined $\lambda$ in green (right boxes). The $y$-axis uses a log scale.
  • Figure 2: Errors for estimating the true parameter $\theta_0$ in Model 2.
  • Figure 3: Errors for estimating the true parameter $\theta_0$ in Model 3.
  • Figure 4: Lazega's friendship network among 71 lawyers. The size of the nodes is proportional to their degree. For better visibility we set the size of all nodes with a degree of five or lower to the size corresponding to a degree of five. In the office plot, the different colors indicate different offices (blue: Boston, yellow: Hartford, black: Providence; notice that only four lawyers are based in the small Providence office), and in the status plot, different statuses (red: partner, green: associate). The positions of the vertices are the same in both plots.
  • Figure 5: Visualization of the estimated $\beta$ values in the world trade network in 1990 between 136 countries/regions. The color of the country/region corresponds to the magnitude of the estimated $\beta$. Countries in grey either have an estimated $\beta$ value of zero or were not present in the data set.
  • ...and 3 more figures

Theorems & Definitions (51)

  • Lemma 1
  • Proposition 1
  • Theorem 1
  • Proposition 2
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Corollary 1
  • Proof of the lemma on existence of a solution and identifiability
  • ...and 41 more