Inference in semiparametric formation models for directed networks

Lianqiang Qu; Lu Chen; Ting Yan; Yuguo Chen

TL;DR

The paper develops a semiparametric formation model for directed networks. The model has node-specific out-degree effects $\alpha_i$ and in-degree effects $\beta_j$, a covariate-driven homophily term $X_{ij}^\top\gamma$, and latent noise $\varepsilon_{ij}$ with an unknown distribution. Estimation proceeds in two steps. First, a kernel density estimator for a special regressor is combined with a projection that removes the degree parameters, which gives a homophily estimator that avoids the incidental parameter problem. Second, the degree parameters are estimated by constrained least squares. The authors establish consistency and high-dimensional central limit theorems for these estimators. These results enable valid inference, including tests for sparse signals and for degree heterogeneity, as well as support recovery. The asymptotic variance formulas involve $V=U^\top U$, where $U$ is the design matrix of the degree parameters, and the noise variance $\sigma_\varepsilon^2$. Simulations and a real data application (the Lazega partners network) illustrate finite-sample performance, showing improved fit and interpretable homophily and degree-heterogeneity effects. Extensions to conditionally independent noises and weighted networks are also provided. Because the noise distribution is left unspecified, the framework reduces the risk of model misspecification while still supporting hypothesis testing and support recovery when the number of parameters grows with the number of nodes.
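The special-regressor step can be illustrated with a minimal Python sketch. It assumes a binary link indicator of the form $a = 1\{v + \text{index} + \text{noise} > 0\}$, where $v$ is the special regressor with its coefficient normalized to one. The transformation $(a - 1\{v > 0\})/\hat f(v)$ is the textbook Lewbel-type construction, used here as a stand-in rather than as the paper's exact estimator. The helper names `gaussian_kde` and `special_regressor_response`, the Gaussian kernel, and Silverman's bandwidth are all illustrative choices.

```python
import numpy as np


def gaussian_kde(samples, points, bandwidth=None):
    """Gaussian-kernel density estimate built from `samples`, evaluated at `points`.

    Defaults to Silverman's rule-of-thumb bandwidth 1.06 * sd * n^(-1/5),
    an illustrative choice rather than the paper's.
    """
    samples = np.asarray(samples, dtype=float)
    points = np.atleast_1d(np.asarray(points, dtype=float))
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    # Standardized distances between every evaluation point and every sample.
    u = (points[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def special_regressor_response(a, v, bandwidth=None):
    """Transformed response (a - 1{v > 0}) / f_hat(v) for a binary outcome a.

    Assumes a = 1{v + index + noise > 0} with the special regressor v entering
    with coefficient one (a textbook Lewbel-type setup, assumed here).
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    f_hat = gaussian_kde(v, v, bandwidth)
    return (a - (v > 0).astype(float)) / f_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 2000
    v = rng.normal(0.0, 3.0, size=n)      # hypothetical special regressor
    noise = rng.normal(0.0, 0.5, size=n)  # noise, unknown to the estimator
    index = 0.5                           # stands in for alpha_i + beta_j + X_ij' gamma
    a = (v + index + noise > 0).astype(float)
    y_tilde = special_regressor_response(a, v)
    print("mean transformed response:", y_tilde.mean())
```

Under these assumptions, the conditional mean of the transformed response equals the index (here 0.5) when the noise has mean zero and is independent of $v$. This is what allows a least squares fit to recover linear parameters without specifying the noise distribution.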

Abstract

We propose a semiparametric model for dyadic link formation in directed networks. The model contains a set of degree parameters that measure different effects of popularity or outgoingness across nodes, a regression parameter vector that reflects the homophily effect resulting from the nodal attributes or pairwise covariates associated with edges, and a set of latent random noises with unknown distributions. Our interest lies in inferring the unknown degree parameters and homophily parameters. The number of degree parameters grows with the number of nodes. In this high-dimensional regime, we develop a kernel-based least squares approach to estimate the unknown parameters. The major advantage of our estimator is that it does not encounter the incidental parameter problem for the homophily parameters. We prove the consistency of the resulting estimators of the degree parameters and homophily parameters. We establish high-dimensional central limit theorems for the proposed estimators and provide several applications of our general theory, including testing the existence of degree heterogeneity, testing sparse signals and recovering the support. Simulation studies and a real data application illustrate the finite-sample performance of the proposed methods.
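Why projection sidesteps the incidental parameter problem can be seen in a minimal Frisch-Waugh-Lovell sketch for a linear model with a continuous response, $y_{ij} = \alpha_i + \beta_j + Z_{ij}^\top\gamma + e_{ij}$ (for instance, a transformed or weighted edge). This is a swapped-in textbook technique, not the paper's exact estimator. The identification constraint $\beta_n = 0$, the Gaussian noise, and the helper names `degree_design`, `residualize`, and `projection_estimate` are hypothetical.

```python
import numpy as np


def degree_design(n):
    """Hypothetical design matrix U for alpha_i + beta_j over ordered pairs i != j.

    Columns are alpha_1..alpha_n followed by beta_1..beta_{n-1}; the column for
    beta_n is dropped, imposing the illustrative constraint beta_n = 0.
    """
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    U = np.zeros((len(pairs), 2 * n - 1))
    for row, (i, j) in enumerate(pairs):
        U[row, i] = 1.0              # out-degree effect alpha_i
        if j < n - 1:
            U[row, n + j] = 1.0      # in-degree effect beta_j (beta_n = 0)
    return U


def residualize(U, x):
    """Return x - P_U x, the part of x orthogonal to the column space of U."""
    coef, *_ = np.linalg.lstsq(U, x, rcond=None)
    return x - U @ coef


def projection_estimate(y, U, Z):
    """Estimate gamma with the degree effects projected out, then refit them."""
    y_perp = residualize(U, y)
    Z_perp = residualize(U, Z)
    # Frisch-Waugh-Lovell: regressing y_perp on Z_perp gives the same gamma as
    # the full regression on [U, Z], without estimating alpha and beta jointly.
    gamma_hat, *_ = np.linalg.lstsq(Z_perp, y_perp, rcond=None)
    # Degree effects: least squares of y - Z gamma_hat on the constrained design U.
    theta_hat, *_ = np.linalg.lstsq(U, y - Z @ gamma_hat, rcond=None)
    return gamma_hat, theta_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 30, 2
    U = degree_design(n)
    theta = rng.uniform(-1.0, 1.0, size=2 * n - 1)  # alpha_1..n, beta_1..n-1
    Z = rng.normal(size=(U.shape[0], p))             # hypothetical dyad covariates
    gamma = np.array([1.0, -0.5])
    y = U @ theta + Z @ gamma + rng.normal(size=U.shape[0])
    gamma_hat, theta_hat = projection_estimate(y, U, Z)
    print("gamma_hat:", gamma_hat)
```

The design choice is that $\gamma$ is estimated from $n(n-1)$ residualized dyads, so the growing number of degree parameters never enters the estimating equation for $\gamma$ directly.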

Paper Structure (37 sections, 18 theorems, 143 equations, 9 figures, 10 tables)

Introduction
Semiparametric network formation models
Identification and estimation
Identification of parameters
Estimation methods
Theoretical Results
Applications
Testing for sparse signals
Support recovery
Testing for degree heterogeneity
Numerical studies
Evaluating asymptotic properties
Testing for sparse signal
Testing for degree heterogeneity
Real data analysis
...and 22 more sections

Key Result

Theorem 1

If Conditions (C1)-(C3) hold, then we have

Figures (9)

Figure 1: Projection onto the linear subspace spanned by the column vectors of $U$. Here, $\widetilde{Z}_j$ denotes the $j$th column vector of $Z$.
Figure 2: Empirical size and power of $\mathcal{T}_{\alpha,S}$ and $\mathcal{T}_{\beta,S}$.
Figure 3: Empirical size and power of $\mathcal{T}_{\alpha,D}(\widetilde{M})$ with $\widetilde{M}=0,1,2$ and $3$.
Figure S1: Empirical size and power of $\mathcal{T}_{\alpha,S}$ and $\mathcal{T}_{\beta,S}$ for conditionally independent cases.
Figure S2: Empirical size and power of $\mathcal{T}_{\alpha,D}$ and $\mathcal{T}_{\beta,D}$ for conditionally independent cases.
...and 4 more figures
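The projection in Figure 1 and the matrix $V = U^\top U$ can be made concrete in a small example. It assumes, hypothetically, that $U$ stacks one row per ordered pair $(i,j)$ with $i \neq j$, holding indicator entries for $\alpha_i$ and $\beta_j$. In that parameterization $V$ has $n-1$ on its diagonal and is singular, because adding a constant to every $\alpha_i$ and subtracting it from every $\beta_j$ leaves $\alpha_i + \beta_j$ unchanged. The helper names `full_degree_design` and `project_out` are illustrative.

```python
import numpy as np


def full_degree_design(n):
    """Hypothetical unconstrained design U: columns alpha_1..alpha_n, beta_1..beta_n."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    U = np.zeros((len(pairs), 2 * n))
    for row, (i, j) in enumerate(pairs):
        U[row, i] = 1.0          # alpha_i: sender i
        U[row, n + j] = 1.0      # beta_j: receiver j
    return U


def project_out(U, z):
    """Split z into its projection onto span(U) and the orthogonal remainder."""
    # lstsq returns a (minimum-norm) solution even when U is rank deficient;
    # U @ coef is still the orthogonal projection of z onto span(U).
    coef, *_ = np.linalg.lstsq(U, z, rcond=None)
    fitted = U @ coef
    return fitted, z - fitted


if __name__ == "__main__":
    n = 5
    U = full_degree_design(n)
    V = U.T @ U
    print("rank of V:", np.linalg.matrix_rank(V), "of", 2 * n)
    print("diagonal of V:", np.diag(V))
    # Shifting every alpha up and every beta down by the same amount is invisible.
    shift = np.concatenate([np.ones(n), -np.ones(n)])
    print("max |U @ shift|:", np.abs(U @ shift).max())
    z = np.random.default_rng(0).normal(size=U.shape[0])  # stands in for a column of Z
    fitted, resid = project_out(U, z)
    print("max |U^T resid|:", np.abs(U.T @ resid).max())
```

A near-zero `U.T @ resid` confirms that the remainder is orthogonal to the span of $U$, the geometry drawn in Figure 1. The one-dimensional null space of $V$ is one reason a constraint on the degree parameters is needed to identify them in this illustrative parameterization.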

Theorems & Definitions (42)

Theorem 1
Corollary 1
Remark 1
Remark 2
Remark 3
Theorem 2
Theorem 3
Theorem 4
Remark 4
Theorem 5
...and 32 more
