Identification and Estimation of a Semiparametric Logit Model using Network Data
Brice Romuald Gueyap Kounga
TL;DR
The paper addresses identification in semiparametric binary choice models where social networks are endogenous. It develops a kernel-based, codegree-driven matching estimator that exploits network-type equivalence to difference out latent heterogeneity without imposing a parametric link-formation model, and proves consistency and asymptotic normality for the slope parameter $\beta$, with $\sqrt{n}(\hat{\beta}-\beta)\to_d\mathcal{N}(0,4\Sigma^{-1}V\Sigma^{-1})$. The estimator uses $\hat{\delta}_{ij}$, a codegree-based distance, to weight pairs in a kernelized conditional logit objective, and it provides consistent estimates of the social influence function $\lambda(\omega_i)$. Monte Carlo simulations show substantial bias reduction and good inference across diverse network designs, and an empirical application to microfinance diffusion in rural India shows that accounting for endogenous network formation affects the estimated covariate effects. Overall, the approach offers a flexible, transparent way to handle network endogeneity in nonlinear models using observable network data.
Abstract
This paper studies identification and estimation in semiparametric binary choice models when social networks are endogenous. In many applications, unobserved individual traits shape both the outcome of interest and the formation of social ties, so standard logit specifications, including those augmented with common network controls, can be biased. I show how network data can be used to address this endogeneity without imposing parametric structure on the link formation process. The key insight is that agents who are observationally equivalent in their network formation behavior share the same latent social influence, even if the underlying individual traits remain unobserved. Exploiting this equivalence, I establish point identification of the slope parameters in a binary response model by comparing matched pairs of agents with identical network types. I propose feasible estimators based on nonparametric matching using codegree information derived from the adjacency matrix and establish their consistency and asymptotic normality. Monte Carlo simulations demonstrate that the proposed estimator performs well in finite samples across a range of network designs. An empirical application to microfinance adoption in rural Indian villages illustrates how the method can be implemented in a canonical network dataset and shows that accounting for endogenous network formation affects estimated covariate effects, both with and without village fixed effects.
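To make the matching idea concrete, the following is a minimal Python sketch of a codegree-based distance and a kernel-weighted pairwise conditional logit objective. It is an illustration under stated assumptions, not the paper's implementation: the exact form of $\hat{\delta}_{ij}$, the Epanechnikov kernel, the bandwidth `h`, the simulated design, and the function names `codegree_distance` and `pairwise_logit_objective` are all hypothetical choices made here.

```python
import numpy as np

def codegree_distance(A):
    """Codegree-based pseudo-distance between all pairs of agents.

    Assumed form (not taken from the paper):
    delta_ij = sqrt( (1/n) * sum_t [ (1/n) * sum_s A_ts * (A_is - A_js) ]^2 ).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = A @ A.T / n                       # C[i, t]: scaled count of common neighbours of i and t
    sq = (C ** 2).sum(axis=1)
    d2 = (sq[:, None] + sq[None, :] - 2.0 * C @ C.T) / n
    d2 = np.clip(d2, 0.0, None)           # guard against tiny negative rounding errors
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)

def pairwise_logit_objective(beta, y, X, delta, h):
    """Kernel-weighted conditional logit log-likelihood over discordant pairs.

    Pairs with y_i != y_j contribute log Lambda(+/- (x_i - x_j)' beta),
    weighted by an Epanechnikov kernel in delta_ij / h (assumed kernel).
    """
    y = np.asarray(y)
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(len(y), k=1)
    keep = y[i] != y[j]                    # only discordant pairs are informative
    i, j = i[keep], j[keep]
    u = delta[i, j] / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)
    sign = np.where(y[i] == 1, 1.0, -1.0)
    z = sign * ((X[i] - X[j]) @ np.asarray(beta, dtype=float))
    log_lambda = -np.logaddexp(0.0, -z)    # stable log of the logistic CDF
    return float(np.sum(w * log_lambda))

# Small simulated example (hypothetical design): a latent trait xi drives
# both link formation and the binary outcome.
rng = np.random.default_rng(0)
n = 30
xi = rng.uniform(size=n)
A = np.triu((rng.uniform(size=(n, n)) < xi[:, None] * xi[None, :]).astype(float), k=1)
A = A + A.T                                # undirected network, no self-links
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.0, -1.0]) + xi + rng.logistic(size=n) > 0).astype(int)

delta = codegree_distance(A)
h = np.median(delta[np.triu_indices(n, k=1)])   # ad hoc bandwidth for the demo
value_at_zero = pairwise_logit_objective(np.zeros(2), y, X, delta, h)
```

An estimate of $\beta$ would be obtained by maximizing `pairwise_logit_objective` over `beta` with any standard optimizer. The objective is concave in `beta` because each term is a non-negatively weighted logistic log-likelihood contribution.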
