Linear Regression: Inference Based on Cluster Estimates
Subhodeep Dey, Gopal K. Basak, Samarjit Das
TL;DR
The paper addresses inference for regression with clustered data by introducing a cluster-averaged estimator that accounts for within-cluster dependence and remains consistent across varying cluster sizes. It extends the analysis to a random coefficients framework, deriving a central limit theorem for the average parameter and a Wald-type test for general linear hypotheses, as well as a novel superblock-based test for parameter constancy across higher-level blocks. Compared with pooled ordinary least squares (POLS), the proposed method maintains consistency and often improves efficiency, particularly when large clusters dominate the sample. An empirical application to India’s HCES data demonstrates significant cross-state heterogeneity, underscoring the practical relevance of robust cluster-aware inference for cross-sectional analyses.
Abstract
This article proposes a novel estimator for regression coefficients in clustered data that explicitly accounts for within-cluster dependence. We study the asymptotic properties of the proposed estimator under both finite and infinite cluster sizes. The analysis is then extended to a standard random coefficient model, where we derive asymptotic results for the average (common) parameters and develop a Wald-type test for general linear hypotheses. We also investigate the performance of the conventional pooled ordinary least squares (POLS) estimator within the random coefficients framework and show that it can be unreliable across a wide range of empirically relevant settings. Furthermore, we introduce a new test for parameter stability at a higher level (superblock; Tier 2, Tier 3, ...), assuming that parameters are stable across clusters within that level. Extensive simulation studies demonstrate the effectiveness of the proposed tests, and an empirical application illustrates their practical relevance.
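As a rough illustration of the idea, the sketch below contrasts POLS with a cluster-averaged estimate under a random coefficient model. The abstract does not state the estimator's formula, so this sketch assumes it is the unweighted mean of per-cluster OLS estimates, with a covariance estimate taken from the between-cluster spread of those estimates. That construction, the simulation design, and all function names (`cluster_averaged_estimate`, `pooled_ols`, `wald_test`, `simulate`) are assumptions made for illustration and may differ from the paper's definitions.

```python
import numpy as np


def pooled_ols(X, y):
    """POLS: a single least-squares fit that ignores the cluster structure."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def cluster_averaged_estimate(X, y, cluster_ids):
    """Assumed form: unweighted mean of per-cluster OLS estimates.

    The covariance of the mean is estimated from the spread of the
    per-cluster estimates around their average (an assumption, not
    necessarily the paper's variance estimator).
    """
    labels = np.unique(cluster_ids)
    betas = np.array(
        [pooled_ols(X[cluster_ids == g], y[cluster_ids == g]) for g in labels]
    )
    n_clusters = len(labels)
    beta_bar = betas.mean(axis=0)
    dev = betas - beta_bar
    cov = dev.T @ dev / (n_clusters * (n_clusters - 1))
    return beta_bar, cov


def wald_test(beta, cov, R, r):
    """Wald statistic for H0: R beta = r, with its degrees of freedom."""
    diff = R @ beta - r
    stat = float(diff @ np.linalg.solve(R @ cov @ R.T, diff))
    return stat, R.shape[0]


def simulate(n_clusters=200, seed=0):
    """Hypothetical design: cluster-specific coefficients around (1, 2)."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(10, 60, size=n_clusters)  # unequal cluster sizes
    X_parts, y_parts, id_parts = [], [], []
    for g, n in enumerate(sizes):
        Xg = np.column_stack([np.ones(n), rng.normal(size=n)])
        beta_g = np.array([1.0, 2.0]) + 0.5 * rng.normal(size=2)
        y_parts.append(Xg @ beta_g + rng.normal(size=n))
        X_parts.append(Xg)
        id_parts.append(np.full(n, g))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(id_parts)


X, y, ids = simulate()
beta_bar, cov = cluster_averaged_estimate(X, y, ids)
stat, df = wald_test(beta_bar, cov, R=np.array([[0.0, 1.0]]), r=np.array([2.0]))
print("cluster-averaged:", beta_bar, "POLS:", pooled_ols(X, y))
print("Wald statistic for H0: slope = 2:", stat, "df:", df)
```

Under the null hypothesis, the Wald statistic would be compared with a chi-square critical value with `df` degrees of freedom. In this sketch every cluster receives equal weight regardless of its size, whereas POLS implicitly gives more weight to large clusters.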
