Linear Regression: Inference Based on Cluster Estimates

Subhodeep Dey; Gopal K. Basak; Samarjit Das

TL;DR

The paper addresses inference for regression with clustered data by introducing a cluster-averaged estimator that accounts for within-cluster dependence and remains consistent across varying cluster sizes. It extends the analysis to a random coefficients framework, deriving a central limit theorem for the average parameter and a Wald-type test for general linear hypotheses, as well as a novel superblock-based test for parameter constancy across higher-level blocks. Compared with pooled ordinary least squares (POLS), the proposed method remains consistent and is often more efficient, particularly when large clusters dominate the sample. An empirical application to India's HCES data reveals significant cross-state heterogeneity, underscoring the practical relevance of robust, cluster-aware inference in cross-sectional analyses.
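The contrast between a cluster-averaged estimator and POLS can be illustrated with a small simulation. This is a minimal sketch, assuming that $\hat{\bar{\beta}}$ is the unweighted mean of cluster-by-cluster OLS estimates (the notation suggests this, but the exact weighting is an assumption). The data-generating process is hypothetical: large clusters are given larger slopes, which shows how POLS, which weights clusters roughly by their number of observations, can drift toward the large-cluster coefficients rather than the average one.

```python
import numpy as np

def cluster_ols(X, y):
    # Within-cluster OLS: solves (X'X) b = X'y; requires X'X to be invertible.
    return np.linalg.solve(X.T @ X, X.T @ y)

def cluster_averaged(clusters):
    # Assumed form of beta-bar-hat: unweighted mean of per-cluster OLS estimates.
    betas = np.array([cluster_ols(X, y) for X, y in clusters])
    return betas.mean(axis=0), betas

def pooled_ols(clusters):
    # POLS: stack all observations and ignore the cluster structure.
    X = np.vstack([X for X, _ in clusters])
    y = np.concatenate([y for _, y in clusters])
    return cluster_ols(X, y)

def simulate(G=200, small=20, large=500, seed=0):
    # Hypothetical design: half the clusters are small, half are large, and
    # large clusters have a higher slope, so size and coefficients correlate.
    rng = np.random.default_rng(seed)
    clusters, true_betas = [], []
    for g in range(G):
        is_large = g % 2 == 1
        n = large if is_large else small
        slope = (3.0 if is_large else 2.0) + rng.normal(0.0, 0.5)
        beta_g = np.array([1.0, slope])
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta_g + rng.normal(size=n)
        clusters.append((X, y))
        true_betas.append(beta_g)
    return clusters, np.array(true_betas)

clusters, true_betas = simulate()
avg, _ = cluster_averaged(clusters)
pols = pooled_ols(clusters)
print("average of true slopes:", true_betas[:, 1].mean())
print("cluster-averaged slope:", avg[1])
print("POLS slope:            ", pols[1])
```

With this design the average of the true cluster slopes is close to 2.5 and the cluster-averaged slope should land near it, while the POLS slope should sit closer to 3, because the 100 large clusters supply most of the observations.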

Abstract

This article proposes a novel estimator for regression coefficients in clustered data that explicitly accounts for within-cluster dependence. We study the asymptotic properties of the proposed estimator under both finite and infinite cluster sizes. The analysis is then extended to a standard random coefficient model, where we derive asymptotic results for the average (common) parameters and develop a Wald-type test for general linear hypotheses. We also investigate the performance of the conventional pooled ordinary least squares (POLS) estimator within the random coefficients framework and show that it can be unreliable across a wide range of empirically relevant settings. Furthermore, we introduce a new test for parameter stability at a higher (superblock; Tier 2, Tier 3, ...) level, assuming that parameters are stable across clusters within that level. Extensive simulation studies demonstrate the effectiveness of the proposed tests, and an empirical application illustrates their practical relevance.
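The Wald-type test and a superblock comparison can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the paper's procedure: the variance of the averaged estimator is taken to be the sample covariance of the cluster estimates divided by the number of clusters $G$ (a common mean-group style choice under random coefficients), and the superblock check is a generic Wald test that all superblock averages are equal, treating superblocks as independent. The helper names (`wald_stat`, `mean_group`, `superblock_equality`, `cluster_estimates`) and the simulated data are hypothetical.

```python
import numpy as np

def wald_stat(estimate, cov, R, r):
    # Generic Wald statistic (R b - r)' [R V R']^{-1} (R b - r), with q = rows of R.
    R = np.atleast_2d(np.asarray(R, dtype=float))
    diff = R @ estimate - np.atleast_1d(np.asarray(r, dtype=float))
    return float(diff @ np.linalg.solve(R @ cov @ R.T, diff)), R.shape[0]

def mean_group(betas):
    # Average of cluster estimates plus an assumed variance estimate:
    # sample covariance of the cluster estimates divided by G.
    betas = np.asarray(betas, dtype=float)
    G = betas.shape[0]
    return betas.mean(axis=0), np.atleast_2d(np.cov(betas, rowvar=False)) / G

def superblock_equality(blocks):
    # Hypothetical check of H0: all superblock means are equal, using a
    # block-diagonal covariance (superblocks treated as independent).
    parts = [mean_group(b) for b in blocks]
    k, B = parts[0][0].size, len(parts)
    est = np.concatenate([m for m, _ in parts])
    cov = np.zeros((B * k, B * k))
    for i, (_, V) in enumerate(parts):
        cov[i * k:(i + 1) * k, i * k:(i + 1) * k] = V
    # Each block of rows contrasts superblock b with superblock 1.
    R = np.zeros(((B - 1) * k, B * k))
    for b in range(1, B):
        R[(b - 1) * k:b * k, :k] = -np.eye(k)
        R[(b - 1) * k:b * k, b * k:(b + 1) * k] = np.eye(k)
    return wald_stat(est, cov, R, np.zeros((B - 1) * k))

def cluster_estimates(G, mean_beta, rng, sd=0.5, n=30):
    # Per-cluster OLS estimates from simulated random-coefficient data.
    out = []
    for _ in range(G):
        beta_g = np.asarray(mean_beta) + rng.normal(0.0, sd, size=2)
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = X @ beta_g + rng.normal(size=n)
        out.append(np.linalg.lstsq(X, y, rcond=None)[0])
    return np.array(out)

rng = np.random.default_rng(1)
betas = cluster_estimates(300, [1.0, 2.0], rng)
b_bar, V = mean_group(betas)
W_true, q = wald_stat(b_bar, V, [[0, 1]], [2.0])   # H0 true: average slope = 2
W_false, _ = wald_stat(b_bar, V, [[0, 1]], [2.5])  # H0 false: average slope = 2.5

blocks_same = [cluster_estimates(150, [1.0, 2.0], rng) for _ in range(3)]
blocks_diff = [cluster_estimates(150, [1.0, s], rng) for s in (2.0, 2.0, 3.0)]
W_same, q_same = superblock_equality(blocks_same)  # q_same = (3 - 1) * 2 = 4
W_diff, _ = superblock_equality(blocks_diff)
print(W_true, W_false, W_same, W_diff)
```

Each statistic is compared with a chi-square critical value with `q` degrees of freedom (3.841 for q = 1 and 9.488 for q = 4 at the 5% level).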

Paper Structure (14 sections, 17 theorems, 236 equations, 3 tables)

Introduction
Model with Constant Parameters
The Proposed Estimator
Linear Model with Varying Parameters
Model with Cluster-Specific Random Coefficients
Testing Parameter Constancy Using Superblocks
Pooled Ordinary Least Squares
Simulation and Empirical Analysis
Simulation Study
General Linear Hypothesis Testing
Testing Parameter Constancy
Empirical Illustration
Conclusion
Appendix

Key Result

Theorem 2.1

Under the exogeneity assumption, the error-independence assumption, and the assumption that $(X_g'X_g/N_g)^{-1}$ is bounded, $\hat{\bar{\beta}}$ is consistent for $\beta$ under strong, semi-strong, or weak within-cluster dependence and for any cluster sizes.
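The following heuristic decomposition, which is not the paper's proof, indicates why a result of this kind is plausible. It assumes the constant-parameter model $y_g = X_g\beta + u_g$ for clusters $g = 1, \dots, G$ with $N_g$ observations each, takes $\hat{\bar{\beta}}$ to be the unweighted average of the cluster OLS estimates $\hat{\beta}_g$, and reads exogeneity as $E(u_g \mid X_g) = 0$ and error independence as independence across clusters; all of these are assumptions about the paper's setup.

$$
\hat{\beta}_g = (X_g'X_g)^{-1}X_g'y_g = \beta + \left(\frac{X_g'X_g}{N_g}\right)^{-1}\frac{X_g'u_g}{N_g},
\qquad
\hat{\bar{\beta}} - \beta = \frac{1}{G}\sum_{g=1}^{G}\left(\frac{X_g'X_g}{N_g}\right)^{-1}\frac{X_g'u_g}{N_g}.
$$

Under exogeneity each summand has conditional mean zero, and the bounded inverse keeps the summands from exploding. Provided the summands also have uniformly bounded second moments (a further assumption), independence across clusters lets a law of large numbers drive the average to zero as $G \to \infty$. Within-cluster dependence then affects only how quickly $X_g'u_g/N_g$ shrinks with $N_g$ (slowly, or not at all, under strong dependence), not whether the cross-cluster average vanishes.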

Theorems & Definitions (31)

Definition 2.1
Definition 2.2
Definition 2.3
Remark 1
Remark 2
Definition 2.4
Definition 2.5
Theorem 2.1
Lemma 2.1
Corollary 2.1
...and 21 more