Linear Regression: Inference Based on Cluster Estimates

Subhodeep Dey, Gopal K. Basak, Samarjit Das

TL;DR

The paper addresses inference for regression with clustered data by introducing a cluster-averaged estimator that accounts for within-cluster dependence and remains consistent across varying cluster sizes. It extends the analysis to a random coefficients framework, deriving a central limit theorem for the average parameter, a Wald-type test for general linear hypotheses, and a novel superblock-based test for parameter constancy across higher-level blocks. Compared with POLS, the proposed estimator remains consistent and is often more efficient, particularly when large clusters dominate the sample. An empirical application to India’s HCES data finds significant cross-state heterogeneity, underscoring the practical relevance of robust cluster-aware inference for cross-sectional analyses.

Abstract

This article proposes a novel estimator for regression coefficients in clustered data that explicitly accounts for within-cluster dependence. We study the asymptotic properties of the proposed estimator under both finite and infinite cluster sizes. The analysis is then extended to a standard random coefficient model, where we derive asymptotic results for the average (common) parameters and develop a Wald-type test for general linear hypotheses. We also investigate the performance of the conventional pooled ordinary least squares (POLS) estimator within the random coefficients framework and show that it can be unreliable across a wide range of empirically relevant settings. Furthermore, we introduce a new test for parameter stability at a higher (superblock; Tier 2, Tier 3,...) level, assuming that parameters are stable across clusters within that level. Extensive simulation studies demonstrate the effectiveness of the proposed tests, and an empirical application illustrates their practical relevance.
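
The Wald-type test described above can be illustrated with a minimal sketch for a single linear restriction $c'\beta = r$. It assumes the variance of the averaged estimate is approximated by the sample covariance of the per-cluster estimates divided by the number of clusters; that choice, the function name `wald_single_restriction`, and the simulated inputs are assumptions made for illustration, not the paper's stated construction.

```python
import math
import random
from statistics import fmean


def wald_single_restriction(estimates, c, r=0.0):
    """Wald statistic and chi-square(1) p-value for H0: c'beta = r.

    estimates: list of per-cluster coefficient vectors of equal length.
    Assumed variance form: sample covariance of the cluster estimates / G.
    """
    G = len(estimates)
    # Project every cluster estimate onto c; the sample variance of these
    # projections equals c' S c, with S the sample covariance matrix.
    proj = [sum(ci * bi for ci, bi in zip(c, b)) for b in estimates]
    mean_proj = fmean(proj)
    var_proj = sum((p - mean_proj) ** 2 for p in proj) / (G - 1)
    stat = (mean_proj - r) ** 2 / (var_proj / G)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical per-cluster estimates scattered around beta = (1, 2).
rng = random.Random(0)
estimates = [(1 + rng.gauss(0, 0.5), 2 + rng.gauss(0, 0.5)) for _ in range(200)]

stat_null, p_null = wald_single_restriction(estimates, c=(0.0, 1.0), r=2.0)
stat_alt, p_alt = wald_single_restriction(estimates, c=(0.0, 1.0), r=0.0)
print(f"H0: slope = 2 -> W = {stat_null:.3f}, p = {p_null:.3f}")
print(f"H0: slope = 0 -> W = {stat_alt:.3f}, p = {p_alt:.3g}")
```

Restricting to one linear combination keeps the example free of matrix inversion. A test of several restrictions at once would replace the scalar variance with the matrix $R \hat{V} R'$ and compare the statistic with a chi-square whose degrees of freedom equal the number of restrictions.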

Paper Structure

This paper contains 14 sections, 17 theorems, 236 equations, and 3 tables.

Key Result

Theorem 2.1

Under the assumptions of exogeneity, error independence, and boundedness of $(X_g' X_g / N_g)^{-1}$, for strong, semi-strong, or weak dependence, and for any cluster sizes, $\hat{\bar{\beta}}$ is consistent for $\beta$.
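
A small simulation can make the consistency claim concrete. The sketch below assumes that $\hat{\bar{\beta}}$ is the unweighted mean of per-cluster OLS estimates, uses a single regressor plus an intercept, and induces within-cluster dependence through a shared cluster shock. The estimator form, the data-generating process, and the helper names (`cluster_ols`, `cluster_averaged_estimate`, `simulate_clusters`) are illustrative assumptions rather than the paper's exact setup.

```python
import random
from statistics import fmean


def cluster_ols(x, y):
    """Closed-form OLS (intercept, slope) for one cluster, one regressor."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return (y_bar - slope * x_bar, slope)


def cluster_averaged_estimate(clusters):
    """Unweighted mean of the per-cluster OLS estimates (assumed form)."""
    estimates = [cluster_ols(x, y) for x, y in clusters]
    return tuple(fmean(column) for column in zip(*estimates))


def simulate_clusters(n_clusters, intercept, slope, rng, min_size=20, max_size=200):
    """Clusters of unequal size whose errors share a common cluster shock."""
    clusters = []
    for _ in range(n_clusters):
        n = rng.randint(min_size, max_size)
        shock = rng.gauss(0.0, 1.0)  # shared shock: within-cluster dependence
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [intercept + slope * xi + shock + rng.gauss(0.0, 1.0) for xi in x]
        clusters.append((x, y))
    return clusters


rng = random.Random(0)
for n_clusters in (10, 100, 1000):
    b0, b1 = cluster_averaged_estimate(simulate_clusters(n_clusters, 1.0, 2.0, rng))
    print(f"G={n_clusters:5d}  intercept error={abs(b0 - 1.0):.4f}  "
          f"slope error={abs(b1 - 2.0):.4f}")
```

In this design the shared shock is absorbed by each cluster's intercept estimate, so the intercept error shrinks only as the number of clusters grows, while the slope error also benefits from the within-cluster sample size.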

Theorems & Definitions (31)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Remark 1
  • Remark 2
  • Definition 2.4
  • Definition 2.5
  • Theorem 2.1
  • Lemma 2.1
  • Corollary 2.1
  • ...and 21 more