Cluster-Robust Inference for Quadratic Forms
Michal Kolesár, Pengjin Min, Wenjie Wang, Yichong Zhang
TL;DR
The paper develops a debiased inference framework for quadratic forms $\theta = \pi' A_0 \gamma$ with clustered, high-dimensional data. It establishes the asymptotic normality of a leave-one-cluster-out estimator and introduces two cluster-robust variance estimators: a consistent leave-three-cluster-out (L3CO) estimator and a conservative leave-two-cluster-out (L2CO) estimator, under primitive rate conditions that allow cluster sizes to diverge and permit flexible within-cluster dependence. The approach unifies instrumental variables (IV) regression with many instruments and controls, inference on variance components, and tests of many linear restrictions, while remaining computationally feasible through leave-out algebra and robust residual calculations. Simulation results show that L3CO and L2CO provide reliable size control where standard methods fail, while preserving meaningful power in challenging high-dimensional, clustered settings.
Abstract
This paper studies inference for quadratic forms of linear regression coefficients with clustered data and many covariates. Our framework covers three important special cases: instrumental variables regression with many instruments and controls, inference on variance components, and testing multiple restrictions in a linear regression. Naïve plug-in estimators are known to be biased. We study a leave-one-cluster-out estimator that is unbiased, and provide sufficient conditions for its asymptotic normality. For inference, we establish the consistency of a leave-three-cluster-out variance estimator under primitive conditions. In addition, we develop a novel leave-two-cluster-out variance estimator that is computationally simpler and guaranteed to be conservative under weaker conditions. Our analysis allows cluster sizes to diverge with the sample size, accommodates strong within-cluster dependence, and permits the dimension of the covariates to diverge with the sample size, potentially at the same rate.
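To make the leave-one-cluster-out idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified setup that the abstract does not spell out: two outcomes regressed on a common design matrix, $y_1 = W\pi + e_1$ and $y_2 = W\gamma + e_2$, with mean-zero errors that are independent across clusters. Under that assumption, pairing each cluster's $y_1$ with a $\gamma$ estimate computed without that cluster removes the own-cluster error product that biases the plug-in $\hat\pi' A_0 \hat\gamma$. The function name `loco_bilinear_form` and all argument names are hypothetical.

```python
import numpy as np


def loco_bilinear_form(W, y1, y2, A, clusters):
    """Leave-one-cluster-out estimate of pi' A gamma (illustrative sketch).

    Assumed model (not taken from the paper): y1 = W pi + e1, y2 = W gamma + e2,
    with errors independent across clusters.
    W: (n, k) design, y1, y2: (n,) outcomes, A: (k, k) matrix,
    clusters: (n,) array of cluster labels.
    """
    S = W.T @ W                       # full-sample Gram matrix
    S_inv_A = np.linalg.solve(S, A)   # S^{-1} A without forming an explicit inverse
    Wy2 = W.T @ y2
    theta = 0.0
    for g in np.unique(clusters):
        in_g = clusters == g
        W_g = W[in_g]
        # OLS estimate of gamma using every cluster except g.
        # Requires the leave-out Gram matrix to be invertible.
        gamma_minus_g = np.linalg.solve(S - W_g.T @ W_g, Wy2 - W_g.T @ y2[in_g])
        # Contribution of cluster g: sum_i y1_i * w_i' S^{-1} A gamma_{-g}
        theta += y1[in_g] @ (W_g @ (S_inv_A @ gamma_minus_g))
    return theta
```

A quick sanity check on this sketch is that, with noise-free outcomes, every leave-out estimate equals $\gamma$ exactly and the function returns $\pi' A_0 \gamma$ up to floating-point error. The loop refits once per cluster for clarity; it does not attempt the leave-out algebra the paper uses for computational feasibility, and it says nothing about the L3CO or L2CO variance estimators.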
