Cluster-Robust Inference for Quadratic Forms

Michal Kolesár; Pengjin Min; Wenjie Wang; Yichong Zhang

TL;DR

The paper develops a debiased inference framework for quadratic forms $\theta = \pi' A_0 \gamma$ under clustered, high-dimensional data. It establishes the asymptotic normality of a leave-one-cluster-out estimator and introduces two cluster-robust variance estimators: a consistent leave-three-clusters-out (L3CO) and a conservative leave-two-clusters-out (L2CO), with primitive rate conditions that allow diverging cluster sizes and flexible within-cluster dependence. The approach unifies IV with many instruments/controls, variance components, and testing many linear restrictions, while remaining computationally feasible through leave-out algebra and robust residual calculations. Simulation results show that L3CO/L2CO provide reliable size control where standard methods fail, while preserving meaningful power in challenging high-dimensional, clustered settings.

Abstract

This paper studies inference for quadratic forms of linear regression coefficients with clustered data and many covariates. Our framework covers three important special cases: instrumental variables regression with many instruments and controls, inference on variance components, and testing multiple restrictions in a linear regression. Naïve plug-in estimators are known to be biased. We study a leave-one-cluster-out estimator that is unbiased, and provide sufficient conditions for its asymptotic normality. For inference, we establish the consistency of a leave-three-cluster-out variance estimator under primitive conditions. In addition, we develop a novel leave-two-cluster-out variance estimator that is computationally simpler and guaranteed to be conservative under weaker conditions. Our analysis allows cluster sizes to diverge with the sample size, accommodates strong within-cluster dependence, and permits the dimension of the covariates to diverge with the sample size, potentially at the same rate.
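To make the bias-removal idea concrete, here is a minimal NumPy sketch of one way to build a leave-one-cluster-out estimator of $\theta = \pi' A_0 \gamma$. It is an illustration under stated assumptions, not necessarily the paper's exact construction: the two linear models $X = W\pi + v$ and $Y = W\gamma + \varepsilon$, the design matrix `W`, the `cluster` label vector, and the function names are all assumptions introduced here. Swapping the full-sample $\hat\gamma$ for the estimate $\hat\gamma_{-g}$ that excludes cluster $g$ makes each summand a product of independent pieces, so the within-cluster error covariance that biases the plug-in estimator drops out in expectation.

```python
import numpy as np

def plug_in(X, Y, W, A0):
    """Naive plug-in estimate pi_hat' A0 gamma_hat (biased under correlated errors)."""
    S = W.T @ W
    pi_hat = np.linalg.solve(S, W.T @ X)
    gamma_hat = np.linalg.solve(S, W.T @ Y)
    return pi_hat @ A0 @ gamma_hat

def leave_one_cluster_out(X, Y, W, A0, cluster):
    """Illustrative leave-one-cluster-out estimate of pi' A0 gamma.

    Computes sum_g X_g' W_g (W'W)^{-1} A0 gamma_hat_{-g}, where gamma_hat_{-g}
    is the OLS coefficient fitted without cluster g. Assumes the design stays
    full rank after any single cluster is dropped.
    """
    S = W.T @ W
    theta = 0.0
    for g in np.unique(cluster):
        in_g = cluster == g
        W_g, W_rest = W[in_g], W[~in_g]
        # OLS of Y on W using every cluster except g
        gamma_minus_g = np.linalg.solve(W_rest.T @ W_rest, W_rest.T @ Y[~in_g])
        theta += X[in_g] @ W_g @ np.linalg.solve(S, A0 @ gamma_minus_g)
    return theta

# Hypothetical toy data: 30 clusters of size 4 and 5 covariates.
rng = np.random.default_rng(0)
n_clusters, cluster_size, k = 30, 4, 5
cluster = np.repeat(np.arange(n_clusters), cluster_size)
W = rng.normal(size=(cluster.size, k))
pi, gamma = rng.normal(size=k), rng.normal(size=k)
A0 = np.eye(k)
theta = pi @ A0 @ gamma

# Sanity check: without noise, both estimators recover theta up to rounding.
X_clean, Y_clean = W @ pi, W @ gamma
print(np.isclose(plug_in(X_clean, Y_clean, W, A0), theta))
print(np.isclose(leave_one_cluster_out(X_clean, Y_clean, W, A0, cluster), theta))
```

The explicit loop refits the regression once per cluster, which is easy to read but slow with many clusters; closed-form leave-out algebra of the kind the TL;DR mentions would avoid the repeated fits.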

Paper Structure

This paper contains 43 sections, 12 theorems, 442 equations, 3 figures, and 3 tables.

Introduction
Setup
Instrumental Variables Regression with Many Instruments and Controls
Variance and Covariance Components in Linear Regressions
Testing Many Linear Restrictions
Asymptotic Normality
Variance Estimator
Leave-three-clusters-out Variance Estimator
Leave-two-clusters-out Variance Estimator
Practical Guidance
Simulation
Design I: Homogeneous Treatment Effect
Design II: Heterogeneous Treatment Effect, Saturated
Design III: Heterogeneous Treatment Effect, Approximated
Remarks
...and 28 more sections

Key Result

Lemma 2.1

The bias-correction matrix $C$ that minimizes $\operatorname{tr}(C'C)$ subject to (i) $\operatorname{Bdiag}(C_{g,g})=\operatorname{Bdiag}(A_{g,g})$, (ii) $CW=0$, and (iii) $W'C=0$ is given by $C_{\rm KR}=M\operatorname{Bdiag}(\Lambda)M$, where $\operatorname{bvec}$…
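A short derivation may help show why constraints (i) to (iii) are the natural ones. It is a sketch under assumptions that the excerpt above does not spell out: outcomes $X = W\pi + v$ and $Y = W\gamma + \varepsilon$ with fixed $W$, errors $(v_g, \varepsilon_g)$ that are mean zero and independent across clusters $g$, and a weighting matrix $A$ with $W'AW = A_0$, so that $X'AY$ is the plug-in estimator.

$$
\begin{aligned}
\mathbb{E}[X'AY] &= \pi' W'AW \gamma + \mathbb{E}[v'A\varepsilon]
  = \theta + \sum_g \operatorname{tr}\!\big(A_{g,g}\,\mathbb{E}[\varepsilon_g v_g']\big), \\
X'CY &= v'C\varepsilon \quad \text{by (ii) and (iii)}, \\
\mathbb{E}[X'CY] &= \sum_g \operatorname{tr}\!\big(C_{g,g}\,\mathbb{E}[\varepsilon_g v_g']\big)
  = \sum_g \operatorname{tr}\!\big(A_{g,g}\,\mathbb{E}[\varepsilon_g v_g']\big) \quad \text{by (i)}, \\
\mathbb{E}[X'(A-C)Y] &= \theta .
\end{aligned}
$$

Under these assumptions, any $C$ satisfying (i) to (iii) removes the plug-in bias without knowledge of the within-cluster error covariances. Minimizing $\operatorname{tr}(C'C)$ then selects, among all such corrections, the one with the smallest Frobenius norm.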

Figures (3)

Figure 1: Power curves for the CNT23 simulation. Top row: 5% significance level. Bottom row: 10% significance level.
Figure 2: Power curves for the Yap24 simulation.
Figure 3: Power curves for the judge design simulation.

Theorems & Definitions (33)

Lemma 2.1
Remark 2.1
Remark 3.1
Remark 3.2
Remark 3.3
Remark 3.4
Remark 3.5
Remark 3.6
Remark 3.7
Theorem 3.1
...and 23 more
