Cross-Fitting-Free Debiased Machine Learning with Multiway Dependence
Kaicheng Chen, Harold D. Chiang
TL;DR
The paper develops a cross-fitting-free asymptotic theory for two-step debiased machine learning in GMM models with general multiway clustered dependence, enabling valid inference without sample splitting in the presence of arbitrarily many clustering dimensions. It combines Neyman orthogonality with localisation to control first-stage estimation effects, deriving both global and local maximal inequalities for separately exchangeable arrays to establish asymptotic linearity and normality. A central result provides an explicit linear representation and variance formula, with rate conditions that accommodate flexible, high-dimensional nuisance learners (e.g., sparse GLMs, regression trees, and deep networks). The work delivers a practical inference framework for complex clustered environments and contributes new probabilistic tools of independent interest for multiway dependence.
Abstract
This paper develops an asymptotic theory for two-step debiased machine learning (DML) estimators in generalised method of moments (GMM) models with general multiway clustered dependence, without relying on cross-fitting. While cross-fitting is commonly employed, it can be statistically inefficient and computationally burdensome when first-stage learners are complex and the effective sample size is governed by the number of independent clusters. We show that valid inference can be achieved without sample splitting by combining Neyman-orthogonal moment conditions with a localisation-based empirical process approach, allowing for an arbitrary number of clustering dimensions. The resulting DML-GMM estimators are shown to be asymptotically linear and asymptotically normal under multiway clustered dependence. A central technical contribution of the paper is the derivation of novel global and local maximal inequalities for general classes of functions of sums of separately exchangeable arrays, which underpin our theoretical arguments and are of independent interest.
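To make the setting concrete, the following is a minimal sketch of a two-step estimator that fits its nuisance functions on the full sample (no cross-fitting) and reports a two-way clustered standard error. It is an illustration under stated assumptions, not the paper's estimator or variance formula: the partially linear model is a standard special case chosen for brevity, the polynomial least-squares fit is a stand-in for a machine learning first stage, the variance uses the familiar two-way cluster-robust sandwich (row sums plus column sums minus cell terms), and the names `simulate`, `residualize`, and `dml_no_crossfit` are hypothetical.

```python
import numpy as np


def simulate(n1=40, n2=40, theta=1.0, seed=0):
    """Hypothetical DGP: one observation per cell of an n1 x n2 array."""
    rng = np.random.default_rng(seed)

    def two_way_noise():
        # Row effect + column effect + idiosyncratic shock, so cells that
        # share a row or a column are dependent (two-way clustering).
        return (rng.normal(size=(n1, 1)) + rng.normal(size=(1, n2))
                + rng.normal(size=(n1, n2)))

    x = rng.normal(size=(n1, n2))                 # control
    d = 0.5 * x + two_way_noise()                 # treatment
    y = theta * d + np.sin(x) + two_way_noise()   # outcome
    return y, d, x


def residualize(target, x, degree=3):
    """Full-sample polynomial least squares; stand-in for an ML learner."""
    basis = np.vander(x.ravel(), degree + 1)      # columns x^3, x^2, x, 1
    coef, *_ = np.linalg.lstsq(basis, target.ravel(), rcond=None)
    return target - (basis @ coef).reshape(target.shape)


def dml_no_crossfit(y, d, x):
    """Orthogonal-score estimate of theta with a two-way clustered SE."""
    # First stage: nuisances estimated on ALL observations (no sample splitting).
    y_res = residualize(y, x)
    d_res = residualize(d, x)

    # Second stage: solve the Neyman-orthogonal moment
    # sum of d_res * (y_res - theta * d_res) = 0 for theta.
    jac = np.sum(d_res ** 2)
    theta_hat = np.sum(d_res * y_res) / jac

    # Two-way cluster-robust "meat" (assumed here, not taken from the paper):
    # squared row sums + squared column sums - squared cell scores.
    psi = d_res * (y_res - theta_hat * d_res)
    meat = (np.sum(psi.sum(axis=1) ** 2)
            + np.sum(psi.sum(axis=0) ** 2)
            - np.sum(psi ** 2))
    se = np.sqrt(max(meat, 0.0)) / jac
    return theta_hat, se


if __name__ == "__main__":
    y, d, x = simulate()
    theta_hat, se = dml_no_crossfit(y, d, x)
    print(f"theta_hat = {theta_hat:.3f}, two-way clustered SE = {se:.3f}")
```

In this toy design the first stage is low-dimensional, so fitting on the full sample is unproblematic; the abstract's claim concerns when the same full-sample strategy remains valid for much richer learners under multiway dependence.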
