An Introduction to Double/Debiased Machine Learning
Achim Ahrens, Victor Chernozhukov, Christian Hansen, Damian Kozbur, Mark Schaffer, Thomas Wiemann
TL;DR
Double/Debiased Machine Learning (DML) is a general semiparametric framework for valid inference on a low-dimensional target parameter when nuisance components are high-dimensional or nonparametric. DML combines Neyman orthogonality with cross-fitting: orthogonal scores reduce the bias from regularized nuisance estimates, and cross-fitting reduces the bias from overfitting. Together they allow flexible nuisance estimation, including from non-tabular data such as text or images, without sacrificing asymptotic validity. The paper lays out the theory, derives orthogonal scores for common targets (e.g., linear regression, partially linear regression (PLR), instrumental variables (IV), and the average treatment effect (ATE)), and shows through simulations and empirical applications (notably GT-ATT and monopsony in online markets) that combining orthogonal scores with cross-fitting yields robust inference. It also highlights practical diagnostics and implementation concerns. Overall, DML extends credible causal inference to complex data structures and offers a practical toolkit, provided that nuisance estimation quality and model specification are monitored carefully.
Abstract
This paper provides an introduction to Double/Debiased Machine Learning (DML). DML is a general approach to performing inference about a target parameter in the presence of nuisance functions: objects that are needed to identify the target parameter but are not of primary interest. Nuisance functions arise naturally in many settings, such as when controlling for confounding variables or leveraging instruments. The paper describes two biases that arise from nuisance function estimation and explains how DML alleviates these biases. Consequently, DML allows the use of flexible methods, including machine learning tools, for estimating nuisance functions, reducing the dependence on auxiliary functional form assumptions and enabling the use of complex non-tabular data, such as text or images. We illustrate the application of DML through simulations and empirical examples. We conclude with a discussion of recommended practices. A companion website includes additional examples with code and references to other resources.
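To make the two ingredients concrete, the following is a minimal sketch of cross-fitted DML for the partially linear regression model Y = θD + g(X) + ε. It is an illustration under stated assumptions, not the paper's or the companion website's code: the data are simulated, the nuisance learner is a plain ridge regression chosen only to keep the example dependency-free, and the names `ridge_fit_predict` and `dml_plr` are hypothetical. In practice any suitable machine learning method could take the place of the ridge step.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression (with intercept) on the training fold, predict on the test fold.

    Stand-in nuisance learner (an assumption of this sketch).
    """
    mu_x, mu_y = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - mu_x
    k = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(k), Xc.T @ (y_train - mu_y))
    return mu_y + (X_test - mu_x) @ beta


def dml_plr(y, d, X, n_folds=5, alpha=1.0, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(X) + eps."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Cross-fitting: nuisance functions are fit on the other folds only,
        # then used to residualize the held-out observations.
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(X[train], y[train], X[test_idx], alpha)
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(X[train], d[train], X[test_idx], alpha)
    # Orthogonal (residual-on-residual) score: E[(y_res - theta * d_res) * d_res] = 0.
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi**2) / np.mean(d_res**2) ** 2 / n)
    return theta, se


# Simulated example with a true effect of 0.5 (all values are illustrative).
rng = np.random.default_rng(42)
n, p = 2000, 20
X = rng.normal(size=(n, p))
d = X @ rng.normal(scale=0.5, size=p) + rng.normal(size=n)
y = 0.5 * d + X @ rng.normal(scale=0.5, size=p) + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.3f}")
```

The residual-on-residual form is what makes the score insensitive to small errors in the two nuisance fits, while the fold loop ensures that no observation is residualized with a model that was trained on it.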
