Machine-Learning-Powered Specification Testing in Linear Instrumental Variable Models

Cyrill Scheidegger; Malte Londschien; Peter Bühlmann

Abstract

The linear instrumental variable (IV) model is widely used in observational studies, yet its validity hinges on strong assumptions. Classical specification tests such as the Sargan-Hansen J test are limited to overidentified settings and are therefore not applicable in the common just-identified case, where the number of instruments is equal to the number of endogenous variables. We propose a novel test for the well-specification of the linear IV model under the assumption that the structural error is mean independent of the instruments. This assumption enables specification testing even in the just-identified setting. Our approach uses the idea of residual prediction: if the two-stage least squares residuals can be predicted from the instruments better than chance, this indicates misspecification. The resulting test employs sample splitting and a user-chosen machine learning method, and we show asymptotic type I error control and consistency against a broad class of alternatives. We further show how the proposed testing principle can be adapted to settings with weak or many instruments via an Anderson-Rubin-type inversion, thereby substantially extending the applicability. The tests accommodate heteroskedasticity- and cluster-robust inference and are implemented in the R package RPIV and the ivmodels software package for Python.
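
The residual-prediction idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' exact test statistic and does not use the RPIV or ivmodels API. It makes several assumptions: a single endogenous regressor and a single instrument, polynomial least squares standing in for the user-chosen machine learning method, and a one-sided normal approximation with a heteroskedasticity-robust normalization that accounts for the estimation of the 2SLS coefficients. All function names (`tsls`, `poly_features`, `residual_prediction_test`, `simulate`) are hypothetical.

```python
import math
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares; returns (beta, X_hat, Pi) with X_hat = Z @ Pi."""
    Pi = np.linalg.lstsq(Z, X, rcond=None)[0]  # first-stage coefficients
    X_hat = Z @ Pi                             # first-stage fitted values
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    return beta, X_hat, Pi


def poly_features(z, degree=3):
    """Stand-in learner features (assumption): powers 0..degree of a scalar instrument."""
    return np.column_stack([z ** k for k in range(degree + 1)])


def residual_prediction_test(y, x, z, seed=0, degree=3):
    """Sample-splitting sketch: learn to predict residuals on one half, test on the other."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X = np.column_stack([np.ones(n), x])  # intercept + endogenous regressor
    Z = np.column_stack([np.ones(n), z])  # intercept + instrument (just-identified)
    idx = rng.permutation(n)
    aux, main = idx[: n // 2], idx[n // 2:]

    # Auxiliary half: fit 2SLS, then regress its residuals on functions of z.
    beta_aux, _, _ = tsls(y[aux], X[aux], Z[aux])
    r_aux = y[aux] - X[aux] @ beta_aux
    coef = np.linalg.lstsq(poly_features(z[aux], degree), r_aux, rcond=None)[0]

    # Main half: fresh 2SLS residuals and the learned predictions w(z).
    beta, X_hat, Pi = tsls(y[main], X[main], Z[main])
    r = y[main] - X[main] @ beta
    w = poly_features(z[main], degree) @ coef

    # Adjust w for the estimation of beta, so that sum(w * r) equals
    # sum(w_tilde * eps) for the true structural errors eps.
    c = Pi @ np.linalg.solve(X_hat.T @ X[main], X[main].T @ w)
    w_tilde = w - Z[main] @ c

    # Studentized statistic; large positive values indicate predictable residuals.
    stat = float(np.sum(w_tilde * r) / np.sqrt(np.sum(w_tilde ** 2 * r ** 2)))
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # one-sided: 1 - Phi(stat)
    return stat, p_value


def simulate(n, misspecified, seed):
    """Toy data: h confounds x and y; the z**2 term violates the linear IV model."""
    rng = np.random.default_rng(seed)
    z, h, v, u = rng.standard_normal((4, n))
    x = z + h + v
    y = 1.0 + 2.0 * x + h + u
    if misspecified:
        y = y + z ** 2
    return y, x, z


if __name__ == "__main__":
    for flag in (False, True):
        stat, p = residual_prediction_test(*simulate(4000, flag, seed=1))
        print(f"misspecified={flag}: stat={stat:.2f}, p={p:.3g}")
```

In the misspecified simulation, the added `z ** 2` term is uncorrelated with the instrument, so a just-identified 2SLS fit cannot absorb it, yet a nonlinear learner can still recover it from the residuals. This is the situation the abstract describes, in which the J test is unavailable but mean independence of the error still yields a testable implication.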
