A Brief Introduction to Causal Inference in Machine Learning
Kyunghyun Cho
TL;DR
This note surveys the foundations of causal inference for machine learning through probabilistic graphical models and structural causal models, clarifying how to reason about interventions via the do-operator and about causal quantities such as the average treatment effect (ATE) and the conditional average treatment effect (CATE). It contrasts learning from observational data with active experimentation in randomized trials, and introduces regression, inverse probability weighting, matching, and instrumental variables as methods for handling confounding. The material then connects causality to out-of-distribution generalization and invariance, arguing that robust prediction should rest on stable, causal correlations, with case studies in language modeling and policy evaluation. The closing sections outline additional techniques and practical considerations for behavior cloning, bandits, and causal reinforcement learning, tracing a path from fundamental causal reasoning to scalable, real-world ML systems.
Abstract
This is a lecture note produced for DS-GA 3001.003 "Special Topics in DS - Causal Inference in Machine Learning" at the Center for Data Science, New York University, in Spring 2024. The course targets master's and PhD students who have a basic background in machine learning but no prior exposure to causal inference or causal reasoning in general. In particular, it aims to expand these students' view and knowledge of machine learning to incorporate causal reasoning, as this aspect is at the core of so-called out-of-distribution generalization (or lack thereof).
