Tracing Distribution Shifts with Causal System Maps
Joran Leest, Ilias Gerostathopoulos, Patricia Lago, Claudia Raibulet
TL;DR
The paper addresses the problem of attributing distribution shifts in ML systems to their causes, rather than merely detecting that a shift has occurred. It proposes ML System Maps, a hierarchical, causal, view-based representation that explicitly models the environment and the ML system's internal subsystems so that shifts can be traced to their sources. The approach defines a notation and three coordinated views, and develops attribution patterns AQ1–AQ3 to route, localize, and externalize causes, possibly using Shapley-based attribution. If validated, ML System Maps could improve incident response, cross-team collaboration, and the reliability of ML deployments by providing structured, end-to-end explanations of shifts.
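The TL;DR names Shapley-based attribution only as a possible ingredient; the paper's exact formulation is not given here. As a minimal sketch, assuming a hypothetical value function `shift_score(S)` that returns the shift magnitude observed when only the candidate causes in `S` take their post-shift form, exact Shapley values divide the total shift among the causes. The cause names and numbers below are invented for illustration, and enumerating all subsets is only practical for a handful of causes.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: weighted average marginal contribution over all subsets."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(len(others) + 1):
            # Probability that exactly this size-k coalition precedes p in a random order.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


# Hypothetical value function: shift magnitude when only the causes in `changed`
# are switched to their post-shift state. All numbers are illustrative.
def shift_score(changed: FrozenSet[str]) -> float:
    score = 0.0
    if "schema_change" in changed:
        score += 0.5
    if "sensor_drift" in changed:
        score += 0.2
    if "schema_change" in changed and "sensor_drift" in changed:
        score += 0.1  # interaction term; Shapley splits it equally between the two causes
    return score


if __name__ == "__main__":
    causes = ["schema_change", "sensor_drift", "seasonality"]
    print(shapley_values(causes, shift_score))
```

With these illustrative numbers, `schema_change` receives about 0.55, `sensor_drift` about 0.25, and `seasonality` 0.0, and the attributions sum to the total shift `shift_score(all) - shift_score(empty)`.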
Abstract
Monitoring machine learning (ML) systems is hard: standard practice focuses on detecting distribution shifts rather than identifying their causes. Root-cause analysis often relies on manual tracing to determine whether a shift is caused by software faults, data-quality issues, or natural change. We propose ML System Maps, causal maps that, through layered views, make explicit the propagation paths between the environment and the ML system's internals, enabling systematic attribution of distribution shifts. We outline the approach and a research agenda for its development and evaluation.
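To make "propagation paths" concrete, the sketch below encodes a toy system map as a directed graph from environment nodes through pipeline stages to model outputs, and traces an alarm upstream. It assumes, hypothetically, that each node already has a monitor marking it as shifted or not; the node names, the `PARENTS` map, and the "most upstream shifted node" rule in `candidate_root_causes` are illustrative assumptions, not the paper's notation or attribution patterns.

```python
from typing import Dict, List, Set

# Hypothetical system map: each node lists its direct causes (parents).
# Environment nodes have no parents; the rest are ML system internals.
PARENTS: Dict[str, List[str]] = {
    "customer_behaviour": [],              # environment
    "upstream_api": [],                    # environment
    "raw_events": ["customer_behaviour", "upstream_api"],
    "feature_store": ["raw_events"],       # data pipeline
    "model_input": ["feature_store"],
    "predictions": ["model_input"],        # model output
}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All nodes with a directed path into `node`."""
    seen: Set[str] = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def candidate_root_causes(
    alarm: str, shifted: Set[str], parents: Dict[str, List[str]]
) -> List[str]:
    """Shifted nodes at or upstream of the alarm whose own parents are not shifted.

    Such nodes are where the shift first appears along a propagation path,
    so they are natural starting points for root-cause analysis.
    """
    scope = ancestors(alarm, parents) | {alarm}
    return sorted(
        n for n in scope
        if n in shifted and not any(p in shifted for p in parents[n])
    )


if __name__ == "__main__":
    # Shift originating in the environment and propagating to predictions.
    print(candidate_root_causes(
        "predictions",
        {"upstream_api", "raw_events", "feature_store", "model_input", "predictions"},
        PARENTS,
    ))
    # Shift first appearing inside the pipeline, e.g. a feature-computation bug.
    print(candidate_root_causes("model_input", {"feature_store", "model_input"}, PARENTS))
```

In the first call the trace ends at the environment node `upstream_api`, suggesting external change; in the second it stops at `feature_store`, an internal node, which points toward a software or data-quality fault rather than natural change.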
