Interpretable Machine Learning in Physics: A Review
Sebastian Johann Wetzel, Seungwoong Ha, Raban Iten, Miriam Klopotek, Ziming Liu
TL;DR
This review surveys interpretability in machine learning (ML) as applied to physics, outlining why transparent models are essential for trust, debugging, and scientific understanding. It organizes the material into notions of interpretation, philosophical perspectives, and a comprehensive catalog of algorithms, interpretability methods, and domain-specific applications across quantum, classical, high-energy, astrophysical, and complex systems. Key contributions include frameworks for distinguishing intrinsic from post-hoc interpretability, and a synthesis of symbolic regression, Hamiltonian- and Lagrangian-inspired networks, and the discovery of symmetries and conservation laws, all of which yield human-readable physical insights. By connecting physical principles with interpretable ML techniques, the work highlights how transparent representations can drive reliable discoveries and practical advances in experimental and theoretical physics. The review presents the field as poised to enable AI-augmented scientific inference that remains aligned with human understanding and scientific rigor.
Abstract
Machine learning is increasingly transforming various scientific fields, enabled by advancements in computational power and access to large data sets from experiments and simulations. As artificial intelligence (AI) continues to grow in capability, these algorithms will enable many scientific discoveries beyond human capabilities. Since the primary goal of science is to understand the world around us, fully leveraging machine learning in scientific discovery requires models that are interpretable -- allowing experts to comprehend the concepts underlying machine-learned predictions. Successful interpretations increase trust in black-box methods, help reduce errors, allow for the improvement of the underlying models, enhance human-AI collaboration, and ultimately enable fully automated scientific discoveries that remain understandable to human scientists. This review examines the role of interpretability in machine learning applied to physics. We categorize different aspects of interpretability, discuss machine learning models in terms of both interpretability and performance, and explore the philosophical implications of interpretability in scientific inquiry. Additionally, we highlight recent advances in interpretable machine learning across many subfields of physics. By bridging boundaries between disciplines -- each with its own unique insights and challenges -- we aim to establish interpretable machine learning as a core research focus in science.
