Peeling Context from Cause for Multimodal Molecular Property Prediction
Tao Li, Kaiyuan Hou, Tuan Vinh, Carl Yang, Monika Raj
TL;DR
This work addresses the problem that molecular-property predictors frequently exploit spurious context rather than true causal structure, which makes them brittle under distribution shift. It introduces CLaP, a layerwise causal–trivial peeling framework that fuses multimodal molecular representations (2D SMILES graphs, HELM, and 3D geometry) while progressively peeling away context to isolate label-relevant signals. The method enforces batch-wise invariance via a depth-dependent correlation schedule and a monotonicity constraint, and it produces atom-level causal saliency maps that align with chemical intuition. Across four benchmarks, CLaP achieves state-of-the-art regression performance and provides interpretable guidance for molecular design, with potential extensions to classification and other domains.
Abstract
Deep models are widely used for molecular property prediction, yet they are often difficult to interpret and may rely on spurious context rather than causal structure, which reduces reliability under distribution shift and harms predictive performance. We introduce CLaP (Causal Layerwise Peeling), a framework that separates causal signal from context in a layerwise manner and integrates diverse graph representations of molecules. At each layer, a causal block performs a soft split into causal and non-causal branches, fuses causal evidence across modalities, and progressively removes batch-coupled context to focus on label-relevant structure, thereby limiting shortcut signals and stabilizing layerwise refinement. Across four molecular benchmarks, CLaP consistently improves MAE, MSE, and $R^2$ over competitive baselines. The model also produces atom-level causal saliency maps that highlight substructures responsible for predictions, providing actionable guidance for targeted molecular edits. Case studies confirm the accuracy of these maps and their alignment with chemical intuition. By peeling context from cause at every layer, the model yields predictors that are both accurate and interpretable for molecular design.
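To make the idea of a soft causal/non-causal split concrete, the following is a minimal illustrative sketch and not the CLaP implementation. It assumes a sigmoid gate per atom feature and a simple average as the cross-modal fusion rule; the function names (`soft_split`, `fuse_causal`), the toy feature values, and the gate logits are all hypothetical and do not come from the paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def soft_split(features, gate_logits):
    """Softly split features into a causal part and a context part.

    Assumption: a sigmoid gate g in (0, 1) weights each feature, so that
    causal = g * f and context = (1 - g) * f, and the two parts sum to f.
    """
    gates = [sigmoid(z) for z in gate_logits]
    causal = [g * f for g, f in zip(gates, features)]
    context = [(1.0 - g) * f for g, f in zip(gates, features)]
    return causal, context


def fuse_causal(branches):
    """Fuse causal branches across modalities by averaging (assumed rule)."""
    n = len(branches)
    return [sum(vals) / n for vals in zip(*branches)]


# Toy per-atom features for two hypothetical modalities (e.g. 2D and 3D).
feat_2d = [0.5, -1.0, 2.0]
feat_3d = [1.5, 0.0, -2.0]
gate_logits = [2.0, 0.0, -2.0]  # hypothetical gate scores, shared here

causal_2d, context_2d = soft_split(feat_2d, gate_logits)
causal_3d, context_3d = soft_split(feat_3d, gate_logits)
fused_causal = fuse_causal([causal_2d, causal_3d])
```

Because the split is soft, no information is discarded at the split itself: the causal and context parts always sum back to the input, and only the downstream treatment of the context branch determines what is peeled away.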
