Peeling Context from Cause for Multimodal Molecular Property Prediction

Tao Li, Kaiyuan Hou, Tuan Vinh, Carl Yang, Monika Raj

TL;DR

This work addresses the problem that molecular-property predictors frequently exploit spurious context rather than true causal structure, leading to brittle performance under distribution shifts. It introduces CLaP, a layerwise causal–trivial peeling framework that fuses multimodal molecular representations (2D SMILES graphs, HELM, and 3D geometry) while progressively peeling context to isolate label-relevant signals. The method enforces batch-wise invariance via a depth-dependent correlation schedule and monotonicity, and produces atom-level causal saliency maps that align with chemical intuition. Across four benchmarks, CLaP achieves state-of-the-art regression performance and provides interpretable guidance for molecular design, with potential extensions to classification and other domains.

Abstract

Deep models are used for molecular property prediction, yet they are often difficult to interpret and may rely on spurious context rather than causal structure, which reduces reliability under distribution shift and harms predictive performance. We introduce CLaP (Causal Layerwise Peeling), a framework that separates causal signal from context in a layerwise manner and integrates diverse graph representations of molecules. At each layer, a causal block performs a soft split into causal and non-causal branches, fuses causal evidence across modalities, and progressively removes batch-coupled context to focus on label-relevant structure, thereby limiting shortcut signals and stabilizing layerwise refinement. Across four molecular benchmarks, CLaP consistently improves MAE, MSE, and $R^2$ over competitive baselines. The model also produces atom-level causal saliency maps that highlight substructures responsible for predictions, providing actionable guidance for targeted molecular edits. Case studies confirm the accuracy of these maps and their alignment with chemical intuition. By peeling context from cause at every layer, the model yields predictors that are both accurate and interpretable for molecular design.
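
To make the "soft split" and cross-modal fusion described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a sigmoid gate per atom, atom-aligned feature lists for each modality, and simple averaging as the fusion rule; the names `soft_split`, `fuse_modalities`, and `gate_weights` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_split(features, gate_weights):
    """Softly split per-atom features into causal and context parts.

    features: list of per-atom feature vectors (lists of floats).
    gate_weights: hypothetical gate parameters, one per feature dimension.
    Returns (causal, context, gates), where causal + context == features
    elementwise and gates[i] in (0, 1) is the causal weight of atom i.
    """
    causal, context, gates = [], [], []
    for atom in features:
        g = sigmoid(sum(w * x for w, x in zip(gate_weights, atom)))
        gates.append(g)
        causal.append([g * x for x in atom])
        context.append([(1.0 - g) * x for x in atom])
    return causal, context, gates


def fuse_modalities(causal_by_modality):
    """Average causal features across modalities, atom by atom.

    Assumes every modality yields the same number of atoms and the same
    feature dimension (an assumption of this sketch, not of the paper).
    """
    n_mod = len(causal_by_modality)
    n_atoms = len(causal_by_modality[0])
    dim = len(causal_by_modality[0][0])
    return [
        [sum(m[i][d] for m in causal_by_modality) / n_mod for d in range(dim)]
        for i in range(n_atoms)
    ]


# Toy example: two atoms, two feature dimensions, two modalities.
graph_feats = [[1.0, 0.0], [0.5, 2.0]]
geom_feats = [[0.2, 0.4], [1.0, 1.0]]
w = [0.5, -0.5]  # illustrative gate weights, not learned values

causal_g, context_g, gates_g = soft_split(graph_feats, w)
causal_3d, context_3d, gates_3d = soft_split(geom_feats, w)
fused = fuse_modalities([causal_g, causal_3d])
```

In this sketch the per-atom gate values play the role of an atom-level saliency score: a gate near 1 keeps an atom's features in the causal branch, while a gate near 0 routes them to the context branch. How CLaP actually parameterizes the gate, fuses modalities, and removes batch-coupled context across layers is not specified in this summary.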

Paper Structure

This paper contains 42 sections, 43 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: The original molecule in this illustrative example is iteratively “peeled” across layers. At shallow layers (layer 1), attribution under-peels and overweights the carbon backbone. At the optimal layer, causal weights concentrate on the nitro group, yielding chemically plausible solubility drivers. Deeper peeling begins to smear attribution across the backbone, reflecting over-peeling in practice.
  • Figure 2: Overview of the causal layerwise peeling architecture for $L=2$ layers.
  • Figure 3: Performance comparison across four molecular property datasets.
  • Figure 4: Test $R^2$ under $\mathsf{do}(E)$ via re-batching.
  • Figure 5: Layer-wise attribution maps in the multimodal setting across four benchmark datasets. Each row corresponds to a dataset and its molecular property of interest, indicated by the green arrow on the left. Three regimes are shown: under-peeling (left), optimal (center), and over-peeling (right). Atom-level contributions are highlighted: red atoms mark the features most attributed to increasing the target property, and blue atoms mark features that contribute less. Optimal peeling yields clearer and more interpretable attribution than under- or over-peeling.
  • ...and 8 more figures