Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning

Thorben Prein, Elton Pan, Sami Haddouti, Marco Lorenz, Janik Jehkul, Tymoteusz Wilk, Cansu Moran, Menelaos Panagiotis Fotiadis, Artur P. Toshev, Elsa Olivetti, Jennifer L. M. Rupp

TL;DR

Retro-Rank-In rethinks inorganic retrosynthesis as a precursor-set ranking problem by embedding targets and precursors into a shared latent space and learning a pairwise ranker on a bipartite graph. The method combines a transformer-based MTEncoder with a binary Ranker to predict $p(\boldsymbol{x}_P \mid \boldsymbol{x}_T)$, enabling new precursors beyond training data and providing diverse, high-quality synthesis routes. Extensive experiments across Complete Reaction Archive, Distinct Reactions, and Novel Material Systems demonstrate state-of-the-art performance, particularly in out-of-distribution scenarios and in generating diverse precursor sets without loss of accuracy. The work highlights robust extrapolation to novel precursors, improved probability calibration, and stronger generalization compared to multi-label baselines and retrieval-based approaches, offering a practical tool for accelerating inorganic materials synthesis. Limitations include the absence of explicit synthesis conditions and structural data, suggesting future integration with crystallography and larger pretrained models to further enhance predictive power and interpretability.

Abstract

Retrosynthesis strategically plans the synthesis of a chemical target compound from simpler, readily available precursor compounds. This process is critical for synthesizing novel inorganic materials, yet traditional methods in inorganic chemistry continue to rely on trial-and-error experimentation. Emerging machine-learning approaches struggle to generalize to entirely new reactions due to their reliance on known precursors, as they frame retrosynthesis as a multi-label classification task. To address these limitations, we propose Retro-Rank-In, a novel framework that reformulates the retrosynthesis problem by embedding target and precursor materials into a shared latent space and learning a pairwise ranker on a bipartite graph of inorganic compounds. We evaluate Retro-Rank-In's generalizability on challenging retrosynthesis dataset splits designed to mitigate data duplicates and overlaps. For instance, for Cr2AlB2, it correctly predicts the verified precursor pair CrB + Al despite never seeing them in training, a capability absent in prior work. Extensive experiments show that Retro-Rank-In sets a new state-of-the-art, particularly in out-of-distribution generalization and candidate set ranking, offering a powerful tool for accelerating inorganic material synthesis.
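The ranking formulation described above can be illustrated with a small sketch. It is not the authors' implementation: the element-fraction `embed` function stands in for the learned MTEncoder, the cosine-based `pair_probability` stands in for the trained Ranker, and scoring a set by the product of its pairwise probabilities is an assumption made here for illustration. The function names, the toy element vocabulary, and the candidate list are all hypothetical; only the target Cr2AlB2 and the precursors CrB and Al come from the text.

```python
import itertools
import math
import re

# Toy element vocabulary (assumption; a real encoder covers the periodic table).
ELEMENTS = ["Al", "B", "Cr", "O"]


def parse_formula(formula):
    """Parse a simple formula such as 'Cr2AlB2' into normalized element fractions."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}


def embed(formula):
    """Hypothetical stand-in for a learned encoder: an element-fraction vector."""
    fractions = parse_formula(formula)
    return [fractions.get(element, 0.0) for element in ELEMENTS]


def pair_probability(target, precursor):
    """Hypothetical stand-in for a learned binary ranker on one target-precursor pair.

    Squashes the cosine similarity of the two embeddings through a sigmoid.
    """
    t, p = embed(target), embed(precursor)
    dot = sum(a * b for a, b in zip(t, p))
    norm = math.sqrt(sum(a * a for a in t)) * math.sqrt(sum(b * b for b in p))
    cosine = dot / norm if norm else 0.0
    return 1.0 / (1.0 + math.exp(-(4.0 * cosine - 2.0)))


def rank_precursor_sets(target, candidates, set_size=2):
    """Score every candidate set and sort from most to least compatible.

    Assumption: a set's score is the product of its pairwise probabilities.
    """
    scored = []
    for combo in itertools.combinations(candidates, set_size):
        log_score = sum(math.log(pair_probability(target, p)) for p in combo)
        scored.append((combo, math.exp(log_score)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Candidate precursors are illustrative; any formula the encoder can embed may be added.
ranking = rank_precursor_sets("Cr2AlB2", ["CrB", "Al", "Al2O3", "Cr2O3"])
for combo, score in ranking:
    print(" + ".join(combo), round(score, 3))
```

Because candidates are scored through their embeddings rather than through a fixed output layer, a precursor that never appeared in training can still be placed in the ranking, which is the property the abstract emphasizes. The scores printed by this toy ranker carry no chemical meaning.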

Paper Structure

This paper contains 49 sections, 7 equations, 10 figures, 12 tables.

Figures (10)

  • Figure 1: Retrosynthesis problem. Identifying the optimal precursor set for a given target material can be treated as a ranking problem. We use the binary classification probabilities of each set to determine its rank. Checkmarks indicate whether a ranked set corresponds to an experimentally verified synthesis.
  • Figure 2: Learning paradigms for inorganic retrosynthesis. (a) Multi-label classification-based approaches, which constitute the current state-of-the-art models (Noh et al., 2024), inherently predict known precursors $P$ from a fixed candidate set. (b) Our approach (Retro-Rank-In) overcomes this limitation by embedding both precursor and target materials into a shared latent space and predicting their chemical compatibility in synthetic routes. This enables extrapolation beyond known precursors, allowing the model to propose novel synthesis pathways for unseen materials. The red links highlight an exemplary case for link prediction between a target and a precursor.
  • Figure 3: Comparison of Top-K accuracy. Retrieval-Retro and Retro-Rank-In compared on the Novel Material Systems dataset (c). The performance gap between the two approaches widens, especially for larger K.
  • Figure 4: MTEncoder architecture overview. This diagram illustrates the MTEncoder framework, where material compositions are tokenized and processed through a transformer model.
  • Figure 5: Ablation for layers. Retro-Rank-In tested for various numbers of layers. Results show the robustness of our method regarding hyperparameter choice.
  • ...and 5 more figures
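
The Top-K accuracy named in the Figure 3 caption can be sketched as follows. This assumes the exact-match reading of the metric: a target counts as a hit when any experimentally verified precursor set appears, as an unordered set, among the K highest-ranked predictions. The function name `top_k_accuracy`, the placeholder labels (`target_B`, `p1` to `p6`), and the rankings are hypothetical and are not results from the paper.

```python
def top_k_accuracy(ranked_predictions, verified_sets, k):
    """Fraction of targets with a verified precursor set among the top-k predictions.

    ranked_predictions: dict mapping target -> list of precursor sets, best first.
    verified_sets: dict mapping target -> list of experimentally verified sets.
    Sets are compared without regard to precursor order.
    """
    if not ranked_predictions:
        raise ValueError("no targets to evaluate")
    hits = 0
    for target, ranked in ranked_predictions.items():
        truth = {frozenset(s) for s in verified_sets[target]}
        if any(frozenset(s) in truth for s in ranked[:k]):
            hits += 1
    return hits / len(ranked_predictions)


# Made-up rankings for illustration only; they are not model outputs.
ranked_predictions = {
    "Cr2AlB2": [("Cr", "Al", "B"), ("CrB", "Al")],
    "target_B": [("p1", "p2"), ("p3", "p4"), ("p5", "p6")],
}
verified_sets = {
    "Cr2AlB2": [("Al", "CrB")],
    "target_B": [("p5", "p6")],
}

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked_predictions, verified_sets, k))
```

On this toy input the metric rises from 0.0 at K=1 to 0.5 at K=2 and 1.0 at K=3, since each verified set is found only once the cutoff reaches its rank.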