Retro-Rank-In: A Ranking-Based Approach for Inorganic Materials Synthesis Planning
Thorben Prein, Elton Pan, Sami Haddouti, Marco Lorenz, Janik Jehkul, Tymoteusz Wilk, Cansu Moran, Menelaos Panagiotis Fotiadis, Artur P. Toshev, Elsa Olivetti, Jennifer L. M. Rupp
TL;DR
Retro-Rank-In recasts inorganic retrosynthesis as a precursor-set ranking problem: it embeds targets and precursors into a shared latent space and learns a pairwise ranker on a bipartite graph. The method combines a transformer-based MTEncoder with a binary Ranker to predict $p(\boldsymbol{x}_P \mid \boldsymbol{x}_T)$, which lets it propose precursors that are absent from the training data and return diverse, high-quality synthesis routes. Extensive experiments across Complete Reaction Archive, Distinct Reactions, and Novel Material Systems demonstrate state-of-the-art performance, particularly in out-of-distribution scenarios and in generating diverse precursor sets without loss of accuracy. The work highlights robust extrapolation to novel precursors, improved probability calibration, and stronger generalization than multi-label baselines and retrieval-based approaches, offering a practical tool for accelerating inorganic materials synthesis. Limitations include the absence of explicit synthesis conditions and structural data, which suggests future integration with crystallography and larger pretrained models to further enhance predictive power and interpretability.
Abstract
Retrosynthesis strategically plans the synthesis of a chemical target compound from simpler, readily available precursor compounds. This process is critical for synthesizing novel inorganic materials, yet traditional methods in inorganic chemistry continue to rely on trial-and-error experimentation. Emerging machine-learning approaches struggle to generalize to entirely new reactions due to their reliance on known precursors, as they frame retrosynthesis as a multi-label classification task. To address these limitations, we propose Retro-Rank-In, a novel framework that reformulates the retrosynthesis problem by embedding target and precursor materials into a shared latent space and learning a pairwise ranker on a bipartite graph of inorganic compounds. We evaluate Retro-Rank-In's generalizability on challenging retrosynthesis dataset splits designed to mitigate data duplicates and overlaps. For instance, for Cr2AlB2, it correctly predicts the verified precursor pair CrB + Al despite never seeing them in training, a capability absent in prior work. Extensive experiments show that Retro-Rank-In sets a new state-of-the-art, particularly in out-of-distribution generalization and candidate set ranking, offering a powerful tool for accelerating inorganic material synthesis.
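The ranking formulation described above can be illustrated with a minimal sketch. It is not the authors' implementation: `embed` is a hypothetical stand-in for a learned encoder (here a unit-norm composition vector), `pair_probability` is a hypothetical stand-in for a trained binary ranker (here a sigmoid over cosine similarity), and scoring a precursor set by the product of its pairwise scores is an assumption made for illustration. Because nothing here is trained, the resulting order carries no chemical meaning; the sketch only shows how candidates outside any fixed label set can be scored and ranked against a target.

```python
import math
import re
from itertools import combinations


def parse_formula(formula):
    """Parse a flat formula such as 'Cr2AlB2' into {element: count}.

    Toy parser: integer counts only, no parentheses or hydrates.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def embed(formula):
    """Hypothetical stand-in for a learned encoder: unit-norm composition vector."""
    counts = parse_formula(formula)
    norm = math.sqrt(sum(n * n for n in counts.values()))
    return {element: n / norm for element, n in counts.items()}


def pair_probability(target, precursor):
    """Hypothetical stand-in for a trained binary ranker.

    Squashes the cosine similarity of the two embeddings into (0, 1).
    """
    t, p = embed(target), embed(precursor)
    cosine = sum(value * p.get(element, 0.0) for element, value in t.items())
    return 1.0 / (1.0 + math.exp(-4.0 * (cosine - 0.5)))


def rank_precursor_sets(target, candidates, set_size=2):
    """Rank candidate sets by the product of pairwise scores (an assumption)."""
    scored = []
    for precursor_set in combinations(candidates, set_size):
        score = math.prod(pair_probability(target, p) for p in precursor_set)
        scored.append((precursor_set, score))
    # Highest-scoring precursor sets first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Any formula can be a candidate; there is no fixed precursor vocabulary.
    candidates = ["CrB", "Al", "Cr", "B", "AlB2"]
    for precursor_set, score in rank_precursor_sets("Cr2AlB2", candidates):
        print(" + ".join(precursor_set), round(score, 3))
```

The design point is that the scorer takes an arbitrary (target, precursor) pair as input, so adding a new candidate only requires embedding it, whereas a multi-label classifier can only choose among the precursor classes it was trained on.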
