Multimodal Search in Chemical Documents and Reactions
Ayush Kumar Shah, Abhisek Dey, Leo Luo, Bryan Amador, Patrick Philippy, Ming Zhong, Siru Ouyang, David Mark Friday, David Bianchi, Nick Jackson, Richard Zanibbi, Jiawei Han
TL;DR
This work tackles the fragmented retrieval of chemical knowledge by introducing a multimodal search system that directly links molecular diagrams, text passages, and extracted reaction data. It combines ReactionMiner-based reaction extraction, SMILES generation from text and diagrams, and diagram parsing with RDKit-based structure search, all backed by BM25 text retrieval and multimodal ranking fusion. The approach enables passage-level access with linked diagrams and reaction contexts, and supports text, structure, and Reaction SMARTS queries, including dedicated reaction navigation within documents. Expert evaluation on Suzuki coupling literature demonstrates practical utility and identifies room for improvement in metadata enrichment, ranking transparency, and diagram-text linking. Future work aims at scaling the system and enhancing cross-modal representations.
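To make the combination of BM25 text retrieval and ranking fusion concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the system's implementation: the summary does not say which fusion method is used, so reciprocal rank fusion is swapped in as a common stand-in, and the function names, passage ids, passage texts, and the structure-search ranking are all hypothetical.

```python
import math
from collections import Counter


def bm25_rank(query, passages, k1=1.5, b=0.75):
    """Return passage ids ordered by BM25 score for a whitespace-tokenized query."""
    docs = {pid: text.lower().split() for pid, text in passages.items()}
    n = len(docs)
    avg_len = sum(len(toks) for toks in docs.values()) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for pid, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene-style BM25).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[pid] = score
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; ids ranked high in many lists rise to the top."""
    fused = Counter()
    for ranking in rankings:
        for rank, pid in enumerate(ranking, start=1):
            fused[pid] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical passages standing in for indexed text from the literature.
passages = {
    "p1": "Suzuki coupling of aryl bromide with phenylboronic acid",
    "p2": "Synthesis of the ligand by amide formation",
    "p3": "Suzuki coupling gave the biaryl product",
}

text_ranking = bm25_rank("suzuki coupling", passages)

# Assumed output of a separate structure search (for example, an RDKit-based
# match on diagrams linked to each passage); hard-coded here for illustration.
structure_ranking = ["p1", "p2", "p3"]

fused_ranking = reciprocal_rank_fusion([text_ranking, structure_ranking])
print(fused_ranking)
```

Rank-based fusion is a convenient choice for a sketch like this because BM25 scores and structure-similarity scores live on different scales, and combining ranks avoids having to normalize them.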
Abstract
We present a multimodal search tool that facilitates retrieval of chemical reactions, molecular structures, and associated text from scientific literature. Queries may combine molecular diagrams, textual descriptions, and reaction data, allowing users to connect different representations of chemical information. To support this, the indexing process includes chemical diagram extraction and parsing, extraction of reaction data from text into tabular form, and cross-modal linking of diagrams to their mentions in text. We describe the system's architecture, key functionalities, and retrieval process, along with expert assessments of the system. This demo highlights the workflow and technical components of the search system.
