Intelligent System for Automated Molecular Patent Infringement Assessment
Yaorui Shi, Sihang Li, Taiyan Zhang, Xi Fang, Jiankun Wang, Zhiyuan Liu, Guojiang Zhao, Zhengdan Zhu, Zhifeng Gao, Renxin Zhong, Linfeng Zhang, Guolin Ke, Weinan E, Hengxing Cai, Xiang Wang
TL;DR
This work introduces PatentFinder, a multi-agent system that decomposes automated molecular patent infringement assessment into specialized subtasks handled by tool-enabled agents, addressing the limitations of large language models in interpreting complex Markush structures. It combines two neural tools, MarkushMatcher and MarkushParser, and is evaluated on a new benchmark dataset, MolPatent-240, where it achieves a 13.8% improvement in F1-score and a 12% increase in accuracy over baselines. It also improves interpretability by autonomously generating infringement reports. The approach is validated through extensive experiments, including evaluations of the Markush tools and case studies that illustrate reduced hallucinations and clearer reasoning paths. Together, MolPatent-240 and the toolchain enable robust, scalable patent-protection analysis that can be integrated into AI-driven drug discovery, with potential applicability to other scientific workflows.
Abstract
Automated drug discovery offers significant potential for accelerating the development of novel therapeutics by replacing labor-intensive human workflows with machine-driven processes. However, molecules generated by artificial intelligence may unintentionally infringe on existing patents, posing legal and financial risks that impede the full automation of drug discovery pipelines. This paper introduces PatentFinder, a novel multi-agent, tool-enhanced intelligence system that can accurately and comprehensively evaluate small molecules for patent infringement. PatentFinder features five specialized agents that collaboratively analyze patent claims and molecular structures using heuristic and model-based tools. To support systematic evaluation, we curate MolPatent-240, a benchmark dataset tailored for patent infringement assessment algorithms. On this benchmark, PatentFinder outperforms baseline methods that rely solely on large language models or specialized chemical tools, achieving a 13.8% improvement in F1-score and a 12% increase in accuracy. PatentFinder also autonomously generates detailed, interpretable patent infringement reports. This combination of accuracy and interpretability makes PatentFinder a valuable and reliable tool for automating patent infringement assessments, and a practical way to integrate patent protection analysis into the drug discovery pipeline.
