Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph
Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu
TL;DR
This paper improves patent matching by using a memory graph derived from the parametric memory of LLMs to extract query-related entities and ontologies, which enrich both retrieval and generation in a retrieval-augmented framework. It introduces MemGraph, which uses two latent variables, $Z_{IR}$ and $Z_{Gen}$, and a memory graph traversal that derives $V^e(p)$ and $V^o(p)$ to enhance the semantic understanding of patents. On the PatentMatch dataset, MemGraph achieves a 17.68% accuracy improvement over vanilla LLMs and 10.85% over vanilla RAG, generalizes well across backbones and IPC types, and improves reasoning as judged by GPT-4o. The work demonstrates that combining memory-based entity and ontology cues with RAG can substantially reduce uncertainty and produce more interpretable patent matching. Code and data are publicly available on GitHub.
Abstract
Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation. Patent matching is a crucial task in IP management that facilitates the organization and utilization of patents. Existing models often rely on the emergent capabilities of Large Language Models (LLMs) and leverage them to identify related patents directly. However, these methods usually depend on keyword matching and overlook the hierarchical classification and categorical relationships of patents. In this paper, we propose MemGraph, a method that augments the patent matching capabilities of LLMs by incorporating a memory graph derived from their parametric memory. Specifically, MemGraph prompts LLMs to traverse their memory to identify relevant entities within patents and then attribute these entities to corresponding ontologies. After traversing the memory graph, we use the extracted entities and ontologies to improve the ability of LLMs to comprehend the semantics of patents. Experimental results on the PatentMatch dataset demonstrate the effectiveness of MemGraph, which achieves a 17.68% performance improvement over baseline LLMs. Further analysis highlights the generalization ability of MemGraph across various LLMs, both in-domain and out-of-domain, and its capacity to enhance the internal reasoning processes of LLMs during patent matching. All data and code are available at https://github.com/NEUIR/MemGraph.
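
The two-step traversal described in the abstract (first identify entities, then attribute them to ontologies) can be illustrated with a minimal Python sketch. This is not the released MemGraph implementation. The `LLM` callable, the prompt wording, the function names, and the lettered candidate format are all hypothetical. Which cue feeds which stage (entities into the retrieval query, ontologies into the generation prompt) is likewise an assumption made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical LLM interface: takes a prompt string, returns the model's text reply.
LLM = Callable[[str], str]


def split_items(reply: str) -> List[str]:
    """Parse a comma- or newline-separated reply into a de-duplicated list."""
    seen, items = set(), []
    for raw in reply.replace("\n", ",").split(","):
        item = raw.strip()
        key = item.lower()
        if item and key not in seen:
            seen.add(key)
            items.append(item)
    return items


def traverse_memory(llm: LLM, patent: str) -> Tuple[List[str], List[str]]:
    """Step 1: ask for entities in the patent. Step 2: attribute them to ontologies."""
    entities = split_items(llm(
        "List the key technical entities in this patent, comma-separated:\n" + patent
    ))
    ontologies = split_items(llm(
        "Name the broader category (ontology) each entity belongs to, "
        "comma-separated:\n" + ", ".join(entities)
    ))
    return entities, ontologies


def retrieval_query(patent: str, entities: List[str]) -> str:
    """Assumed design: append the entities to the text sent to the retriever."""
    return patent + "\nEntities: " + ", ".join(entities)


def matching_prompt(patent: str, candidates: List[str], retrieved: List[str],
                    ontologies: List[str]) -> str:
    """Assumed design: show ontologies and retrieved patents to the generator."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(candidates))
    context = "\n".join(retrieved)
    return (
        f"Query patent:\n{patent}\n"
        f"Ontologies: {', '.join(ontologies)}\n"
        f"Retrieved patents:\n{context}\n"
        f"Candidates:\n{options}\n"
        "Answer with the letter of the best-matching candidate."
    )
```

To use the sketch, pass any function that maps a prompt to a text reply as `llm`, call `traverse_memory` on the query patent, send the output of `retrieval_query` to a retriever of your choice, and give the string from `matching_prompt` back to the model. Keeping the LLM behind a plain callable makes it easy to swap backbones, in line with the cross-backbone evaluation the abstract mentions.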
