Enhancing the Patent Matching Capability of Large Language Models via the Memory Graph

Qiushi Xiong, Zhipeng Xu, Zhenghao Liu, Mengjia Wang, Zulong Chen, Yue Sun, Yu Gu, Xiaohua Li, Ge Yu

TL;DR

This paper improves patent matching by building a memory graph from an LLM's parametric memory, extracting query-related entities and ontologies from it, and using them to enrich both retrieval and generation in a retrieval-augmented framework. It introduces MemGraph with two latent variables, $Z_{IR}$ and $Z_{Gen}$, and a memory graph traversal that derives $V^e(p)$ and $V^o(p)$, thereby enhancing semantic understanding of patents. On the PatentMatch dataset, MemGraph improves accuracy by 17.68% over vanilla LLMs and by 10.85% over vanilla RAG, generalizes across backbones and IPC types, and produces better reasoning as judged by GPT-4o. The work shows that combining memory-based entity and ontology cues with RAG can reduce prediction uncertainty and yield more interpretable patent matching. Code and data are publicly available on GitHub.

Abstract

Intellectual Property (IP) management involves strategically protecting and utilizing intellectual assets to enhance organizational innovation, competitiveness, and value creation. Patent matching is a crucial task in intellectual property management, which facilitates the organization and utilization of patents. Existing models often rely on the emergent capabilities of Large Language Models (LLMs) and leverage them to identify related patents directly. However, these methods usually depend on matching keywords and overlook the hierarchical classification and categorical relationships of patents. In this paper, we propose MemGraph, a method that augments the patent matching capabilities of LLMs by incorporating a memory graph derived from their parametric memory. Specifically, MemGraph prompts LLMs to traverse their memory to identify relevant entities within patents and then attributes these entities to corresponding ontologies. After traversing the memory graph, we use the extracted entities and ontologies to improve the ability of LLMs to comprehend the semantics of patents. Experimental results on the PatentMatch dataset demonstrate the effectiveness of MemGraph, which achieves a 17.68% performance improvement over baseline LLMs. Further analysis highlights the generalization ability of MemGraph across various LLMs, both in-domain and out-of-domain, and its capacity to enhance the internal reasoning processes of LLMs during patent matching. All data and code are available at https://github.com/NEUIR/MemGraph.
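
The two-step traversal described in the abstract (entities first, then ontologies) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `LLM` callable interface, the prompt wording, every function name, and the choice to feed entities into the retrieval query and ontologies into the generation prompt are all hypothetical. The released repository should be consulted for the actual method.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to reply text.
LLM = Callable[[str], str]


def parse_list(text: str) -> List[str]:
    """Split a comma- or newline-separated reply into clean, non-empty items."""
    parts = [part.strip() for part in text.replace("\n", ",").split(",")]
    return [part for part in parts if part]


def extract_entities(llm: LLM, patent: str) -> List[str]:
    # Step 1 (assumed prompt): ask the model to recall entities tied to the patent.
    prompt = f"List the key technical entities in this patent:\n{patent}"
    return parse_list(llm(prompt))


def attribute_ontologies(llm: LLM, patent: str, entities: List[str]) -> List[str]:
    # Step 2 (assumed prompt): map the recalled entities to broader categories.
    prompt = (
        "Name the ontology categories these entities belong to.\n"
        f"Entities: {', '.join(entities)}\n"
        f"Patent:\n{patent}"
    )
    return parse_list(llm(prompt))


def expand_query(patent: str, entities: List[str]) -> str:
    # Assumption: entity cues are appended to the text used for retrieval.
    return " ".join([patent, *entities])


def build_matching_prompt(
    patent: str,
    candidates: List[str],
    ontologies: List[str],
    retrieved: List[str],
) -> str:
    # Assumption: ontology cues and retrieved patents are placed in the
    # generation prompt ahead of the candidate list.
    lines = [
        f"Query patent: {patent}",
        f"Ontology cues: {', '.join(ontologies)}",
        "Retrieved patents:",
        *[f"- {doc}" for doc in retrieved],
        "Candidates:",
        *[f"{i + 1}. {cand}" for i, cand in enumerate(candidates)],
        "Which candidate best matches the query patent?",
    ]
    return "\n".join(lines)
```

Passing the model in as a plain callable keeps the sketch independent of any particular backbone, so the same steps can be tried with a stub function before wiring in a real LLM client.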

Paper Structure

This paper contains 13 sections, 15 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Illustration of Our MemGraph Method. The framework integrates memory graph into LLM-based patent matching, enabling more comprehensive semantic understanding and accurate patent similarity assessment.
  • Figure 2: The Illustration of Our MemGraph Method.
  • Figure 3: Evaluation Results of Different Models in Patent Matching Predictions. One panel illustrates the model uncertainty when generating the patent matching results; the other demonstrates the quality of the reasoning process during patent matching. All models are implemented using Llama-3.1-Instruct$_{\text{8B}}$ as the backbone model.