
KAPPA: A Generic Patent Analysis Framework with Keyphrase-Based Portraits

Xin Xia, Yujin Wang, Jun Zhou, Guisheng Zhong, Linning Cai, Chen Zhang

TL;DR

KAPPA introduces an interpretable patent analysis framework that builds keyphrase-based portraits by jointly generating present and absent keyphrases from multi-level patent documents. Central to the approach are SC-One2Set and SetPLM, which combine semantic keyword calibration with pre-trained language models and a three-stage multitask training regime, enabling parallel keyphrase generation across document levels. In the portrait construction phase, prompt-based hierarchical decoding (PHD) uses hierarchical prompts to build portraits from the Title, Abstract, and Claims; in the portrait-based analysis phase, patent classification, technology recognition, and summarization improve when portraits augment or replace raw text. The findings suggest that keyphrase-based portraits offer a concise, domain-grounded representation that enhances downstream patent analytics and supports scalable, interpretable patent analysis across domains.
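SC-One2Set builds on the One2Set paradigm, in which the decoder emits a set of keyphrases in parallel from N slots (control codes), each producing up to k tokens, so training must decide which target keyphrase supervises which slot. In the original One2Set work this is a bipartite matching solved with the Hungarian algorithm, with targets padded by a ∅ ("no keyphrase") token up to N. The sketch below is a simplified illustration, not KAPPA's implementation: it assumes a precomputed N×T score matrix (for example, per-slot log-probabilities of each target; the values here are toy numbers), treats ∅ assignments as scoring zero, and swaps the Hungarian algorithm for brute force over permutations, which only scales to tiny N. The name `assign_targets` is hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def assign_targets(scores: Sequence[Sequence[float]],
                   targets: List[str]) -> List[Optional[str]]:
    """Assign each of N decoder slots one target keyphrase or None (the ∅ target).

    scores[i][j] is an assumed matching score of slot i for targets[j].
    Brute force over permutations: O(N!), illustration only.
    """
    n_slots = len(scores)
    n_targets = len(targets)
    if n_targets > n_slots:
        raise ValueError("more target keyphrases than decoder slots")
    # Pad the targets with None (∅) so every slot receives exactly one target.
    padded: List[Optional[str]] = list(targets) + [None] * (n_slots - n_targets)
    best_perm: Optional[Tuple[int, ...]] = None
    best_total = float("-inf")
    for perm in permutations(range(n_slots)):
        # perm[i] is the padded-target index given to slot i; ∅ targets add 0.
        total = sum(scores[i][perm[i]] for i in range(n_slots) if perm[i] < n_targets)
        if total > best_total:
            best_perm, best_total = perm, total
    assert best_perm is not None
    return [padded[j] for j in best_perm]


# Toy example with N = 4 slots and 3 target keyphrases.
scores = [
    [0.9, 0.1, 0.2],  # slot 0
    [0.2, 0.8, 0.1],  # slot 1
    [0.1, 0.2, 0.3],  # slot 2
    [0.3, 0.1, 0.7],  # slot 3
]
targets = ["keyphrase generation", "patent portrait", "patent classification"]
print(assign_targets(scores, targets))
# ['keyphrase generation', 'patent portrait', None, 'patent classification']
```

Slot 2 scores poorly against every target, so it is assigned ∅; in a real model that slot would be trained to emit the ∅ token rather than a duplicate keyphrase.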

Abstract

Patent analysis relies heavily on concise and interpretable document representations, referred to as patent portraits. Keyphrases, both present and absent, are ideal candidates for patent portraits due to their brevity, representativeness, and clarity. In this paper, we introduce KAPPA, an integrated framework designed to construct keyphrase-based patent portraits and enhance patent analysis. KAPPA operates in two phases: patent portrait construction and portrait-based analysis. To ensure effective portrait construction, we propose a semantic-calibrated keyphrase generation paradigm that integrates pre-trained language models with a prompt-based hierarchical decoding strategy to leverage the multi-level structure of patents. For portrait-based analysis, we develop a comprehensive framework that employs keyphrase-based patent portraits to enable efficient and accurate patent analysis. In extensive experiments on keyphrase generation benchmark datasets, the proposed model achieves significant improvements over state-of-the-art baselines. Further experiments on real-world patent applications demonstrate that our keyphrase-based portraits effectively capture domain-specific knowledge and enrich the semantic representations used in patent analysis tasks.

Paper Structure

This paper contains 27 sections, 19 equations, 7 figures, 4 tables, 2 algorithms.

Figures (7)

  • Figure 1: The overall framework of KAPPA.
  • Figure 2: Comparison between One2Seq (left) and One2Set (right). This case focuses mainly on the decoder side, and the number of target keyphrases is 3. For One2Set, $N=4$ and $k=2$.
  • Figure 3: Keyword Extraction on the Encoder Side of SC-One2Set.
  • Figure 4: The framework of SC-One2Set. In this case, $N=8$ and $k=2$. For simplicity, we use a special case where $\mathcal{W^K_P} = \mathcal{W^K}$.
  • Figure 5: KG Performance on Patent Datasets. MAP@M and NDCG@M compare all predictions to the ground truths; MAP@5 and NDCG@5 focus on the top-5 predictions.
  • ...and 2 more figures
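Figure 5 reports MAP and NDCG at a cutoff. As a reference point, the sketch below computes binary-relevance AP@k and NDCG@k for one document (MAP is the mean of AP over documents) and treats "@M" as using every prediction. Two assumptions are worth flagging: matching is exact string equality, whereas keyphrase evaluations commonly stem phrases first, and AP@k is normalized by min(|gold|, k), which is one of several conventions in use; the caption does not say which one the paper follows. All function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence, Set, Tuple


def _relevance(preds: Sequence[str], gold: Set[str], k: int) -> List[int]:
    """Binary relevance of the top-k predictions; a repeated prediction counts once."""
    seen: Set[str] = set()
    rels: List[int] = []
    for p in preds[:k]:
        rels.append(1 if (p in gold and p not in seen) else 0)
        seen.add(p)
    return rels


def ap_at_k(preds: Sequence[str], gold: Sequence[str], k: Optional[int] = None) -> float:
    """Average precision at k; k=None uses all predictions (the '@M' setting)."""
    gold_set = set(gold)
    k = len(preds) if k is None else k
    denom = min(len(gold_set), k)  # assumed normalization
    if denom == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, rel in enumerate(_relevance(preds, gold_set, k), start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / denom


def ndcg_at_k(preds: Sequence[str], gold: Sequence[str], k: Optional[int] = None) -> float:
    """Binary-relevance NDCG at k with the usual 1 / log2(rank + 1) discount."""
    gold_set = set(gold)
    k = len(preds) if k is None else k
    rels = _relevance(preds, gold_set, k)
    dcg = sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))
    ideal = sum(1 / math.log2(rank + 1) for rank in range(1, min(len(gold_set), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def mean_over_docs(metric: Callable[[Sequence[str], Sequence[str], Optional[int]], float],
                   docs: Sequence[Tuple[Sequence[str], Sequence[str]]],
                   k: Optional[int] = None) -> float:
    """Average a per-document metric, e.g. MAP = mean of ap_at_k over documents."""
    if not docs:
        return 0.0
    return sum(metric(p, g, k) for p, g in docs) / len(docs)


preds = ["a", "x", "b", "y", "c"]
gold = ["a", "b", "c"]
print(ap_at_k(preds, gold, k=5), ndcg_at_k(preds, gold, k=5))
```

Both metrics reward placing correct keyphrases early, which is why the @5 variants isolate the quality of the top of the ranking while the @M variants also reflect how many correct keyphrases appear anywhere in the output.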

Theorems & Definitions (3)

  • Definition 1: Multi-level Documents
  • Definition 2: Present and Absent Keyphrases
  • Definition 3: Keywords
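Definition 2 distinguishes present from absent keyphrases. Under the convention standard in keyphrase generation, a keyphrase is present if its token sequence appears contiguously in the source document and absent otherwise. The sketch below applies that convention to a multi-level patent, assuming a dict with hypothetical `title`/`abstract`/`claims` keys and inserting a boundary token so a phrase cannot match across levels. It lowercases and tokenizes on alphanumerics but skips the stemming usually applied, and it is not claimed to reproduce the paper's exact definition, which may be level-specific.

```python
import re
from typing import Dict, List, Tuple

LEVELS = ("title", "abstract", "claims")  # assumed level names
BOUNDARY = "<sep>"  # never produced by tokenize(), so phrases cannot span levels


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sublist(tokens: List[str], phrase: List[str]) -> bool:
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    return n > 0 and any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def split_keyphrases(doc: Dict[str, str],
                     keyphrases: List[str]) -> Tuple[List[str], List[str]]:
    """Split keyphrases into (present, absent) with respect to all document levels."""
    tokens: List[str] = []
    for level in LEVELS:
        tokens += tokenize(doc.get(level, "")) + [BOUNDARY]
    present: List[str] = []
    absent: List[str] = []
    for kp in keyphrases:
        (present if contains_sublist(tokens, tokenize(kp)) else absent).append(kp)
    return present, absent


doc = {
    "title": "Keyphrase generation for patents",
    "abstract": "We build patent portraits from keyphrases.",
    "claims": "A method for generating keyphrases.",
}
kps = ["keyphrase generation", "patent portraits", "patent classification"]
print(split_keyphrases(doc, kps))
# (['keyphrase generation', 'patent portraits'], ['patent classification'])
```

Absent keyphrases such as "patent classification" are exactly the ones an extractive method cannot recover, which is why the framework generates rather than extracts them.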