Adaptive Taxonomy Learning and Historical Patterns Modelling for Patent Classification

Tao Zou, Le Yu, Junchen Ye, Leilei Sun, Bowen Du, Deqing Wang

TL;DR

This work proposes an integrated framework that comprehensively considers patent-related information for patent classification, and demonstrates the model’s ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.

Abstract

Patent classification aims to assign multiple International Patent Classification (IPC) codes to a given patent. Recent methods for automatically classifying patents mainly focus on analyzing the text descriptions of patents. However, apart from its text, each patent is also associated with one or more assignees, and knowledge of the patents they have previously applied for is often valuable for classification. Furthermore, the hierarchical taxonomy formulated by the IPC system provides important contextual information and enables models to leverage the correlations between IPC codes for more accurate classification. Existing methods fail to incorporate these two aspects. In this paper, we propose an integrated framework that comprehensively considers patent-related information for patent classification. Specifically, we first present an IPC code correlation learning module that derives semantic representations of IPC codes by adaptively passing and aggregating messages within the same level and across different levels of the hierarchical taxonomy. Moreover, we design a historical application pattern learning component that incorporates the corresponding assignee's previous patents through a dual-channel aggregation mechanism. Finally, we combine the contextual information of patent texts, which contains the semantics of IPC codes, with assignees' sequential preferences to make predictions. Experiments on real-world datasets demonstrate the superiority of our approach over existing methods. We also demonstrate the model's ability to capture the temporal patterns of assignees and the semantic dependencies among IPC codes.

Paper Structure

This paper contains 30 sections, 15 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: The task is to predict IPC codes of patent documents. We show the difference between existing works and our framework. Specifically, $\bm{E}_T$ represents the textual embeddings of patent descriptions, $\bm{E}_H$ denotes the semantic embeddings of classification codes in the hierarchical taxonomy, and $\bm{E}_S$ is the temporal representation of assignee $u_{10}$ learned from historical patents.
  • Figure 2: In scenario (a), we analyze assignees' repeated application behaviors by calculating the recurrence rates of IPC codes in current patents compared to historical patents applied for in the last year or up to the present time. In (b), we depict the average co-occurrence ratios of different IPC codes assigned to patents in the "Transportation" field in two datasets and normalize the ratios in each row.
  • Figure 3: Framework of the proposed model.
  • Figure 4: Performance on capturing high-order complex behavior patterns.
  • Figure 5: Performance of methods with varied lengths of historical patents for capturing assignees' historical patterns.
  • ...and 3 more figures
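
The recurrence rate mentioned in the Figure 2 caption can be illustrated with a small sketch. The exact definition used in the paper is not given here, so the code below assumes one plausible reading: the fraction of a current patent's IPC codes that already appear in the same assignee's historical patents. The function name `recurrence_rate` and the sample codes are hypothetical and serve only as an illustration.

```python
from typing import Iterable, Set


def recurrence_rate(current_codes: Set[str],
                    historical_patents: Iterable[Set[str]]) -> float:
    """Fraction of the current patent's IPC codes seen in the history.

    Assumed definition (not confirmed by the source): a code "recurs"
    if it appears in at least one of the assignee's historical patents.
    """
    if not current_codes:
        return 0.0
    # Union of all IPC codes across the assignee's historical patents.
    seen: Set[str] = set()
    for codes in historical_patents:
        seen |= codes
    return len(current_codes & seen) / len(current_codes)


# Illustrative data: two historical patents and one current patent.
history = [{"B60K", "B60L"}, {"B60W"}]
current = {"B60L", "B60W", "H02J", "G06F"}
print(recurrence_rate(current, history))  # 2 of 4 codes recur -> 0.5
```

Restricting `historical_patents` to those filed in the last year, or passing the full history, would correspond to the two time windows named in the caption.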