MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property

Shiwen Ni; Minghuan Tan; Yuelin Bai; Fuqiang Niu; Min Yang; Bowen Zhang; Ruifeng Xu; Xiaojun Chen; Chengming Li; Xiping Hu; Ye Li; Jianping Fan

TL;DR

MoZIP introduces the first multilingual benchmark for evaluating large language models in Intellectual Property, consisting of IPQuiz, IPQA, and PatentMatch across nine languages. It also presents MoZi, an IP-oriented multilingual model built on BLOOMZ-MT-7B and trained through patent pre-training, general instruction tuning, and IP-specific instruction tuning. Experimental results show that MoZi outperforms several BLOOMZ-based and multilingual baselines but lags behind ChatGPT on most tasks, underscoring the remaining challenges in IP-domain understanding. By releasing code, data, and the MoZi model, the work aims to standardize IP-focused evaluation and spur future improvements in multilingual IP knowledge for LLMs.

Abstract

Large language models (LLMs) have demonstrated impressive performance on various natural language processing (NLP) tasks. However, little is known about how well LLMs perform in specific domains such as intellectual property (IP). In this paper, we contribute a new benchmark, the first Multilingual-oriented quiZ on Intellectual Property (MoZIP), for evaluating LLMs in the IP domain. The MoZIP benchmark includes three challenging tasks: IP multiple-choice quiz (IPQuiz), IP question answering (IPQA), and patent matching (PatentMatch). We also develop a new IP-oriented multilingual large language model, MoZi, a BLOOMZ-based model that is supervised fine-tuned on multilingual IP-related text data. We evaluate MoZi and four well-known LLMs (BLOOMZ, BELLE, ChatGLM, and ChatGPT) on the MoZIP benchmark. Experimental results show that MoZi outperforms BLOOMZ, BELLE, and ChatGLM by a noticeable margin but scores lower than ChatGPT. Notably, current LLMs leave much room for improvement on MoZIP: even the strongest model, ChatGPT, does not reach the passing level. Our source code, data, and models are available at https://github.com/AI-for-Science/MoZi.

Paper Structure (12 sections, 7 figures, 3 tables)

Introduction
Primary data collection
Benchmark
The Proposed MoZi Model
Experiments
Training Details
Baselines
Experimental Results
Conclusion
Ethics Statement
Acknowledgements
Bibliographical References

Figures (7)

Figure 1: Statistics and distribution of data. ZH-Chinese, EN-English, DE-German, JA-Japanese, FR-French, KO-Korean, RU-Russian, ES-Spanish, PT-Portuguese, CA-Catalan.
Figure 2: Examples of questions in IPQuiz. The words in blue below non-English content are the corresponding English translations.
Figure 3: Examples of questions in seven languages from the IPQA dataset. The words in blue below non-English content are the corresponding English translations.
Figure 4: An example in PatentMatch. Text in blue marks the overlap between the source patent and each candidate patent, whereas text with a green background marks the key information explaining why the two patents match.
Figure 5: Schematic of our proposed IP-oriented multilingual large language model MoZi.
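The Figure 4 caption separates surface overlap (blue) from the information that actually makes two patents match (green). The sketch below illustrates why that distinction matters by ranking PatentMatch-style candidates with a naive lexical baseline, Jaccard similarity over word sets. This baseline is a swapped-in illustration, not a method from the paper, and the patent snippets are made up.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric word set; deliberately naive (no stemming, no stopwords)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(source: str, candidates: list[str]) -> list[tuple[int, float]]:
    """Rank candidate indices by lexical overlap with the source, highest first."""
    source_tokens = tokenize(source)
    scored = [(i, jaccard(source_tokens, tokenize(c))) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up patent snippets for illustration only.
source = "A battery pack with a cooling plate that circulates liquid coolant between cells."
candidates = [
    "A battery pack with a cooling plate mounted under the cells.",
    "Thermal management system circulating liquid coolant between adjacent cells.",
]
ranking = rank_candidates(source, candidates)
for index, score in ranking:
    print(index, round(score, 3))  # prints "0 0.467" then "1 0.235"
```

Shared boilerplate wording ("battery pack with a cooling plate") puts candidate 0 on top. If the match in this toy case instead hinged on the coolant-circulation mechanism, a pure overlap score would miss it, which is the kind of gap between overlapping text and key matching information that the caption highlights.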
