Table of Contents
Fetching ...

Code-Based English Models Surprising Performance on Chinese QA Pair Extraction Task

Linghan Zheng, Hui Liu, Xiaojun Lin, Jiayuan Dong, Yue Sheng, Gang Shi, Zhiwei Liu, Hongwei Chen

TL;DR

The capabilities of code-based English models in specified Chinese tasks offer a distinct perspective for discussion on the philosophical "Chinese Room" thought experiment.

Abstract

In previous studies, code-based models have consistently outperformed text-based models in reasoning-intensive scenarios. When generating our knowledge base for Retrieval-Augmented Generation (RAG), we observed that code-based models also perform exceptionally well in Chinese QA Pair Extraction task. Further, our experiments and the metrics we designed discovered that code-based models containing a certain amount of Chinese data achieve even better performance. Additionally, the capabilities of code-based English models in specified Chinese tasks offer a distinct perspective for discussion on the philosophical "Chinese Room" thought experiment.

Code-Based English Models Surprising Performance on Chinese QA Pair Extraction Task

TL;DR

The capabilities of code-based English models in specified Chinese tasks offer a distinct perspective for discussion on the philosophical "Chinese Room" thought experiment.

Abstract

In previous studies, code-based models have consistently outperformed text-based models in reasoning-intensive scenarios. When generating our knowledge base for Retrieval-Augmented Generation (RAG), we observed that code-based models also perform exceptionally well in Chinese QA Pair Extraction task. Further, our experiments and the metrics we designed discovered that code-based models containing a certain amount of Chinese data achieve even better performance. Additionally, the capabilities of code-based English models in specified Chinese tasks offer a distinct perspective for discussion on the philosophical "Chinese Room" thought experiment.
Paper Structure (21 sections, 3 figures, 6 tables)

This paper contains 21 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Example of training data
  • Figure 2: Embeddings of a Chinese character
  • Figure 3: New Chinese Room