Code-Based English Models Surprising Performance on Chinese QA Pair Extraction Task

Linghan Zheng; Hui Liu; Xiaojun Lin; Jiayuan Dong; Yue Sheng; Gang Shi; Zhiwei Liu; Hongwei Chen

TL;DR

The capabilities of code-based English models on specific Chinese tasks offer a distinct perspective on the philosophical "Chinese Room" thought experiment.

Abstract

In previous studies, code-based models have consistently outperformed text-based models in reasoning-intensive scenarios. When generating our knowledge base for Retrieval-Augmented Generation (RAG), we observed that code-based models also perform exceptionally well on the Chinese QA pair extraction task. Furthermore, our experiments, evaluated with the metrics we designed, showed that code-based models whose training data contains a certain amount of Chinese achieve even better performance. Additionally, the capabilities of code-based English models on specific Chinese tasks offer a distinct perspective on the philosophical "Chinese Room" thought experiment.

Paper Structure (21 sections, 3 figures, 6 tables)

Introduction
Related Work
Methodology
Datasets
Methods
Experiments
Evaluation Metrics Definition
Results
Code-based LLMs better than other LLMs
Less Domain Knowledge, Better Performance
A Moderate Amount of Chinese is Better
QLoRA fails to replicate the effects
Discussion
Conclusions
Future Work
...and 6 more sections

Figures (3)

Figure 1: Example of training data
Figure 2: Embeddings of a Chinese character
Figure 3: New Chinese Room
