No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

Gang Hu; Ke Qin; Chenhan Yuan; Min Peng; Alejandro Lopez-Lira; Benyou Wang; Sophia Ananiadou; Jimin Huang; Qianqian Xie

TL;DR

This paper introduces ICE-PIXIU, an open bilingual Chinese–English framework for financial NLP that unifies an instruction-data suite (ICE-FIND), a bilingual LLM (ICE-INTENT) built via fine-tuning, and a bilingual evaluation benchmark (ICE-FLARE). By assembling 40 datasets (1.185M raw samples, 603k instruction samples, and 95k evaluation samples) across 10 NLP tasks and 20 bilingual tasks, the authors demonstrate that bilingual data and translation data substantially improve cross-lingual financial reasoning, with ICE-full-7B often surpassing strong baselines, including GPT-4, on Chinese tasks. The work also presents detailed ablations, generalization analyses, and practical examples, underscoring the importance of data diversity, expert prompts, and translation data for robust bilingual performance. Overall, ICE-PIXIU offers an open, scalable platform that advances bilingual financial NLP and supports cross-lingual finance research and applications.

Abstract

While the progress of Large Language Models (LLMs) has notably advanced financial analysis, their application has largely been confined to a single language, leaving the potential of bilingual Chinese-English capability untapped. To bridge this gap, we introduce ICE-PIXIU, which integrates the ICE-INTENT model and the ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU uniquely combines a spectrum of Chinese tasks with translated and original English datasets, enriching the breadth and depth of bilingual financial modeling. It provides open access to diverse model variants, a substantial compilation of cross-lingual and multi-modal instruction data, and an evaluation benchmark with expert annotations, covering 10 NLP tasks and 20 bilingual-specific tasks with a total of 95k evaluation samples. Our thorough evaluation highlights the advantages of incorporating these bilingual datasets, especially translation data and original English data, which enhance both linguistic flexibility and analytical acuity in financial contexts. Notably, ICE-INTENT shows significant improvements over general-purpose LLMs and existing financial LLMs in bilingual settings, underscoring the impact of robust bilingual data on the accuracy and efficacy of financial NLP.

Paper Structure (17 sections, 3 figures, 12 tables)

This paper contains 17 sections, 3 figures, 12 tables.

Introduction
Related Work
Method
ICE-FIND: Bilingual Financial Instruction Dataset
Raw Data Integration
Instruction Construction
ICE-INTERN: Bilingual Financial Large Language Model
ICE-FLARE: Bilingual Financial Evaluation Benchmark
Experiments
Results
Overall Performance
Language Disparity
Generalization Ability
Ablation Study
Practical Examples
...and 2 more sections

Figures (3)

Figure 1: A sunburst chart showing ICE-FLARE distribution across language, data types, NLP and specific tasks, and datasets.
Figure 2: Seven radar charts showing, respectively, the average metrics on the specific tasks within each data type (DLC (A), DLE (B), DTT (C), DTE (D), DOT (E)), the average metrics across the 5 data types (F), and the average metrics on the 10 bilingual NLP tasks (G), together with a histogram of the number of best results on Chinese (ZH) and English (EN) data (H).
Figure 3: An example of translating the English BigData22 dataset into the Chinese CBigData22 dataset using human-written, task-specific translation prompts. The task prompts and the Twitter descriptions from BigData22 are translated separately, while the original stock information, including stock tickers with $ signs and historical stock prices, is preserved unchanged.
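Figure 3 states the preservation rule (translate the text, keep tickers and prices untouched) but not how it is enforced. One common way to do this is placeholder masking: swap protected spans for indexed tokens, translate, then restore them. The sketch below is illustrative only and is not the authors' implementation. It assumes that tickers appear as `$`-prefixed cashtags and prices appear as plain decimal numbers. It also assumes that `translate` is any string-to-string callable (standing in for whatever translator or LLM prompt is used) that leaves the `<KEEPn>` tokens intact. The names `mask_protected`, `unmask`, and `translate_preserving` are hypothetical.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumed patterns: cashtags such as $aapl / $AAPL, and decimal numbers such as 182.5.
PROTECTED = re.compile(r"\$[A-Za-z]+|\d+(?:\.\d+)?")
PLACEHOLDER = re.compile(r"<KEEP(\d+)>")


def mask_protected(text: str) -> tuple[str, list[str]]:
    """Replace each ticker/number with an indexed placeholder; return masked text and originals."""
    kept: list[str] = []

    def _sub(m: re.Match) -> str:
        kept.append(m.group(0))
        return f"<KEEP{len(kept) - 1}>"

    return PROTECTED.sub(_sub, text), kept


def unmask(text: str, kept: list[str]) -> str:
    """Put the original tickers/numbers back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: kept[int(m.group(1))], text)


def translate_preserving(text: str, translate: Callable[[str], str]) -> str:
    """Translate only the free text; protected spans pass through unchanged."""
    masked, kept = mask_protected(text)
    return unmask(translate(masked), kept)
```

For a quick check without a real translator, passing `str.upper` as `translate` changes the surrounding words but leaves `$aapl` and `182.5` exactly as they were. A real pipeline would also need to verify that the translator did not drop or alter any placeholder.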