Won: Establishing Best Practices for Korean Financial NLP

Guijin Son, Hyunwoo Ko, Haneral Jung, Chami Hwang

TL;DR

This paper presents the first open leaderboard for Korean financial large language models, spanning five MCQA categories and an open-ended FinQA task, to drive open research and safer deployment in finance. It documents an 8-week competition with 1,119 submissions and releases an 80k-instruction dataset, providing a practical blueprint of effective tuning strategies. The authors train ₩on, a fully open Korean-finance LLM, using SFT followed by DPO on the gathered data and show strong gains in Finance & Accounting and FinQA tasks, while noting weaker performance in market-focused tasks. Overall, this work advances Korean financial NLP by offering a comprehensive benchmark, transparent evaluation, and a publicly available reasoning model to guide future development across languages.

Abstract

In this work, we present the first open leaderboard for evaluating Korean large language models focused on finance. Operated for about eight weeks, the leaderboard evaluated 1,119 submissions on a closed benchmark covering five MCQA categories (finance and accounting, stock price prediction, domestic company analysis, financial markets, and financial agent tasks) and one open-ended QA task. Building on insights from these evaluations, we release an open instruction dataset of 80k instances and summarize widely used training strategies observed among top-performing models. Finally, we introduce Won, a fully open and transparent LLM built using these best practices. We hope our contributions help advance the development of better and safer financial LLMs for Korean and other languages.

Paper Structure

This paper contains 26 sections, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Distribution of participants. The shades of blue bars indicate corporate participants.
  • Figure 2: Preliminary round performance trends.
  • Figure 3: Evaluation results reported by Hi-Q. Performance of each methodology is represented by boxed numbers, and green numbers indicate the improvement over CPT.
  • Figure 4: Statistics of prompt and response length in ₩on-Instruct.
  • Figure 5: Model submission trends during the preliminary rounds.
  • ...and 1 more figure