Economy Watchers Survey Provides Datasets and Tasks for Japanese Financial Domain
Masahiro Suzuki, Hiroki Sakaji
TL;DR
Japanese financial-domain NLP tasks are underrepresented, which hinders robust benchmarking. The authors create two large datasets derived from the Economy Watchers Survey (EWS), covering current and future economic assessments, and construct three tasks: domain classification, sentiment analysis, and reason classification. An automatic pipeline updates the datasets monthly. In evaluations of ChatGPT, GPT-4o, FinBERT, and DeBERTaV2, fine-tuned models generally outperform LLMs, with DeBERTaV2 maintaining strong performance and the sentiment task proving particularly challenging. The resources are publicly released on Hugging Face Hub and GitHub, enabling ongoing evaluation and potential incorporation into economic trend indices in the Japanese financial domain.
Abstract
Natural language processing (NLP) tasks in English and in general domains are widely available and are often used to evaluate pre-trained language models. In contrast, fewer tasks are available for languages other than English and for the financial domain. In particular, tasks in the Japanese financial domain are limited. We develop two large datasets using data published by a Japanese central government agency. The datasets provide three Japanese financial NLP tasks: 3-class and 12-class classification tasks for categorizing sentences, and a 5-class classification task for sentiment analysis. Our datasets are designed to be comprehensive and are kept up to date by an automatic update framework, which ensures that the latest task datasets are always publicly available.
