JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge
Zhihan Cao, Fumihito Nishino, Hiroaki Yamada, Nguyen Ha Thanh, Yusuke Miyao, Ken Satoh
TL;DR
JBE-QA introduces a comprehensive Japanese legal knowledge benchmark built by extracting 3,464 binary true/false judgments from the tantō-shiki (multiple-choice) portion of the Japanese Bar Exam (2015–2024), covering the Civil Code, the Penal Code, and the Constitution. The dataset reformats multiple-choice questions into independent per-statement judgments to facilitate automated evaluation and instruction-following analysis. Baseline experiments across 26 LLMs, including proprietary, open-weight, and Japanese-specialised models, show that reasoning-enabled proprietary models achieve the best performance, with Constitution items being comparatively easier than Civil Code or Penal Code items. The work highlights the importance of domain-specific benchmarks for non-English legal knowledge and identifies directions for future work, including broader subject coverage, improved instruction adherence, and integration of ronbun-shiki (essay) data.
Abstract
We introduce JBE-QA, a Japanese Bar Exam Question-Answering dataset for evaluating the legal knowledge of large language models (LLMs). Derived from the multiple-choice (tantō-shiki) section of the Japanese bar exam (2015–2024), JBE-QA provides the first comprehensive benchmark for Japanese legal-domain evaluation of LLMs. It covers the Civil Code, the Penal Code, and the Constitution, extending beyond the Civil Code focus of prior Japanese resources. Each question is decomposed into independent true/false judgments with structured contextual fields. The dataset contains 3,464 items with balanced labels. We evaluate 26 LLMs, including proprietary, open-weight, Japanese-specialised, and reasoning models. Our results show that proprietary models with reasoning enabled perform best, and that questions on the Constitution are generally easier than those on the Civil Code or the Penal Code.
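The decomposition of each multiple-choice question into independent true/false judgments, and the subject-wise scoring behind the Civil Code, Penal Code, and Constitution comparison, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record fields (`item_id`, `subject`, `context`, `statement`, `label`), the helper names `decompose` and `accuracy_by_subject`, and the way the answer key is encoded are all hypothetical, and scoring an unparseable model answer as incorrect is likewise an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative, not the released schema.
@dataclass
class Judgment:
    item_id: str
    subject: str    # e.g. "Civil Code", "Penal Code", or "Constitution"
    context: str    # shared question stem and instructions
    statement: str  # a single option to be judged on its own
    label: bool     # True if the statement is legally correct


def decompose(question_id, subject, stem, options, correct):
    """Split one multiple-choice question into independent true/false items.

    `options` maps an option marker to its text; `correct` is the set of
    markers whose statements are correct (assumed answer-key encoding).
    """
    return [
        Judgment(f"{question_id}-{marker}", subject, stem, text, marker in correct)
        for marker, text in options.items()
    ]


def accuracy_by_subject(items, predictions):
    """Per-subject accuracy; `predictions` maps item_id -> True, False, or None."""
    hits, totals = defaultdict(int), defaultdict(int)
    for item in items:
        totals[item.subject] += 1
        # A missing or unparseable answer (None) never equals a bool label,
        # so it is counted as wrong.
        if predictions.get(item.item_id) == item.label:
            hits[item.subject] += 1
    return {subject: hits[subject] / totals[subject] for subject in totals}


if __name__ == "__main__":
    items = decompose(
        "2024-civil-01", "Civil Code", "Choose the correct statements.",
        {"1": "statement one", "2": "statement two", "3": "statement three"},
        correct={"1", "3"},
    )
    preds = {"2024-civil-01-1": True, "2024-civil-01-2": True, "2024-civil-01-3": None}
    print(accuracy_by_subject(items, preds))  # {'Civil Code': 0.3333333333333333}
```

Treating each statement as its own item means a model is credited per judgment rather than all-or-nothing per question, which is what makes per-statement label balance and per-subject accuracy straightforward to compute.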
