Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation

Jaechang Kim, Jinmin Goh, Inseok Hwang, Jaewoong Cho, Jungseul Ok

TL;DR

This work addresses the explainability gap between expert chess engines and large language models by introducing Concept-guided Chess Commentary generation (CCC) and GPT-based Chess Commentary Evaluation (GCC-Eval). CCC combines concept-based explanations from an expert model with LLM reasoning to generate commentary that is accurate, informative, and fluent. GCC-Eval provides a multi-dimensional, domain-informed evaluation aligned with human judgments. CCC achieves superior informativeness and linguistic quality, and GCC-Eval's scores correlate robustly with human assessments, which supports the framework's potential for human education and reliable AI explanations. The framework is broadly applicable to other domains that require explanations of deep decision-making, with future directions including broader concept sets and expansion to new domains.
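GCC-Eval is described as aligned with human judgments. One standard way to quantify that kind of alignment is to correlate an automatic metric's scores with human ratings of the same commentaries. The summary does not say which statistic the paper reports, so this minimal sketch simply computes Pearson's r and Kendall's tau-b (the tau variant that corrects for the ties common in Likert-style ratings) in plain Python. The function names and the toy ratings are illustrative assumptions, not data from the paper.

```python
import math
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kendall_tau_b(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-b: rank agreement over all item pairs, corrected for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    # (pairs not tied in x) * (pairs not tied in y), as in the tau-b definition
    denom = math.sqrt(
        (concordant + discordant + ties_y_only) * (concordant + discordant + ties_x_only)
    )
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Toy example: human ratings vs. metric scores for five commentaries.
    human = [1, 2, 3, 4, 5]
    metric = [2, 1, 4, 3, 5]
    print(f"Pearson r = {pearson(human, metric):.3f}")
    print(f"Kendall tau-b = {kendall_tau_b(human, metric):.3f}")
```

Kendall's tau only looks at whether pairs are ranked in the same order, so it is less sensitive than Pearson's r to how an evaluator spreads its scores across the scale.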

Abstract

Deep learning-based expert models have reached superhuman performance in decision-making domains such as chess and Go. However, explaining or commenting on their decisions remains under-explored, even though it is important for model explainability and human education. The outputs of expert models are accurate but difficult for humans to interpret. Large language models (LLMs), on the other hand, can produce fluent commentary but are prone to hallucinations because of their limited decision-making capabilities. To bridge this gap between expert models and LLMs, we focus on chess commentary as a representative task of explaining complex decision-making processes through language, and we address both the generation and the evaluation of commentary. We introduce Concept-guided Chess Commentary generation (CCC) for producing commentary and GPT-based Chess Commentary Evaluation (GCC-Eval) for assessing it. CCC integrates the decision-making strengths of expert models with the linguistic fluency of LLMs through prioritized, concept-based explanations. GCC-Eval leverages expert knowledge to evaluate chess commentary on informativeness and linguistic quality. Experimental results, validated by both human judges and GCC-Eval, demonstrate that CCC generates commentary that is accurate, informative, and fluent.
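The abstract says CCC feeds the LLM "prioritized, concept-based explanations" from the expert model. The summary does not give the prioritization rule or the prompt format, so this minimal sketch assumes one plausible design: the expert model reports a score per concept before and after the move, the concepts whose scores changed most are kept, and they are listed in order of importance in the LLM prompt. `ConceptScore`, `prioritize`, `build_prompt`, the concept names, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConceptScore:
    """Hypothetical per-concept output of an expert model for one move."""
    name: str      # concept label, e.g. "king safety" (illustrative)
    before: float  # concept score of the position before the move
    after: float   # concept score of the position after the move

    @property
    def delta(self) -> float:
        return self.after - self.before


def prioritize(concepts: list[ConceptScore], k: int = 3) -> list[ConceptScore]:
    """Assumed priority rule: keep the k concepts the move changed the most."""
    return sorted(concepts, key=lambda c: abs(c.delta), reverse=True)[:k]


def build_prompt(fen: str, move: str, concepts: list[ConceptScore], k: int = 3) -> str:
    """Assemble an LLM prompt that grounds the commentary in expert-model concepts."""
    lines = [
        f"Position (FEN): {fen}",
        f"Move played: {move}",
        "Key concepts from the chess engine, most important first:",
    ]
    for c in prioritize(concepts, k):
        if c.delta > 0:
            trend = "rises"
        elif c.delta < 0:
            trend = "falls"
        else:
            trend = "is unchanged"
        lines.append(f"- {c.name}: score {trend} ({c.before:+.2f} -> {c.after:+.2f})")
    lines.append("Write a short commentary on the move, relying only on the concepts above.")
    return "\n".join(lines)


if __name__ == "__main__":
    concepts = [
        ConceptScore("king safety", 0.10, -0.40),
        ConceptScore("material", 0.00, 0.05),
        ConceptScore("mobility", 0.20, 0.60),
    ]
    fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1"
    print(build_prompt(fen, "e7e5", concepts, k=2))
```

Passing only the top-k concepts keeps the prompt short and steers the LLM toward what the engine considers important, which is the intuition behind grounding fluent text in expert signals; the actual ranking criterion in CCC may differ.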

Paper Structure

This paper contains 47 sections, 1 equation, 10 figures, 7 tables.

Figures (10)

  • Figure 1: Comparison of chess commentary generation methods. The red color indicates incorrect information.
  • Figure 2: Overview of CCC, which consists of (a) extracting concept vectors and (b) generating concept-guided commentary.
  • Figure 3: Examples of generated comments. Red text denotes incorrect information, and blue text denotes important concepts and the affected counterparts.
  • Figure A1: Example prompts for GCC-Eval. The blue text in the figure changes according to the experimental conditions.
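Figure 2(a) refers to extracting concept vectors, but the summary does not say how they are obtained. A common approach in concept-based explanation (for example, concept activation vectors in TCAV) is to train a linear classifier on a model's hidden activations to separate inputs that exhibit a concept from those that do not, then use the classifier's weight vector as the concept direction. This minimal sketch assumes that approach, with toy 2-D activations standing in for a chess engine's hidden layer; `train_concept_probe` and `concept_score` are hypothetical names.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def concept_score(w: list[float], b: float, act: list[float]) -> float:
    """Linear probe output: how strongly an activation expresses the concept."""
    return sum(wi * xi for wi, xi in zip(w, act)) + b


def train_concept_probe(
    acts: list[list[float]], labels: list[int], lr: float = 0.5, epochs: int = 500
) -> tuple[list[float], float]:
    """Fit a logistic-regression probe by batch gradient descent.

    The learned weight vector w is used as the concept vector.
    """
    dim, n = len(acts[0]), len(acts)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(acts, labels):
            err = sigmoid(concept_score(w, b, x)) - y  # d(log-loss)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Toy activations: the concept is present when the first coordinate is high.
    acts = [[2.0, 0.1], [1.5, -0.3], [1.8, 0.4], [-1.0, 0.2], [-1.6, -0.1], [-2.0, 0.3]]
    labels = [1, 1, 1, 0, 0, 0]
    w, b = train_concept_probe(acts, labels)
    print("concept vector:", [round(v, 3) for v in w])
    print("score of a new position:", round(concept_score(w, b, [1.2, 0.0]), 3))
```

Once a concept vector exists, scoring a position reduces to a dot product with its activation, which is what makes per-concept scores cheap enough to compute for every candidate move.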