Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?

Guan-Ting Lin, Hung-yi Lee

TL;DR

The paper addresses whether LLMs can understand the implications of emphasized sentences in dialogue. It introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogues and human-annotated implications, and develops a GPT-4–based automatic evaluation pipeline that correlates with human judgments. By evaluating open-source and commercial LLMs, the study finds commercial models outperform open-source ones but still struggle with nuanced pragmatic interpretation. The work provides a practical benchmark and evaluation approach to advance dialogue systems' pragmatic understanding of emphasis.

Abstract

Emphasis is a crucial component in human communication, which indicates the speaker's intention and implication beyond pure text in dialogue. While Large Language Models (LLMs) have revolutionized natural language processing, their ability to understand emphasis in dialogue remains unclear. This paper introduces Emphasized-Talk, a benchmark with emphasis-annotated dialogue samples capturing the implications of emphasis. We evaluate various LLMs, both open-source and commercial, to measure their performance in understanding emphasis. Additionally, we propose an automatic evaluation pipeline using GPT-4, which achieves a high correlation with human rating. Our findings reveal that although commercial LLMs generally perform better, there is still significant room for improvement in comprehending emphasized sentences.

Paper Structure

This paper contains 21 sections, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Illustration of the Emphasized-Talk data collection pipeline.
  • Figure 2: Illustration of automatic and human evaluation of the model's predicted implications.
  • Figure 3: The template for selecting emphasized words and documenting their implied meanings.
  • Figure 4: The template for human evaluation, including instructions and grading policy.