DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset

Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu

TL;DR

DailyDialog introduces a high-quality, manually labeled multi-turn dialogue corpus focused on daily-life topics. It provides explicit annotation of dialogue acts (Inform, Questions, Directives, Commissive) and seven emotion categories, enabling analysis of intention and emotion in conversation. Through extensive experiments on retrieval and generation methods, the paper demonstrates that incorporating intention and emotion cues can improve coherence and alignment with ground-truth responses, while domain-aware pretraining effects depend on dataset similarity. The dataset, with its realistic flows and rich annotations, offers a valuable resource for dialog system research, including domain adaptation and emotion-aware dialogue management.

Abstract

We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues reflect the way we communicate in daily life and cover a variety of everyday topics. We also manually label the dataset with communication intention and emotion information. We then evaluate existing approaches on the DailyDialog dataset and hope it benefits the research field of dialog systems.

Paper Structure

This paper contains 21 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: An example in DailyDialog dataset. Some text is shortened for space. Best viewed in color.
  • Figure 2: Statistics in DailyDialog.