Fostering Natural Conversation in Large Language Models with NICO: a Natural Interactive COnversation dataset

Renliang Sun, Mengyuan Liu, Shiping Yang, Rui Wang, Junqing He, Jiaxing Zhang

TL;DR

NICO, a Natural Interactive COnversation dataset in Chinese, introduces two dialogue-level natural conversation tasks and two sentence-level tasks for identifying and rewriting unnatural sentences to help foster the natural dialogue capabilities of LLMs.

Abstract

Benefiting from diverse instruction datasets, contemporary Large Language Models (LLMs) perform effectively as AI assistants in collaborating with humans. However, LLMs still struggle to generate natural and colloquial responses in real-world applications such as chatbots and psychological counseling that require more human-like interactions. To address these limitations, we introduce NICO, a Natural Interactive COnversation dataset in Chinese. We first use GPT-4-turbo to generate dialogue drafts that cover 20 daily-life topics and 5 types of social interactions. Then, we hire workers to revise these dialogues to ensure that they are free of grammatical errors and unnatural utterances. We define two dialogue-level natural conversation tasks and two sentence-level tasks for identifying and rewriting unnatural sentences. Multiple open-source and closed-source LLMs are tested and analyzed in detail. The experimental results highlight the difficulty of the tasks and demonstrate how NICO can help foster the natural dialogue capabilities of LLMs. The dataset will be released.

Paper Structure (24 sections, 3 figures, 10 tables)

This paper contains 24 sections, 3 figures, 10 tables.

Figures (3)

  • Figure 1: Distribution of interaction types of the dialog datasets. Only the constructed NICO dataset is able to cover all types of social interactions.
  • Figure 2: A prompt example for GPT-4-turbo to generate colloquial dialogues. We cannot use real Chinese names for privacy and ethical reasons. We have verified that using non-referential Chinese names such as Xiao Ming and Xiao Hong has the same effect as using Tom and Amy. Researchers who use NICO can replace Tom and Amy with any names they want.
  • Figure 3: The performance of LLMs in dialogs with different interaction types.