Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue

Junkai Zhou, Liang Pang, Huawei Shen, Xueqi Cheng

TL;DR

The paper tackles the gap between large language models and human-like communication by introducing CSIM, a framework that couples inner monologue with five linguistically grounded skills to cultivate proactive, empathetic, and coherent dialogue. It formalizes each skill with definitional prompts and integrates them through prompt design and in-context learning, enabling LLMs to think before speaking. A new benchmark, Cskills, is built to evaluate these skills via self-chat and human-bot conversations, with both automatic and human evaluations. Across ChatGPT and Vicuna-13b, CSIM demonstrates consistent improvements over baselines and CoT in humanness, proactivity, engagement, and goal attainment, validating the approach's practical impact for open-domain dialogue systems.

Abstract

The emergence of large language models (LLMs) has further improved the capabilities of open-domain dialogue systems, which can now generate fluent, coherent, and diverse responses. However, LLMs still lack a crucial ability: communication skills. This limitation makes them more like information-seeking tools than anthropomorphic chatbots. Communication skills, such as topic transition, proactively asking questions, concept guidance, empathy, and summarising often, should be taken into consideration to make LLMs more anthropomorphic and proactive during the conversation, thereby increasing users' interest and encouraging them to chat for longer. However, enabling these communication skills in black-box LLMs remains a key challenge because they do not form utterances the way real people do: by thinking before speaking. Inspired by linguistics and cognitive science, we empower LLMs with communication skills through inner monologues. To evaluate various communication skills, we construct a benchmark named Cskills, which also evaluates the model's dialogue generation ability more comprehensively. Experimental results show that the proposed CSIM strategy improves the backbone models and outperforms the baselines.

Paper Structure (38 sections, 5 figures, 23 tables)

This paper contains 38 sections, 5 figures, 23 tables.

Figures (5)

  • Figure 1: When asked to recommend: (a) ChatGPT recommends directly without asking about users' detailed needs, which may fail to satisfy them; (b) people proactively ask questions to better understand users' needs before making recommendations.
  • Figure 2: The framework of the proposed CSIM method, which adds communication skills to large language models through inner monologue. In-context learning is used to better implement the whole process.
  • Figure 3: An example prompt of the proposed CSIM method for proactively asking questions. The text marked in blue is the instruction part of the prompt, which explains to the LLM the scenarios in which to use communication skills and asks it to think about its reasons for using them and to generate responses accordingly. The text marked in red is the inner monologue of the LLM (ChatGPT is taken as an example).
  • Figure 4: The result of implicit human evaluation.
  • Figure 5: Human evaluation results on each communication skill.
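
The prompt structure described for Figures 2 and 3 (an instruction part that introduces the skills, followed by an inner monologue that precedes the reply) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's actual prompt: the skill names come from the abstract, but the one-line descriptions, the `[monologue]` delimiters, and the helper names `build_prompt` and `split_response` are hypothetical.

```python
import re

# Skill names are taken from the abstract; the descriptions are illustrative
# placeholders, not the paper's definitional prompts.
SKILLS = {
    "topic transition": "move to a related topic when the current one stalls",
    "proactively asking questions": "ask about the user's needs before answering",
    "concept guidance": "steer the conversation toward a target concept",
    "empathy": "acknowledge the user's feelings",
    "summarising often": "recap what has been said so far",
}

# Hypothetical delimiters marking the inner monologue in the model output.
MONOLOGUE_RE = re.compile(r"\[monologue\](.*?)\[/monologue\]", re.DOTALL)


def build_prompt(history, skills=SKILLS):
    """Assemble an instruction plus dialogue history into one prompt string.

    `history` is a list of (speaker, text) pairs.
    """
    skill_lines = "\n".join(f"- {name}: {desc}" for name, desc in skills.items())
    dialogue = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "You may use these communication skills when they fit the situation:\n"
        f"{skill_lines}\n"
        "Before replying, think about which skill (if any) to use and why. "
        "Write that thinking between [monologue] and [/monologue], "
        "then write the reply to the user.\n\n"
        f"{dialogue}\nBot:"
    )


def split_response(raw):
    """Separate the inner monologue from the text that is shown to the user."""
    thoughts = [m.strip() for m in MONOLOGUE_RE.findall(raw)]
    spoken = MONOLOGUE_RE.sub("", raw).strip()
    return " ".join(thoughts), spoken
```

In use, the string from `build_prompt` would be sent to whichever chat model is being tested, and `split_response` would be applied to the raw output so that only the spoken part reaches the user while the monologue is kept for inspection.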