GuideLLM: Exploring LLM-Guided Conversation with Applications in Autobiography Interviewing

Jinhao Duan, Xinyu Zhao, Zhuoxuan Zhang, Eunhye Ko, Lily Boddy, Chenan Wang, Tianhao Li, Alexander Rasgon, Junyuan Hong, Min Kyung Lee, Chenxi Yuan, Qi Long, Ying Ding, Tianlong Chen, Kaidi Xu

TL;DR

This work introduces GuideLLM, an LLM-guided conversation framework for autobiography interviewing that integrates Goal Navigation, Context Management, and Empathetic Engagement. It details a 23-session interviewing protocol (VIP) augmented by a Memory Graph Extrapolation (MGE) module and emotion-aware interaction, enabling autonomous and adaptive dialogue. The authors evaluate GuideLLM using both automatic metrics (interviewing and communication quality, and autobiography quality) and human-subject studies, reporting significant advantages over six baseline LLMs in automatic evaluation and consistently leading human ratings. Overall, the results support the viability of autonomous LLM-guided interviews and memory-aware autobiography generation, with practical implications for interactive storytelling and memory-based dialogue systems.

Abstract

Although Large Language Models (LLMs) succeed in human-guided conversations such as instruction following and question answering, the potential of LLM-guided conversations, where LLMs direct the discourse and steer the conversation's objectives, remains under-explored. In this study, we first characterize LLM-guided conversation by three fundamental components: (i) Goal Navigation; (ii) Context Management; and (iii) Empathetic Engagement, and propose GuideLLM as an instantiation. We then implement an interviewing environment for evaluating LLM-guided conversation. The environment covers a variety of topics for comprehensive interviewing evaluation, resulting in around 1.4k turns of utterances, 184k tokens, and over 200 events mentioned during the interviews for each chatbot evaluated. We compare GuideLLM with 6 state-of-the-art LLMs, such as GPT-4o and Llama-3-70b-Instruct, in terms of interviewing quality and autobiography generation quality. For automatic evaluation, we derive user proxies from multiple autobiographies and employ LLM-as-a-judge to score LLM behaviors. We further conduct a human-subject experiment in which 45 participants chat with GuideLLM and baselines. We then collect human feedback, preferences, and ratings on the quality of the conversations and autobiographies. Experimental results indicate that GuideLLM significantly outperforms baseline LLMs in automatic evaluation and achieves consistently leading performance in human ratings.

Paper Structure

This paper contains 45 sections, 2 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Comparison between human-guided conversation and LLM-guided conversation. (a) Human-Guided: Human dominates the conversation, providing feedback and instruction to LLMs. (b) LLM-Guided: LLMs navigate the goal by automatically extrapolating interview questions.
  • Figure 2: The overall pipeline of GuideLLM in the guided conversation environment.
  • Figure 3: Ablation study of GuideLLM: how empathetic engagement affects users' (a) positive and (b) negative emotional distributions, (c) statistical results on the number of valid conversation rounds, and (d) the benefits of the MGE in goal navigation.
  • Figure 4: The demographics of participants on (a) age, (b) gender, (c) race, (d) AI familiarity, and (e) AI usage.
  • Figure 5: The Win Rates (WR) of human evaluation.
  • ...and 2 more figures