PicPersona-TOD : A Dataset for Personalizing Utterance Style in Task-Oriented Dialogue with Image Persona

Jihyun Lee; Yejin Jeon; Seungyeon Seo; Gary Geunbae Lee

TL;DR

PicPersona-TOD extends task-oriented dialogue (TOD) with an image-based persona that is used to personalize system responses. The dataset is built with an automated five-stage data-generation pipeline that aligns user images with dialogues and transfers utterance style, using first-impression prompts and retrieval-augmented knowledge from Google Maps and Wikipedia to reduce hallucinations. The authors also present Pictor, a vision-language NLG baseline that personalizes responses and generalizes to unseen domains while retaining core TOD capabilities such as dialogue state tracking (DST) and policy inference. Human evaluations show improved user experience and personalization quality. The data is carefully filtered, and ethical considerations are addressed. The dataset is intended as a foundation for future research on multimodal personalization in dialogue systems.

Abstract

Task-Oriented Dialogue (TOD) systems are designed to fulfill user requests through natural language interactions, yet existing systems often produce generic, monotonic responses that lack individuality and fail to adapt to users' personal attributes. To address this, we introduce PicPersona-TOD, a novel dataset that incorporates user images as part of the persona, enabling personalized responses tailored to user-specific factors such as age or emotional context. This is facilitated by first impressions, dialogue policy-guided prompting, and the use of external knowledge to reduce hallucinations. Human evaluations confirm that our dataset enhances user experience, with personalized responses contributing to a more engaging interaction. Additionally, we introduce a new NLG model, Pictor, which not only personalizes responses, but also demonstrates robust performance across unseen domains (https://github.com/JihyunLee1/PicPersona).
