Talk Less, Call Right: Enhancing Role-Play LLM Agents with Automatic Prompt Optimization and Role Prompting

Saksorn Ruangtanusak; Pittawat Taveekitworachai; Kunat Pipatanakul

TL;DR

The paper addresses the challenge of building persona-grounded, tool-augmented dialogue agents for the Commonsense Persona-grounded Dialogue Challenge (CPDC) 2025. It evaluates four prompting strategies intended to curb over-speaking (overly long in-character replies) and under-acting (ineffective tool use). Rule-Based Role Prompting (RRP), which combines Character-Card/Scene-Contract design with Hard-Enforced Function Calling, is the strongest approach, achieving an overall score of 0.571 (0.531 on Task 1, 0.611 on Task 2). Call-level analysis reveals a gap between high partial accuracy on function names (0.714) and low exact-match accuracy on arguments (0.231), which points to argument grounding as the main remaining weakness. The study shows that constraint-driven prompting can improve the reliability of persona-driven tool use, and it releases the prompts and APO tooling as open source to support future research.
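
The gap between function-name accuracy and exact argument accuracy can be made concrete with a small sketch. The call format, the tool names, and the scoring functions below are illustrative assumptions, not the official CPDC evaluation code; the sketch only shows how a call can receive name-level credit while failing an exact match on its arguments. The last helper checks that the reported overall score is consistent with an unweighted mean of the two task scores, which is itself an assumption about how the overall score is aggregated.

```python
from typing import Any

# Hypothetical call shape (an assumption): {"name": str, "arguments": dict}
Call = dict[str, Any]


def name_match(pred: Call, gold: Call) -> bool:
    """Partial credit: only the function name has to agree."""
    return pred["name"] == gold["name"]


def exact_match(pred: Call, gold: Call) -> bool:
    """Full credit: the name and every argument value have to agree."""
    return name_match(pred, gold) and pred["arguments"] == gold["arguments"]


def call_level_scores(preds: list[Call], golds: list[Call]) -> tuple[float, float]:
    """Return (name accuracy, exact accuracy) over aligned call pairs."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and aligned")
    n = len(golds)
    name_acc = sum(name_match(p, g) for p, g in zip(preds, golds)) / n
    exact_acc = sum(exact_match(p, g) for p, g in zip(preds, golds)) / n
    return name_acc, exact_acc


def overall_score(task1: float, task2: float) -> float:
    """Unweighted mean of the two task scores (assumed aggregation)."""
    return (task1 + task2) / 2


# Made-up example data: the first prediction picks the right function
# but grounds the argument loosely ("sword" instead of "iron sword").
golds = [
    {"name": "check_price", "arguments": {"item": "iron sword"}},
    {"name": "sell", "arguments": {"item": "iron sword", "quantity": 1}},
]
preds = [
    {"name": "check_price", "arguments": {"item": "sword"}},
    {"name": "sell", "arguments": {"item": "iron sword", "quantity": 1}},
]

name_acc, exact_acc = call_level_scores(preds, golds)
print(name_acc, exact_acc)                    # name-level vs exact accuracy
print(round(overall_score(0.531, 0.611), 3))  # mean of the reported task scores
```

In this toy data both calls name the right function, but only one reproduces the gold arguments exactly, which mirrors in miniature the pattern the call-level analysis reports.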

Abstract

This report investigates approaches for prompting a tool-augmented large language model (LLM) to act as a role-playing dialogue agent in the API track of the Commonsense Persona-grounded Dialogue Challenge (CPDC) 2025. In this setting, dialogue agents often produce overly long in-character responses (over-speaking) while failing to use tools effectively according to the persona (under-acting), for example by generating calls to functions that do not exist or by making unnecessary tool calls before answering. We explore four prompting approaches to address these issues: (1) basic role prompting, (2) improved role prompting, (3) automatic prompt optimization (APO), and (4) rule-based role prompting. The rule-based role prompting (RRP) approach achieved the best performance through two novel techniques, character-card/scene-contract design and strict enforcement of function calling, which led to an overall score of 0.571, improving on the zero-shot baseline score of 0.519. These findings demonstrate that RRP design can substantially improve the effectiveness and reliability of role-playing dialogue agents compared with more elaborate methods such as APO. To support future efforts in developing persona prompts, we are open-sourcing all of our best-performing prompts and the APO tool. Source code is available at https://github.com/scb-10x/apo
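
One of the failure modes named above, calls to functions that do not exist, can also be caught after generation. The abstract describes enforcing function calling through prompt rules; the sketch below is not that method but a complementary post-hoc check, written under assumptions. The `ToolCall` type, the registry layout, and the tool names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical container for one model-proposed function call."""
    name: str
    arguments: dict = field(default_factory=dict)


def filter_calls(calls: list[ToolCall], registry: dict[str, set[str]]) -> list[ToolCall]:
    """Keep only calls that name a registered tool and use declared parameters.

    `registry` maps each available tool name to its set of allowed
    parameter names (an assumed layout, not a real API).
    """
    kept = []
    for call in calls:
        allowed_params = registry.get(call.name)
        if allowed_params is None:
            continue  # the function does not exist in this scene
        if not set(call.arguments) <= allowed_params:
            continue  # at least one argument name is not declared
        kept.append(call)
    return kept


# Made-up tools and calls for illustration only.
registry = {"check_price": {"item"}, "sell": {"item", "quantity"}}
calls = [
    ToolCall("check_price", {"item": "iron sword"}),
    ToolCall("open_portal", {}),                          # unknown function
    ToolCall("sell", {"item": "iron sword", "price": 10}),  # undeclared argument
]

kept = filter_calls(calls, registry)
print([c.name for c in kept])
```

A check like this only rejects malformed calls; it cannot tell whether a well-formed call was appropriate for the persona or the dialogue turn, which is where prompt-level rules remain necessary.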
