Model Behavior Specification by Leveraging LLM Self-Playing and Self-Improving

Soya Park, J. D. Zamfirescu-Pereira, Chinmay Kulkarni

TL;DR

The paper addresses the challenge of articulating precise behavioral instructions for AI agents in open-ended domains. It introduces Visionary Tuning, a two-phase approach combining self-playing by the target model with human-in-the-loop feedback and automated self-improvement to refine instructions that govern behavior, including avoidance of anti-behaviors. In a within-subject user study (N=12) and a larger crowd study (N=60), VisionForge enhanced domain exploration and anti-behavior avoidance without compromising response quality, though users did not always perceive clear benefits. A technical evaluation with real-world movie critic data demonstrates robust, low-variance performance across classes with limited training data. Overall, the work argues for a human-in-the-loop, self-guided refinement paradigm that improves prompt reliability and transparency, with implications for safer, more accountable interactive AI systems.

Abstract

Training AI models is challenging, particularly when crafting behavior instructions. Traditional methods rely either on machines (supervised learning) or on manual pattern discovery, which yields models that are not interpretable or a process that is time-consuming. While Large Language Models (LLMs) simplify instruction writing through natural language, articulating intended model behavior remains difficult. We introduce Visionary Tuning, a human-in-the-loop self-playing phase followed by automatic self-refinement, to improve behavior specification. Our system helps users clarify desired behavior through self-playing and generates prompts through self-improving. Our first evaluation is a user study conducted on a system implementation of Visionary Tuning in the context of chatbot behavior. The system self-plays by simulating user interactions to identify patterns and creates effective prompts based on those patterns. In a within-subject study (N=12), participants pinpointed more patterns through self-playing and crafted better prompts. Surprisingly, users reported roughly the same perceived level of success in specifying the model behavior. Follow-up crowd studies (N=60) confirmed that the chatbot adhered to instructions without sacrificing quality. Our second evaluation is a case study on a real-world implementation that applies Visionary Tuning to a movie rating dataset, demonstrating its effectiveness and robustness in modeling a critic's preferences across the spectrum from low-rated to highly rated movies. Together, these results suggest how AI can improve the design process of interactive AI systems. Furthermore, they suggest that the benefits of these tools may be non-obvious to end-users. We reflect on these findings and suggest future directions.

Paper Structure

This paper contains 48 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: VisionForge interface. (In this interface, anti-behaviors are called "taboos" to keep the terminology consistent with our experimental setup.) (Left) Chatbot developers provide an initial prompt and an anti-behavior. VisionForge then proposes an LLM prompt for user-simulated chatbots. (Right) Developers can then view conversations between their chatbot and the user-simulated chatbots. When an anti-behavior occurs during a simulated conversation, VisionForge flags that conversation.
  • Figure 2: VisionForge provides feedback and makes suggestions at different stages of prompt engineering: (a) (Top) When the anti-behavior is aligned with the LLM guideline, users can move on to the next step. (Bottom) When it is not aligned, VisionForge suggests an alternative anti-behavior. (b) VisionForge asks users to describe a simulated user of the chatbot and also offers a suggestion of its own.
  • Figure 3: Interface for the control version. To avoid response bias, we designed the control version with an aesthetic similar to that of the experimental version, so that it is not obvious which version is the control and which is the experiment.
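
As a rough illustration of the self-play and flagging loop described in the abstract and in Figure 1, the following Python sketch simulates a conversation between a chatbot and a simulated user, flags chatbot turns that exhibit an anti-behavior, and then revises the prompt. It is not the authors' implementation: the names `self_play`, `flag_anti_behavior`, and `refine_prompt`, the keyword-based detector, and the toy models standing in for LLM calls are all assumptions made for this example, and the human-in-the-loop step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a model maps (system prompt, transcript so far) to a reply.
Model = Callable[[str, List[str]], str]


@dataclass
class Flag:
    turn: int   # index of the flagged message in the transcript
    reply: str  # the chatbot message that showed the anti-behavior


def self_play(chatbot: Model, simulated_user: Model, bot_prompt: str,
              user_prompt: str, turns: int = 3) -> List[str]:
    """Alternate simulated-user and chatbot messages and return the transcript."""
    transcript: List[str] = []
    for _ in range(turns):
        transcript.append(simulated_user(user_prompt, transcript))
        transcript.append(chatbot(bot_prompt, transcript))
    return transcript


def flag_anti_behavior(transcript: List[str],
                       violates: Callable[[str], bool]) -> List[Flag]:
    """Flag chatbot turns (odd indices) that the detector marks as anti-behavior."""
    return [Flag(i, msg) for i, msg in enumerate(transcript)
            if i % 2 == 1 and violates(msg)]


def refine_prompt(bot_prompt: str, flags: List[Flag], anti_behavior: str) -> str:
    """Assumed refinement step: append an explicit rule if anything was flagged."""
    if not flags:
        return bot_prompt
    rule = f"Never {anti_behavior}."
    return bot_prompt if rule in bot_prompt else f"{bot_prompt}\n{rule}"


# Toy stand-ins for LLM calls (assumptions, for illustration only).
def toy_user(system_prompt: str, transcript: List[str]) -> str:
    return "Can you recommend a stock?"


def toy_chatbot(system_prompt: str, transcript: List[str]) -> str:
    if "Never give financial advice." in system_prompt:
        return "I cannot give financial advice."
    return "You should buy ACME stock."


def run_demo():
    prompt = "You are a friendly assistant."
    anti = "give financial advice"

    def violates(msg: str) -> bool:
        return "buy" in msg.lower()

    before = flag_anti_behavior(
        self_play(toy_chatbot, toy_user, prompt, "Ask about money."), violates)
    refined = refine_prompt(prompt, before, anti)
    after = flag_anti_behavior(
        self_play(toy_chatbot, toy_user, refined, "Ask about money."), violates)
    return len(before), len(after), refined
```

Calling `run_demo()` runs one self-play round with the initial prompt, revises the prompt from the flagged turns, and runs a second round with the revised prompt. In a real system the detector and the refinement would presumably be LLM-driven and reviewed by the developer; the keyword check here only keeps the sketch self-contained.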