Model Behavior Specification by Leveraging LLM Self-Playing and Self-Improving

Soya Park; J. D. Zamfirescu-Pereira; Chinmay Kulkarni

TL;DR

The paper addresses the challenge of articulating precise behavioral instructions for AI agents in open-ended domains. It introduces Visionary Tuning, a two-phase approach combining self-playing by the target model with human-in-the-loop feedback and automated self-improvement to refine instructions that govern behavior, including avoidance of anti-behaviors. Through a within-subject user study (N=12) and a larger crowd-study (N=60), VisionForge enhanced domain exploration and anti-behavior avoidance without compromising response quality, though users did not always perceive clear benefits. A technical evaluation with real-world movie critic data demonstrates robust, low-variance performance across classes with limited training data. Overall, the work argues for a human-in-the-loop, self-guided refinement paradigm that improves prompt reliability and transparency, with implications for safer, more accountable interactive AI systems.

Abstract

Training AI models is challenging, particularly when crafting behavior instructions. Traditional methods rely on machines (supervised learning) or manual pattern discovery, which result in uninterpretable models or a time sink, respectively. While Large Language Models (LLMs) simplify instruction writing through natural language, articulating intended model behavior remains difficult. We introduce Visionary Tuning, a human-in-the-loop self-playing step followed by automatic self-refinement, to improve behavior specification. Our system helps users clarify desired behavior through self-playing and generates prompts through self-improving. Our first evaluation is a user study conducted on a system implementation of Visionary Tuning in the context of chatbot behavior. The system self-plays by simulating user interactions to identify patterns and creates effective prompts based on those patterns. In a within-subject study (N=12), participants pinpointed more patterns through self-playing and crafted better prompts. Surprisingly, users reported a roughly similar level of success in specifying the model behavior. Follow-up crowd studies (N=60) confirmed that the chatbot adhered to instructions without sacrificing quality. Our second evaluation is a case study on a real-world implementation of Visionary Tuning using a movie rating dataset, demonstrating its effectiveness and robustness in modeling a critic's preferences across the spectrum of low- to highly-rated movies. Together, these results suggest how AI can improve the design process of interactive AI systems. Furthermore, they suggest that the benefits of these tools may be non-obvious to end-users. We reflect on these findings and suggest future directions.
