
Botender: Supporting Communities in Collaboratively Designing AI Agents through Case-Based Provocations

Tzu-Sheng Kuo, Sophia Liu, Quan Ze Chen, Joseph Seering, Amy X. Zhang, Haiyi Zhu, Kenneth Holstein

TL;DR

Botender enables communities to co-create LLM-powered bots through a collaborative workflow centered on case-based provocations, which surface disagreements and opportunities for improvement. The system integrates with Discord, supports proposing changes, iterating on prompts, and deploying updates, and relies on three LLM pipelines to generate provocative, context-rich test cases. A validation study shows that Botender’s provocations uncover more potential improvements and disagreements than baseline test cases, while a five-day field study demonstrates practical, culture-aligned bot design across six real communities. Together, these findings show how participatory design, integrated tooling, and provocative case generation can expand community governance of AI-backed infrastructure, and they point to future work on scalability, multi-turn agent design, and power dynamics.

Abstract

AI agents, or bots, serve important roles in online communities. However, they are often designed by outsiders or a few tech-savvy members, leading to bots that may not align with the broader community's needs. How might communities collectively shape the behavior of community bots? We present Botender, a system that enables communities to collaboratively design LLM-powered bots without coding. With Botender, community members can directly propose, iterate on, and deploy custom bot behaviors tailored to community needs. Botender facilitates testing and iteration on bot behavior through case-based provocations: interaction scenarios generated to spark user reflection and discussion around desirable bot behavior. A validation study found these provocations more useful than standard test cases for revealing improvement opportunities and surfacing disagreements. During a five-day deployment across six Discord servers, Botender supported communities in tailoring bot behavior to their specific needs, showcasing the usefulness of case-based provocations in facilitating collaborative bot design.

Paper Structure

This paper contains 109 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Botender's proposal page. The left navigation bar lets users switch between viewing all active tasks on their Discord servers, browsing community proposals for desired changes, and experimenting with the bot in a playground without affecting their server. In the center, users see the proposal’s title, description, and the latest proposed edits to the bot’s tasks, such as adding a new task in this screenshot. Users can upvote or downvote to indicate their support for or opposition to deploying the latest edit. The bottom of the center panel displays the full edit history, allowing users to compare edits with previous versions and with the original task. On the right, test cases help guide collaborative decision-making. The lower part of this panel shows test cases generated automatically to provoke user reflection and discussion around the latest edit; a generated case is saved once a user votes on the bot's response to it. The upper part lists test cases previously saved by community members, which members can review and vote on. Clicking a test case opens a pop-up with case details, including how the bot’s responses to that case have changed across edits. Finally, users can click “enter other cases manually” to open a sheet where they can add custom test cases.
  • Figure 2: After clicking the edit button on the proposal page, the original static text is replaced by this edit interface. Before saving edits, users are required to run "Test + Generate" to see how the bot would behave with their proposed edits.
  • Figure 3: The Discord interface, highlighting Botender’s integration with the community platform. (1) By default, Botender replies "hello" to users who greet it in the #botender channel, as defined by its default "Hello Botender" task. (2) When a new proposal is created, the system sends a notification to the #botender channel and (3) creates an associated discussion thread, shown on the right, where users can discuss the proposal and receive notifications about saved edits. (4) Once a proposal is deployed, the system notifies the discussion thread, closes the proposal, and (5) sends a message to the main #botender channel. (6) The bot then behaves according to the latest deployed edit.
  • Figure 4: Botender’s case-based provocation algorithm uses three parallel LLM pipelines to generate provocative test cases that encourage user reflection and discussion on common prompt design pitfalls: ambiguous language, overly narrow phrasing, and unintended downstream consequences for the community. Each pipeline has its own detector, generator, and evaluator to produce relevant cases. Finally, a selector chooses the most provocative cases from all candidates. The prompts for all ten LLM modules (each pipeline's detector, generator, and evaluator, plus the final selector) are provided in the paper's appendix.
  • Figure 5: Botender's overall system and agent architecture. On the left, platform events are captured by Botender's always-running event listener, which translates them into information the agent architecture can understand. The orchestrator agent assesses each event and determines which, if any, task-specific agent is most relevant. The selected task-specific agent then generates an action instruction that is executed by Botender's platform action executor. On the right, Botender's website serves as the primary interface for users to collaboratively and iteratively design AI agents. This process generates concrete interaction scenarios that help guide further design iterations. The website is tightly integrated with the community platform, with each proposal directly linked to a dedicated discussion thread, encouraging broader community participation and discussion.
  • ...and 3 more figures
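The Figure 4 caption describes a fan-out/fan-in structure: three pipelines run in parallel, each chaining a detector, a generator, and an evaluator, and a selector then picks the most provocative candidates. The Python sketch below illustrates only that control flow, under stated assumptions: every LLM module is replaced by a plain callable, the evaluator is assumed to return a numeric score, and the paper's LLM-based selector is swapped for a simple top-k sort by that score. The names `Case`, `Pipeline`, `provoke`, and `stub_pipeline` are hypothetical and do not come from the paper.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    pitfall: str   # which pipeline produced this case
    message: str   # hypothetical member message to send to the bot
    score: float   # evaluator rating (assumed: higher = more provocative)


@dataclass
class Pipeline:
    pitfall: str                               # e.g. "ambiguous language"
    detect: Callable[[str], list[str]]         # task prompt -> detected issues
    generate: Callable[[str, str], list[str]]  # (prompt, issue) -> candidate messages
    evaluate: Callable[[str, str], float]      # (prompt, message) -> score

    def run(self, prompt: str) -> list[Case]:
        cases = []
        for issue in self.detect(prompt):                 # detector
            for msg in self.generate(prompt, issue):      # generator
                score = self.evaluate(prompt, msg)        # evaluator
                cases.append(Case(self.pitfall, msg, score))
        return cases


def provoke(prompt: str, pipelines: list[Pipeline], k: int = 3) -> list[Case]:
    # Fan out: run every pipeline concurrently on the same task prompt.
    with ThreadPoolExecutor(max_workers=max(1, len(pipelines))) as pool:
        batches = list(pool.map(lambda p: p.run(prompt), pipelines))
    candidates = [case for batch in batches for case in batch]
    # Fan in with a stand-in selector: keep the k highest-scoring candidates.
    # (Figure 4 describes an LLM selector here instead.)
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]


def stub_pipeline(pitfall: str, trigger: str, score: float) -> Pipeline:
    # Keyword stubs standing in for LLM calls, for demonstration only.
    return Pipeline(
        pitfall,
        detect=lambda prompt: [trigger] if trigger in prompt.lower() else [],
        generate=lambda prompt, issue: [f"A member message probing '{issue}'"],
        evaluate=lambda prompt, msg: score,
    )


pipelines = [
    stub_pipeline("ambiguous language", "nice", 0.9),
    stub_pipeline("overly narrow phrasing", "only", 0.6),
    stub_pipeline("downstream consequences", "remind", 0.4),
]
top = provoke("Be nice and only remind members about the rules.", pipelines, k=2)
print([c.pitfall for c in top])  # ['ambiguous language', 'overly narrow phrasing']
```

Because each pipeline is independent until the selector, a system built this way could add or drop a pitfall category without touching the others; whether Botender's own implementation is organized like this is not stated in this summary.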
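The left half of Figure 5 describes an event-dispatch loop: the event listener normalizes a platform event, the orchestrator picks at most one relevant task-specific agent, that agent produces an action instruction, and the platform action executor carries it out. The Python sketch below shows that routing under assumptions: the orchestrator's and agents' LLM judgments are replaced by plain callables, the orchestrator is assumed to route to the first relevant agent, and the greeting check for the default "Hello Botender" task from Figure 3 is a hypothetical keyword match. `Event`, `Action`, `TaskAgent`, `orchestrate`, and `handle` are illustrative names, not Botender's code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Event:
    """A platform event after the listener has normalized it."""
    channel: str
    author: str
    content: str


@dataclass
class Action:
    """An instruction for the platform action executor."""
    kind: str  # e.g. "reply"
    channel: str
    text: str


@dataclass
class TaskAgent:
    name: str
    is_relevant: Callable[[Event], bool]  # stand-in for the orchestrator's LLM judgment
    act: Callable[[Event], Action]        # stand-in for the agent's LLM-generated instruction


def orchestrate(event: Event, agents: list[TaskAgent]) -> Optional[TaskAgent]:
    # Assumed policy: route to the first relevant agent, or to none at all.
    return next((a for a in agents if a.is_relevant(event)), None)


def handle(event: Event, agents: list[TaskAgent],
           execute: Callable[[Action], None]) -> Optional[Action]:
    agent = orchestrate(event, agents)
    if agent is None:
        return None               # no task applies, so the bot stays silent
    action = agent.act(event)     # the task-specific agent produces an instruction
    execute(action)               # the executor performs it on the platform
    return action


# Hypothetical version of the default "Hello Botender" task.
hello_task = TaskAgent(
    name="Hello Botender",
    is_relevant=lambda e: (e.channel == "#botender"
                           and e.content.strip().lower() in {"hi", "hello", "hey"}),
    act=lambda e: Action("reply", e.channel, "hello"),
)

sent: list[Action] = []
handle(Event("#botender", "ana", "Hello"), [hello_task], sent.append)
handle(Event("#general", "ana", "Hello"), [hello_task], sent.append)
print(sent)  # only the #botender greeting produced a reply
```

Separating routing from execution in this way means a newly deployed task (a new `TaskAgent` in this sketch) changes bot behavior without any change to the listener or the executor.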