AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts

Tongshuang Wu, Michael Terry, Carrie J. Cai

TL;DR

The paper proposes Chaining, a method to decompose complex tasks into sub-tasks solved by sequential LLM runs, to improve transparency, controllability, and collaboration in human–AI interaction. It defines eight primitive operations, an interactive Chain interface, and a data-flow structure that exposes intermediate results for user editing and debugging. A within-subject user study with LaMDA shows significant gains in perceived transparency and control and higher-quality outputs (~82% success) when using Chains versus a Sandbox baseline. Case studies illustrate broader applicability to visualization debugging and assisted text entry, and the discussion outlines future directions for building more flexible, prototype-friendly LLM-based workflows. Overall, the work demonstrates that task decomposition and visible intermediate steps can unlock LLM latent capabilities and enable rapid prototyping of AI-assisted applications without retraining.

Abstract

Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system where users can modify these Chains, along with their intermediate results, in a modular way. In a 20-person user study, we found that Chaining not only improved the quality of task outcomes, but also significantly enhanced system transparency, controllability, and sense of collaboration. Additionally, we saw that users developed new ways of interacting with LLMs through Chains: they leveraged sub-tasks to calibrate model expectations, compared and contrasted alternative strategies by observing parallel downstream effects, and debugged unexpected model outputs by "unit-testing" sub-components of a Chain. In two case studies, we further explore how LLM Chains may be used in future applications.

Paper Structure

This paper contains 45 sections, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: A walkthrough example illustrating the differences between no-Chaining (A) and Chaining (B), using the example task of writing a peer review to be more constructive. With a single call to the model in (A), even though the prompt (italicized) clearly describes the task, the generated paragraph remains mostly impersonal and does not provide concrete suggestions for all 3 of Alex's presentation problems. In (B), we instead use an LLM Chain with three steps, each for a distinct sub-task: (b1) a Split points step that extracts each individual presentation problem from the original feedback, (b2) an Ideation step that brainstorms suggestions per problem, and (b3) a Compose points step that synthesizes all the problems and suggestions into a final friendly paragraph. The result is noticeably improved.
  • Figure 2: How to create an LLM step using a prompt template (A), illustrated with the Ideation step of the peer review writing scenario (from Figure 1). For the peer review scenario, the Ideation operation takes in a problem (e.g., too much text) as input and produces suggestions for improvement as output, but the prompt template allows the Ideation operation to take in any custom inputs and outputs. The template includes placeholders for the input (prefix-1), output (prefix-2), and (optional) few-shot examples. (B) shows the actual prompt after filling in the placeholders in the prompt template.
  • Figure 3: An overview of the interface, reflecting the peer review rewriting example in Figure 1. It consists of (A) a Chain view that depicts the high-level Chaining structure, and (B/C) a Step view that allows for refining and executing each LLM step. The interface facilitates tracking the progress of the LLM Chain. For example, when moving from step 2: Ideation (B) to step 3: Compose Points (C), the previously generated presentation problems and suggestions become inputs for the final paragraph. A demonstration is available at https://youtu.be/QFS-1EWlvMM.
  • Figure 4: The LLM Chain for flashcard creation, with: (A) an Ideation step that brainstorms the types of interactions that we might encounter when visiting a given city (Paris), (B) another Ideation step that creates a list of English examples for each interaction type, and (C) a Rewriting step that translates each English example into French.
  • Figure 5: Participants' ratings in the form of seven-point Likert scale questions (details in the user study survey appendix), with 95% confidence intervals. Using Chaining, participants felt they produced results that better matched the task goals, and that the system helped them think through the task. They also found Chaining more transparent, controllable, and collaborative.
  • ...and 6 more figures
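
The captions for Figures 1 and 2 describe two mechanisms: a prompt template whose placeholders are filled in for each step, and a three-step Chain (Split points, Ideation, Compose points) in which each step's output feeds the next. The Python sketch below illustrates that data flow under stated assumptions. It is not the authors' implementation: the function names (`fill_template`, `run_chain`, `stub_llm`), the prompt wording, and the semicolon-separated output format are all hypothetical, and `stub_llm` is a canned stand-in for a real model such as LaMDA so that the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # any function that maps a prompt to a completion


def fill_template(instruction: str, in_prefix: str, out_prefix: str,
                  examples: List[Tuple[str, str]], new_input: str) -> str:
    """Fill a prompt template: instruction, optional few-shot pairs, new input."""
    lines = [instruction]
    for example_in, example_out in examples:
        lines += [f"{in_prefix}: {example_in}", f"{out_prefix}: {example_out}"]
    # The prompt ends with an empty output prefix for the model to complete.
    lines += [f"{in_prefix}: {new_input}", f"{out_prefix}:"]
    return "\n".join(lines)


def split_points(llm: LLM, feedback: str) -> List[str]:
    """Step b1: extract individual problems (assumed semicolon-separated)."""
    prompt = fill_template("List each presentation problem, separated by semicolons.",
                           "Feedback", "Problems", [], feedback)
    return [p.strip() for p in llm(prompt).split(";") if p.strip()]


def ideation(llm: LLM, problem: str) -> str:
    """Step b2: brainstorm a suggestion for one problem, with one few-shot pair."""
    prompt = fill_template("Given a problem, suggest an improvement.",
                           "Problem", "Suggestion",
                           [("no eye contact", "look up between slides")], problem)
    return llm(prompt).strip()


def compose_points(llm: LLM, suggestions: Dict[str, str]) -> str:
    """Step b3: synthesize all problems and suggestions into one paragraph."""
    points = "; ".join(f"{p} -> {s}" for p, s in suggestions.items())
    prompt = fill_template("Write a friendly paragraph covering every point.",
                           "Points", "Paragraph", [], points)
    return llm(prompt).strip()


def run_chain(llm: LLM, feedback: str) -> dict:
    """Run the three steps in order, keeping every intermediate result."""
    problems = split_points(llm, feedback)
    suggestions = {p: ideation(llm, p) for p in problems}
    paragraph = compose_points(llm, suggestions)
    return {"problems": problems, "suggestions": suggestions, "paragraph": paragraph}


def stub_llm(prompt: str) -> str:
    """Canned stand-in for a real model (hypothetical, for offline runs only)."""
    last_input = prompt.splitlines()[-2].split(": ", 1)[1]
    if prompt.startswith("List each"):
        return "too much text; speaks too fast"
    if prompt.startswith("Given a problem"):
        return f"try addressing '{last_input}' with one concrete change"
    return f"Nice work overall! A few ideas: {last_input}"
```

Calling `run_chain(stub_llm, "Alex's slides have too much text and he speaks too fast.")` returns a dictionary holding the extracted problems, the per-problem suggestions, and the final paragraph. Returning the intermediate results, and not only the final text, mirrors the paper's emphasis on exposing each step for inspection and editing; a real deployment would replace `stub_llm` with a call to an actual model.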