VIDEE: Visual and Interactive Decomposition, Execution, and Evaluation of Text Analytics with Intelligent Agents

Sam Yu-Te Lee, Chenyang Ji, Shicheng Wen, Lifu Huang, Dongyu Liu, Kwan-Liu Ma

TL;DR

VIDEE addresses the barrier to entry in text analytics by enabling entry-level analysts to perform advanced text analytics through a no-code, three-stage human-AI workflow. It combines a decomposer that uses Monte-Carlo Tree Search with LLM judges, an executor that assembles executable pipelines, and an evaluator that relies on LLM judgments and visualizations to validate results. The paper provides a detailed interface design, backend specification, and quantitative/user studies showing VIDEE's usability, reliability, and potential to democratize text analytics. Overall, VIDEE demonstrates how structured human-AI collaboration can yield robust, interpretable text analytics pipelines while highlighting opportunities to improve agent reliability and evaluator methods.

Abstract

Text analytics has traditionally required specialized knowledge in Natural Language Processing (NLP) or text analysis, which presents a barrier for entry-level analysts. Recent advances in large language models (LLMs) have changed the landscape of NLP by enabling more accessible and automated text analysis (e.g., topic detection, summarization, and information extraction). We introduce VIDEE, a system that supports entry-level data analysts in conducting advanced text analytics with intelligent agents. VIDEE instantiates a human-agent collaboration workflow consisting of three stages: (1) Decomposition, which incorporates a human-in-the-loop Monte-Carlo Tree Search algorithm to support generative reasoning with human feedback, (2) Execution, which generates an executable text analytics pipeline, and (3) Evaluation, which integrates LLM-based evaluation and visualizations to support user validation of execution results. We conduct two quantitative experiments to evaluate VIDEE's effectiveness and analyze common agent errors. A user study involving participants with varying levels of NLP and text analytics experience, from none to expert, demonstrates the system's usability and reveals distinct user behavior patterns. The findings identify design implications for human-agent collaboration, validate the practical utility of VIDEE for non-expert users, and inform future improvements to intelligent text analytics systems.
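
The human-in-the-loop search described in the abstract can be illustrated with a minimal Python sketch. It is not VIDEE's implementation: the `PlanNode` class, the function names, the exploration constant, and the equal-weight averaging of scores are all assumptions made for illustration. Only the three criteria names (complexity, coherence, importance) and the two user interventions (choosing the next expansion node, adjusting a node's score) come from the Figure 2 caption. The sketch uses the standard UCT selection rule and assumes every score lies in [0, 1], with higher meaning better.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Criteria names taken from the Figure 2 caption; everything else is hypothetical.
CRITERIA = ("complexity", "coherence", "importance")


@dataclass
class PlanNode:
    """One candidate step in a text analytics plan (hypothetical structure)."""
    label: str
    llm_scores: Dict[str, float]  # assumed LLM-judge scores in [0, 1]
    human_scores: Dict[str, float] = field(default_factory=dict)  # user overrides
    visits: int = 0
    total_value: float = 0.0
    children: List["PlanNode"] = field(default_factory=list)

    def value(self) -> float:
        """Average the criteria, letting human overrides replace LLM scores."""
        merged = {**self.llm_scores, **self.human_scores}
        return sum(merged[c] for c in CRITERIA) / len(CRITERIA)


def uct(parent: PlanNode, child: PlanNode, c: float = 1.4) -> float:
    """Standard UCT score: mean value plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: PlanNode, forced: Optional[PlanNode] = None) -> PlanNode:
    """Pick the next node to expand; a user-chosen node takes precedence."""
    if forced is not None:
        return forced
    return max(parent.children, key=lambda ch: uct(parent, ch))


def backpropagate(path: List[PlanNode], reward: float) -> None:
    """Add the reward to every node on the path from the root to the leaf."""
    for node in path:
        node.visits += 1
        node.total_value += reward
```

In this sketch, a user's score adjustment is stored in `human_scores` and changes the reward that later simulations back-propagate, while a user's choice of expansion node bypasses UCT entirely through the `forced` argument.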

Paper Structure

This paper contains 100 sections, 8 figures.

Figures (8)

  • Figure 1: A three-stage human-agent collaboration workflow for text analytics with multiple agents. In the Decomposition stage, the human describes a goal and the decomposer agent searches for a plan under human monitoring and control. The results are communicated back as semantic tasks (e.g., topic modeling). In the Execution stage, the executor agent generates a pipeline based on a plan specified in primitive tasks (e.g., cluster analysis or document classification) for the human to execute and inspect. In the Evaluation stage, the evaluator agent runs evaluations (e.g., topic coverage) using the human-specified criteria and presents the results for the human to evaluate and verify.
  • Figure 2: The interface for the decomposition stage. (a) Users can input their goal and dataset context in natural language. (b) The decomposer agent iteratively searches for text analytics plans to accomplish the goal using Monte-Carlo Tree Search. Users can take various actions to intervene in the search process, such as choosing the next expansion node or adjusting the scoring of a node. (c) The Dataset Inspection View helps users make sense of the dataset. (d) The scoring criteria of each node, including complexity, coherence, and importance. User feedback on the scoring is recorded by the system.
  • Figure 3: The interface for the execution and evaluation stages. (a) The plan the user selected in the decomposition stage. Users can make final adjustments here, then click "Convert". (b) Based on the plan, the system generates an executable pipeline. Each node is a primitive task with a label, description, and execution parameters. Users can click the "Execute" button to execute a node. (c) For each primitive task that needs evaluation, the system automatically recommends three evaluation criteria using LLM judges. Users can also add their own evaluation criterion using the "+" button in the top right corner. Each evaluation node is also executable. (d) The inspection panel showing the details of a selected node. Users can see more detail about a node or make necessary changes, such as changing the input/output, execution parameters, or prompt templates. (e) Users can inspect the execution results in the interface.
  • Figure 4: Demonstration of the Data Inspection View using an HCI paper abstract dataset. Left: a simple list view showing the raw content of documents. Right: the topic radial chart plotting the topic distribution of the documents. Each node is a document and each fan-shaped region is a topic, such as "Shape-Changing Interfaces and Interaction Techniques". Clicking a node in the topic radial chart highlights the corresponding document in the list view. The topic radial chart facilitates the sensemaking process of text analytics.
  • Figure 5: Visualizations for the evaluation results of "Summary Length Evaluator". Left: an overview of the distribution of categories in a bar chart. Each bar is a category generated by the LLM judges, and the height encodes how many units (e.g., documents) are assigned to this category (Long or Short). Right: the topic radial chart extended to encode the categories. Each topic region is further subdivided by the categories. The inner region represents "Long" documents, and the outer region represents "Short" documents. The visualization ensures visual continuity between sensemaking of the dataset and evaluation of execution results.
  • ...and 3 more figures
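
The two encodings described in the Figure 5 caption (a bar chart of category counts and a per-topic subdivision of the radial chart) both reduce to simple counting. The Python sketch below shows one way such counts could be derived. It is an assumption-laden illustration rather than the system's code: the record layout with `topic` and `category` keys, the function name, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def category_distribution(
    records: List[Dict[str, str]],
) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Count judge-assigned categories overall and within each topic.

    Each record is assumed to carry a "topic" and a "category" key
    (hypothetical field names chosen for this sketch).
    """
    overall = Counter(r["category"] for r in records)  # bar chart heights
    per_topic: Dict[str, Counter] = defaultdict(Counter)  # radial subdivisions
    for r in records:
        per_topic[r["topic"]][r["category"]] += 1
    return dict(overall), {t: dict(c) for t, c in per_topic.items()}


# Made-up sample data, for illustration only.
sample = [
    {"topic": "Topic A", "category": "Long"},
    {"topic": "Topic A", "category": "Short"},
    {"topic": "Topic B", "category": "Long"},
]
overall_counts, topic_counts = category_distribution(sample)
```

Here `overall_counts` would drive the bar heights, and `topic_counts` would determine how each topic region is split between the inner ("Long") and outer ("Short") bands.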