InconLens: Interactive Visual Diagnosis of Behavioral Inconsistencies in LLM-based Agentic Systems

Shuo Yan, Xiaolin Wen, Shaolun Ruan, Yanjie Zhang, Jiaming Mi, Yushi Sun, Huamin Qu, Rui Sheng

Abstract

Large Language Model (LLM)-based agentic systems have shown growing promise in tackling complex, multi-step tasks through autonomous planning, reasoning, and interaction with external environments. However, the stochastic nature of LLM generation introduces intrinsic behavioral inconsistency: the same agent may succeed in one execution but fail in another under identical inputs. Diagnosing such inconsistencies remains a major challenge for developers, as agent execution logs are often lengthy, unstructured, and difficult to compare across runs. Existing debugging and evaluation tools primarily focus on inspecting single executions, offering limited support for understanding how and why agent behaviors diverge across repeated runs. To address this challenge, we introduce InconLens, a visual analytics system designed to support interactive diagnosis of LLM-based agentic systems with a particular focus on cross-run behavioral analysis. InconLens introduces information nodes as an intermediate abstraction that captures canonical informational milestones shared across executions, enabling semantic alignment and inspection of agent reasoning trajectories across multiple runs. We demonstrate the effectiveness of InconLens through a detailed case study and further validate its usability and analytical value via expert interviews. Our results show that InconLens enables developers to more efficiently identify divergence points, uncover latent failure modes, and gain actionable insights into improving the reliability and stability of agentic systems.

Paper Structure

This paper contains 21 sections, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Four interaction architectures for LLM-based multi-agent systems.
  • Figure 2: Conceptual framework of InconLens illustrating the relationship between raw execution logs, information nodes, and the system interface. The upper left shows raw log data generated by agents. The upper right illustrates extracted information nodes, each representing a semantically meaningful task milestone abstracted from the raw logs. The lower part depicts the system interface, which operationalizes this abstraction through four coordinated views.
  • Figure 3: Overview of the InconLens interface for diagnosing agent behavioral inconsistency across multiple runs. The system consists of four tightly coordinated views. (A) The Task Summary View provides a run-level overview of repeated executions. This view also supports the extraction, refinement, and dependency specification of information nodes. (B) The Information Node View visualizes the task as a sequence of information nodes connected by dependencies using a Sankey diagram, enabling users to identify divergent or failure-prone transitions. (C) The Action View supports detailed inspection of a selected node transition. (D) The Agent Log View presents the original execution logs in a chronological, step-by-step format.
  • Figure 4: (A) Developers can examine a summary of token consumption. (B) Developers can generate and refine information nodes. (C) Developers can further inspect and modify the dependencies among the finalized information nodes.
  • Figure 5: Sankey-based visualization of execution runs across information nodes, showing how tasks progress through milestones, the outcomes of transitions, and the relative frequency of different execution paths.
  • ...and 5 more figures
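
To make the Sankey-based cross-run view concrete, the sketch below shows one plausible way to aggregate repeated executions, once each run has been aligned to a sequence of canonical information nodes, into the transition frequencies such a diagram would encode. The node IDs and the `transition_counts` helper are hypothetical illustrations, not part of the InconLens implementation.

```python
from collections import Counter

def transition_counts(runs):
    """Count transitions between consecutive information nodes
    across multiple execution runs.

    runs: list of runs, each a sequence of information-node IDs
    returns: Counter mapping (source, target) pairs to frequencies,
             i.e. the link weights of a Sankey diagram.
    """
    counts = Counter()
    for run in runs:
        # Pair each node with its successor within the run.
        for src, dst in zip(run, run[1:]):
            counts[(src, dst)] += 1
    return counts

# Example: three runs over milestones A -> B -> C, one of which
# diverges after B to a hypothetical failure node F.
runs = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["A", "B", "F"],
]
print(transition_counts(runs))
# Counter({('A', 'B'): 3, ('B', 'C'): 2, ('B', 'F'): 1})
```

Transitions with split outgoing weights (here, B splitting toward C and F) are exactly the divergence points the view is designed to surface.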