KnowThyself: An Agentic Assistant for LLM Interpretability

Suraj Prasai; Mengnan Du; Ying Zhang; Fan Yang

TL;DR

KnowThyself makes LLM interpretability more accessible by unifying fragmented tools in a conversational, multi-agent platform. It combines an orchestrator LLM, an agent router, and modular specialized agents that generate interactive visualizations and literature-grounded explanations. Users can upload a model, ask questions in natural language, and receive end-to-end explanations without writing code, which lowers the technical barrier for practitioners. The resulting framework is scalable and extensible, and could accelerate the adoption of interpretability in real-world workflows.

Abstract

We develop KnowThyself, an agentic assistant that advances large language model (LLM) interpretability. Existing tools provide useful insights but remain fragmented and code-intensive. KnowThyself consolidates these capabilities into a chat-based interface, where users can upload models, pose natural-language questions, and obtain interactive visualizations with guided explanations. At its core, an orchestrator LLM first reformulates the user's query, an agent router then directs it to a specialized module, and the module's output is finally contextualized into a coherent explanation. This design lowers technical barriers and provides an extensible platform for LLM inspection. By embedding the whole process into a conversational workflow, KnowThyself offers a robust foundation for accessible LLM interpretability.
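
The three-stage flow described in the abstract (reformulate, route, contextualize) can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name below (`reformulate`, `route`, `contextualize`, `AGENTS`, the two example agents) is hypothetical, the LLM calls are replaced by plain string functions, and keyword matching stands in for whatever routing logic the real system uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AgentResult:
    agent: str    # name of the module that handled the query
    payload: str  # stand-in for a visualization or analysis output


def reformulate(query: str) -> str:
    """Stand-in for the orchestrator LLM: here it only normalizes the text."""
    return " ".join(query.lower().split())


def attention_agent(query: str) -> AgentResult:
    """Hypothetical specialized module for attention inspection."""
    return AgentResult("attention", f"[attention view for: {query}]")


def literature_agent(query: str) -> AgentResult:
    """Hypothetical specialized module for literature-grounded answers."""
    return AgentResult("literature", f"[literature summary for: {query}]")


# Hypothetical registry mapping a trigger keyword to a specialized agent.
AGENTS: Dict[str, Callable[[str], AgentResult]] = {
    "attention": attention_agent,
    "paper": literature_agent,
}


def route(query: str) -> Callable[[str], AgentResult]:
    """Stand-in for the agent router: first keyword match wins."""
    for keyword, agent in AGENTS.items():
        if keyword in query:
            return agent
    return literature_agent  # assumed fallback when nothing matches


def contextualize(query: str, result: AgentResult) -> str:
    """Stand-in for the final step that wraps raw output in an explanation."""
    return f"Q: {query}\nHandled by: {result.agent}\n{result.payload}"


def answer(query: str) -> str:
    """Run the full pipeline: reformulate, route, then contextualize."""
    clean = reformulate(query)
    result = route(clean)(clean)
    return contextualize(clean, result)


if __name__ == "__main__":
    print(answer("Show  ATTENTION heads for layer 3"))
```

Keeping the agents behind a registry is one way to reflect the extensibility the abstract emphasizes: under this sketch's assumptions, adding a capability means registering one more function rather than changing the pipeline.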
