Enabling Global, Human-Centered Explanations for LLMs: From Tokens to Interpretable Code and Test Generation

Dipin Khati; Daniel Rodriguez-Cardenas; David N. Palacio; Alejandro Velasco; Denys Poshyvanyk

TL;DR

Code$\mathbb{Q}$ is a global, code-based interpretability framework that goes beyond token-level explanations: it maps token rationales to high-level programming concepts and aggregates them into a global interpretability tensor $\Phi$. Empirical analysis on code generation and test-case generation tasks indicates that LLMs rely heavily on shallow syntactic cues and exhibit adaptive, semantically evolving reasoning that often diverges from how human developers think. Concept-level aggregation reduces the Shannon entropy of the explanations by more than 50%. A user study with 37 participants indicates substantial usability and usefulness, but it also highlights misalignment between machine and human rationales, underscoring the need for human-centric trust calibration and improved visualization. The work shows that global, developer-centered explanations can uncover systemic model behaviors that traditional accuracy metrics do not reveal, and it proposes practical directions, including a VS Code extension, for integrating Code$\mathbb{Q}$ into real-world software engineering workflows.

Abstract

As Large Language Models for Code (LM4Code) become integral to software engineering, establishing trust in their output becomes critical. However, standard accuracy metrics obscure the underlying reasoning of generative models, offering little insight into how decisions are made. Although post-hoc interpretability methods attempt to fill this gap, they often restrict explanations to local, token-level insights, which fail to provide a developer-understandable global analysis. Our work highlights the urgent need for global, code-based explanations that reveal how models reason across code. To support this vision, we introduce code rationales (CodeQ), a framework that enables global interpretability by mapping token-level rationales to high-level programming categories. Aggregating thousands of these token-level explanations allows us to perform statistical analyses that expose systemic reasoning behaviors. We validate this aggregation by showing it distills a clear signal from noisy token data, reducing explanation uncertainty (Shannon entropy) by over 50%. Additionally, we find that a code generation model (codeparrot-small) consistently favors shallow syntactic cues (e.g., indentation) over deeper semantic logic. Furthermore, in a user study with 37 participants, we find its reasoning is significantly misaligned with that of human developers. These findings, hidden from traditional metrics, demonstrate the importance of global interpretability techniques to foster trust in LM4Code.
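To make the aggregation idea concrete, here is a minimal sketch of how token-level rationale scores could be grouped into programming categories and how the Shannon entropy of the two distributions could be compared. It is an illustration only, not the authors' implementation: the token scores, the `CATEGORY` map, and the helper names `shannon_entropy` and `aggregate` are all hypothetical, and the framework may normalize and aggregate differently.

```python
import math
from collections import defaultdict


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


def aggregate(token_scores, category_of):
    """Sum token-level scores within each high-level category."""
    totals = defaultdict(float)
    for token, score in token_scores:
        totals[category_of[token]] += score
    return dict(totals)


# Hypothetical rationale scores for one generated token (made-up numbers).
token_scores = [
    ("    ", 0.20), ("def", 0.15), ("return", 0.10), ("x", 0.15),
    ("y", 0.10), ("\n", 0.10), ("+", 0.10), ("(", 0.10),
]

# Hypothetical token-to-category map; a real taxonomy would be far richer.
CATEGORY = {
    "    ": "indentation", "\n": "indentation",
    "def": "keyword", "return": "keyword",
    "x": "identifier", "y": "identifier",
    "+": "operator", "(": "punctuation",
}

concept_scores = aggregate(token_scores, CATEGORY)
h_tokens = shannon_entropy([score for _, score in token_scores])
h_concepts = shannon_entropy(list(concept_scores.values()))

print(f"token-level entropy:   {h_tokens:.3f} bits")
print(f"concept-level entropy: {h_concepts:.3f} bits")
```

Merging outcomes into coarser groups can never increase Shannon entropy, so the direction of the change is guaranteed. What carries information is the size of the drop, which depends on how concentrated the scores are within a few categories.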
