Editable XAI: Toward Bidirectional Human-AI Alignment with Co-Editable Explanations of Interpretable Attributes

Haoyang Chen; Jingwen Bai; Fang Tian; Brian Y Lim

TL;DR

Editable XAI introduces a bidirectional framework for aligning human domain knowledge with AI reasoning by making explanations editable. CoExplain pairs a neural predictor with a faithful proxy decision-tree explainer. It lets users write rules that are parsed into neural networks, and it offers AI-assisted enhancements that adjust thresholds or restructure topology. In a user study (N=43), editable explanations improved user-AI faithfulness and understanding relative to read-only explanations. CoExplain achieved near-optimal accuracy while maintaining alignment to user rules and reducing editing effort. The work demonstrates the value of writable explanations for collaborative human–AI reasoning and outlines design guidelines, limitations, and future directions for scalable, modular editable AI systems.

Abstract

While Explainable AI (XAI) helps users understand AI decisions, misalignment in domain knowledge can lead to disagreement. This inconsistency hinders understanding, and because explanations are often read-only, users lack the control to improve alignment. We propose making XAI editable, allowing users to write rules to improve control and gain deeper understanding through the generation effect of active learning. We developed CoExplain, leveraging a neural network for universal representation and symbolic rules for intuitive reasoning on interpretable attributes. CoExplain explains the neural network with a faithful proxy decision tree, parses user-written rules as an equivalent neural network graph, and collaboratively optimizes the decision tree. In a user study (N=43), CoExplain and manually editable XAI improved user understanding and model alignment compared to read-only XAI. CoExplain was easier to use with fewer edits and less time. This work contributes Editable XAI for bidirectional AI alignment, improving understanding and control.

Paper Structure (94 sections, 6 equations, 33 figures, 2 tables)

Introduction
Related Work
Usage of Explainable Artificial Intelligence
Interactive Machine Learning for Human–AI Alignment
Integrating Human Knowledge in Neural Networks with Neuro-Symbolic Learning
End-User Development for Editable Systems
Elicitation User Study
Method
Probe apparatus and user task
Participants
Study procedure
Findings
Preference for human-written rules over AI learned rules
Capped performance despite iterative edits
Difficulty in determining threshold values and need for external advice
...and 79 more sections

Figures (33)

Figure 1: The three interaction modes for Editable XAI. (1) Read: users inspect explanations generated by the AI, (2) Write: users modify explanations to guide the AI, and (3) Enhance: users and the AI collaboratively refine explanations. Together, these modes create a bidirectional alignment between the user and the AI.
Figure 2: Overview of CoExplain’s three interaction modes and their underlying mechanisms. Read: explanations are distilled from the neural network $M$ into a decision tree $T$, using the input $x$ and network prediction $\hat{y}$ to ensure that the tree prediction $\tilde{y}$ aligns with $M$. Write: user-authored rules $T'$ are transformed into a neural network $M$ through a parser $\mathcal{P}$. Enhance: the user collaborates with the AI on edits to the thresholds and topology. The AI refines the rules through training with regularization, aligning predictions $\hat{y}$ with both the user-defined rules' prediction $\tilde{y}'$ and data-driven adjustments with $y$. The topology of the explanation $T$ is aligned with $T'$ using a proxy model $F_d$ that maps the network parameters $\theta$ to their Tree Edit Distance $d$, calculated by $\mathcal{D}$.
Figure 3: Transforming a decision tree into a topologically equivalent neural network via a parser $\mathcal{P}$. a) A decision tree with two internal nodes testing feature thresholds. b) The corresponding neural network, where each tree node is encoded by a pair of first-layer neurons with biases $\pm \tau_i$ representing the threshold test. Subsequent layers mirror the tree’s decision paths: connections are preserved with unit weights, and neuron biases encode the logical relationships. Colored connections trace decision paths, and the output layer aggregates signals to reproduce the tree’s leaf predictions.
Figure 4: Two types of enhancement from CoExplain. a) Threshold Update: only the thresholds are updated, while the topology is kept fixed. b) Topology Update: both the thresholds and connections are trained, with additional neurons and connections supporting decision tree extension. Backpropagation and parameter updates are marked in red.
Figure 5: Interface of CoExplain. a) Data attributes. b) User’s explanation rules canvas. c) User rule performance metrics. d) Enhancement actions. e) AI-enhanced explanation rules canvas. f) Enhancement constraints and edit history. g) Enhanced rule performance metrics. h) Simulation on test dataset.
...and 28 more figures
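
The tree-to-network encoding described in the Figure 3 caption can be illustrated with a minimal sketch. This is not CoExplain's parser $\mathcal{P}$. The names `Node`, `Leaf`, `parse_tree`, and `forward` are hypothetical, the activations are hard steps rather than trainable ones, and the path layers are collapsed into a single AND layer for brevity. Each internal node becomes a pair of first-layer neurons with biases $\pm \tau_i$, each leaf becomes a neuron with unit weights on its path, and a bias encodes the conjunction.

```python
# Hypothetical sketch: all names here are illustrative, not CoExplain's API.
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: int


@dataclass
class Node:
    feature: int       # index of the attribute tested at this node
    threshold: float   # tau_i: go left when x[feature] <= threshold
    left: Union["Node", Leaf]
    right: Union["Node", Leaf]


def tree_predict(tree, x):
    """Reference traversal of the symbolic tree."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


def parse_tree(tree):
    """Flatten the tree into (tests, leaves).

    tests:  list of (feature, threshold), one per internal node.
    leaves: list of (path, label); path holds first-layer neuron indices,
            where neuron 2*i means "node i goes left" and 2*i+1 "goes right".
    """
    tests, leaves = [], []

    def walk(node, path):
        if isinstance(node, Leaf):
            leaves.append((path, node.label))
            return
        i = len(tests)
        tests.append((node.feature, node.threshold))
        walk(node.left, path + [2 * i])
        walk(node.right, path + [2 * i + 1])

    walk(tree, [])
    return tests, leaves


def forward(tests, leaves, x):
    """Evaluate the network with hard step activations (an assumption)."""
    h1 = []
    for feature, tau in tests:
        h1.append(1.0 if (-x[feature] + tau) >= 0 else 0.0)  # weight -1, bias +tau
        h1.append(1.0 if (x[feature] - tau) > 0 else 0.0)    # weight +1, bias -tau
    # Leaf layer: unit weights on the path; bias -(len(path) - 0.5) acts as AND.
    h2 = [1.0 if sum(h1[j] for j in path) - (len(path) - 0.5) > 0 else 0.0
          for path, _ in leaves]
    # Output layer: exactly one leaf neuron fires, so sum activations per class.
    n_classes = max(label for _, label in leaves) + 1
    scores = [0.0] * n_classes
    for active, (_, label) in zip(h2, leaves):
        scores[label] += active
    return scores.index(max(scores))


# Two internal nodes, as in the caption's example tree.
tree = Node(0, 0.5, Leaf(0), Node(1, 2.0, Leaf(1), Leaf(0)))
tests, leaves = parse_tree(tree)
```

Because every input activates exactly one leaf neuron, `forward(tests, leaves, x)` agrees with `tree_predict(tree, x)` for any `x`. Replacing the hard steps with a differentiable activation would be one way to make the thresholds trainable, though the source does not state which activation CoExplain uses.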
