
Argumentation for Explainable and Globally Contestable Decision Support with LLMs

Adam Dejl, Matthew Williams, Francesca Toni

Abstract

Large language models (LLMs) exhibit strong general capabilities, but their deployment in high-stakes domains is hindered by their opacity and unpredictability. Recent work has taken meaningful steps towards addressing these issues by augmenting LLMs with post-hoc reasoning based on computational argumentation, providing faithful explanations and enabling users to contest incorrect decisions. However, this paradigm is limited to pre-defined binary choices and only supports local contestation for specific instances, leaving the underlying decision logic unchanged and prone to repeated mistakes. In this paper, we introduce ArgEval, a framework that shifts from instance-specific reasoning to structured evaluation of general decision options. Rather than mining arguments solely for individual cases, ArgEval systematically maps task-specific decision spaces, builds corresponding option ontologies, and constructs general argumentation frameworks (AFs) for each option. These frameworks can then be instantiated to provide explainable recommendations for specific cases while still supporting global contestability through modification of the shared AFs. We investigate the effectiveness of ArgEval on treatment recommendation for glioblastoma, an aggressive brain tumour, and show that it can produce explainable guidance aligned with clinical practice.
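The abstract and Figure 1 describe general QBAFs whose arguments carry conditions on case parameters. Instantiation drops the arguments whose conditions fail, and the strength of the root argument yields the recommendation. This excerpt does not say which gradual semantics ArgEval uses, so the sketch below makes three assumptions: it uses DF-QuAD (probabilistic-sum aggregation of attackers and supporters), it treats frameworks as trees, and it recommends an option when the root strength exceeds 0.5. The names `Argument`, `strength` and `recommend`, the parameters `age` and `kps`, and every argument, condition and score are hypothetical illustrations. They are not clinical guidance and not the paper's implementation.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Callable, Dict, List, Optional

# Hypothetical case parameters: parameter name -> extracted value.
Case = Dict[str, object]


@dataclass
class Argument:
    """Node of a tree-shaped general QBAF (illustrative structure)."""
    name: str
    base_score: float
    # Condition on case parameters; None means the argument always applies.
    condition: Optional[Callable[[Case], bool]] = None
    attackers: List["Argument"] = field(default_factory=list)
    supporters: List["Argument"] = field(default_factory=list)


def aggregate(strengths: List[float]) -> float:
    """DF-QuAD aggregation (probabilistic sum); 0.0 when there are no children."""
    return 1.0 - prod(1.0 - s for s in strengths)


def combine(base: float, v_att: float, v_sup: float) -> float:
    """DF-QuAD combination: shift the base score down or up by the net effect."""
    if v_att >= v_sup:
        return base - base * (v_att - v_sup)
    return base + (1.0 - base) * (v_sup - v_att)


def applies(arg: Argument, case: Case) -> bool:
    return arg.condition is None or arg.condition(case)


def strength(arg: Argument, case: Case) -> float:
    """Strength of `arg` in the QBAF instantiated for `case`.

    Children whose condition fails are skipped together with their subtrees,
    mirroring the removal of unsatisfied nodes during instantiation.
    """
    att = [strength(a, case) for a in arg.attackers if applies(a, case)]
    sup = [strength(s, case) for s in arg.supporters if applies(s, case)]
    return combine(arg.base_score, aggregate(att), aggregate(sup))


def recommend(root: Argument, case: Case, threshold: float = 0.5) -> bool:
    # Assumed decision rule: recommend the option if the root is strong enough.
    return strength(root, case) > threshold


# Illustrative usage: arguments, conditions and scores are invented.
fit = Argument("Good performance status", 0.8, condition=lambda c: c["kps"] >= 70)
elderly = Argument("Advanced age", 0.6, condition=lambda c: c["age"] > 70)
root = Argument("Recommend option X", 0.5, attackers=[elderly], supporters=[fit])

cases = {"A": {"age": 55, "kps": 90}, "B": {"age": 78, "kps": 60}, "C": {"age": 78, "kps": 90}}
before = {k: recommend(root, c) for k, c in cases.items()}

# Global contestation: one edit to the shared framework affects every case.
fit.base_score = 0.3
after = {k: recommend(root, c) for k, c in cases.items()}
print(before)  # {'A': True, 'B': False, 'C': True}
print(after)   # {'A': True, 'B': False, 'C': False}
```

The base-score edit is applied to the shared framework, not to any single case. It therefore changes the outcome for case C without a per-case intervention, which is one way to picture the global contestability described in the abstract.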

Paper Structure

This paper contains 25 sections, 5 figures, 3 tables and 3 algorithms.

Figures (5)

  • Figure 1: Illustration of ArgEval inference for one of the treatment options for glioblastoma. Case Parameters extracted from the Case Description are used to instantiate the General QBAF associated with the given option (Root Argument), removing the dashed nodes whose conditions are not satisfied. The Prediction After Instantiation is obtained from the instantiated framework, which also serves as a faithful explanation.
  • Figure 2: Overview of the ArgEval pipeline. Top: given natural-language policy documents that specify general criteria for decision-making in a certain domain, ArgEval builds a decision-space ontology and constructs general QBAFs for each candidate decision in the ontology. Bottom: at inference time, the general QBAF is instantiated with the parameters of a specific case, providing faithfully explainable decision recommendations. Users can contest the decision-space ontology, the general QBAFs, the extracted case parameters and the general parameter schema specifying the properties to be extracted, in response to incorrect recommendations or explanations produced by the model.
  • Figure 3: Subset of a glioblastoma treatment option ontology automatically constructed from the relevant clinical guidelines. Only the entities and the hierarchical relations are visualised, without the corresponding text chunks and provenance relations. Note that the LLM used has incorrectly categorised Lomustine as a separate treatment rather than a variant of Alkylating Agent Chemotherapy, although this has no effect on the rest of the ArgEval pipeline. The 9 leaves of the ontology are used in our main experiments.
  • Figure 4: Illustration of the ArgEval contestability experiment. To correct the initially suboptimal recommendations, we make small adjustments to the base scores in the general argumentation framework for the radiotherapy 60 Gy treatment option (with initial scores shown in smaller red font and updated scores in larger green font) and clarify the descriptions of two parameters associated with surgical resection in the parameter schema. These modifications are sufficient to achieve a perfect score on this instance while also substantially improving the overall performance. Treatments recommended by the ground-truth labels are shown in green, with those not recommended in red.
  • Figure : Premises and critical questions of the argument scheme $s_\text{arg}$ used for the glioblastoma treatment recommendation task.
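Figure 3 notes that the leaves of the option ontology serve as the candidate decisions, and Figure 2 lists the ontology among the components users can contest. The minimal sketch below assumes the hierarchy is stored as a parent-to-children mapping. The fragment is hypothetical and does not reproduce the paper's nine leaves; the `Temozolomide` entry in particular is an invented placeholder, and `leaves` and `move` are illustrative helpers. The sketch collects the leaf options and then re-attaches Lomustine under Alkylating Agent Chemotherapy, a correction a user might make when contesting the ontology. In this fragment, that change leaves the set of leaf options unchanged.

```python
from typing import Dict, List

# Hypothetical ontology fragment: parent -> children (not the paper's ontology).
Ontology = Dict[str, List[str]]


def leaves(ontology: Ontology, root: str) -> List[str]:
    """Return the leaf entities below `root`, in depth-first order."""
    children = ontology.get(root, [])
    if not children:
        return [root]
    result: List[str] = []
    for child in children:
        result.extend(leaves(ontology, child))
    return result


def move(ontology: Ontology, node: str, new_parent: str) -> Ontology:
    """Return a copy of `ontology` with `node` re-attached under `new_parent`."""
    updated = {parent: [c for c in kids if c != node] for parent, kids in ontology.items()}
    updated.setdefault(new_parent, []).append(node)
    return updated


ROOT = "Glioblastoma treatment"
ontology: Ontology = {
    ROOT: ["Surgical resection", "Radiotherapy", "Alkylating Agent Chemotherapy", "Lomustine"],
    "Radiotherapy": ["Radiotherapy 60 Gy"],
    "Alkylating Agent Chemotherapy": ["Temozolomide"],  # invented placeholder child
}

before = leaves(ontology, ROOT)
fixed = move(ontology, "Lomustine", "Alkylating Agent Chemotherapy")
after = leaves(fixed, ROOT)
print(before)  # ['Surgical resection', 'Radiotherapy 60 Gy', 'Temozolomide', 'Lomustine']
print(after)   # ['Surgical resection', 'Radiotherapy 60 Gy', 'Temozolomide', 'Lomustine']
```

Because `move` returns a new mapping, the original ontology stays available for comparison after a contested edit.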