An Unsupervised Approach to Achieve Supervised-Level Explainability in Healthcare Records

Joakim Edin, Maria Maistro, Lars Maaløe, Lasse Borgholt, Jakob D. Havtorn, Tuukka Ruotsalo

TL;DR

This work tackles explainability in healthcare record processing by proposing an unsupervised pipeline that achieves supervised-level explainability for automated medical coding. It combines adversarial robustness training, namely input gradient regularization (IGR), projected gradient descent (PGD), and token masking (TM), with AttInGrad, a new attribution method that fuses attention scores and input gradients to produce plausible and faithful explanations without evidence-span annotations. Empirical results on MIMIC-III and MDACE show that the best unsupervised configuration matches or closely approaches the supervised baseline in explanation quality, while substantially improving plausibility and faithfulness over attention-based explanations. The study offers practical insights into training strategies that enhance explanations, analyzes the limitations of attention explanations, and releases code and model weights to support reproducibility and integration into real-world clinical coding workflows.
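To make the idea of fusing attention scores with input gradients concrete, here is a minimal sketch in plain Python. It rests on assumptions that the summary above does not spell out: the fusion is taken to be an element-wise product, the per-token input-gradient score is taken to be the L2 norm of embedding times gradient, and the result is normalized to sum to one. The function names (`input_x_grad`, `att_in_grad`) are hypothetical and are not taken from the authors' released code.

```python
import math


def input_x_grad(embeddings, gradients):
    """Per-token score: L2 norm of the element-wise product embedding * gradient.

    Assumption: this is one common way to reduce an input-times-gradient
    vector to a single score per token.
    """
    return [
        math.sqrt(sum((e * g) ** 2 for e, g in zip(emb, grad)))
        for emb, grad in zip(embeddings, gradients)
    ]


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (all zeros if the sum is 0)."""
    total = sum(scores)
    if total <= 0:
        return [0.0] * len(scores)
    return [s / total for s in scores]


def att_in_grad(attention, embeddings, gradients):
    """Fuse attention weights with input-gradient scores.

    Assumption: the fusion is an element-wise product, so a token is ranked
    highly only if both signals consider it important.
    """
    ixg = input_x_grad(embeddings, gradients)
    fused = [a * s for a, s in zip(attention, ixg)]
    return normalize(fused)


# Toy example with three tokens and two-dimensional embeddings.
attention = [0.5, 0.5, 0.0]
embeddings = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
gradients = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
scores = att_in_grad(attention, embeddings, gradients)
```

In the toy example, the third token has the largest input-gradient score but zero attention, so the product suppresses it. This shows how a multiplicative fusion lets each signal veto tokens that the other ranks highly.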

Abstract

Electronic healthcare records are vital for patient safety as they document conditions, plans, and procedures in both free text and medical codes. Language models have significantly enhanced the processing of such records, streamlining workflows and reducing manual data entry, thereby saving healthcare providers significant resources. However, the black-box nature of these models often leaves healthcare professionals hesitant to trust them. State-of-the-art explainability methods increase model transparency but rely on human-annotated evidence spans, which are costly. In this study, we propose an approach to produce plausible and faithful explanations without needing such annotations. We demonstrate on the automated medical coding task that adversarial robustness training improves explanation plausibility and introduce AttInGrad, a new explanation method superior to previous ones. By combining both contributions in a fully unsupervised setup, we produce explanations of comparable quality, or better, to that of a supervised approach. We release our code and model weights.

Paper Structure (52 sections, 16 equations, 6 figures, 12 tables)

Figures (6)

  • Figure 1: Example of an input, prediction, and feature attribution explanation highlighted in the input.
  • Figure 2: Comparison of plausibility across various combinations of explanation methods and models from this study and previous work. Most previous studies used Attention and a standard medical coding model (B$_{\text{U}}$). Cheng et al. (2023) instead used a supervised model trained on evidence-span annotations (B$_{\text{S}}$). We propose AttInGrad and an adversarially robust model (TM).
  • Figure 3: Faithfulness of Attention, InputXGrad, and AttInGrad across models.
  • Figure 4: The relationship between explanation quality and the proportion of the top five most important tokens that are special tokens (tokens devoid of alphanumeric characters). Each data point is the average statistic on the MDACE test set for a seed of B$_{\text{U}}$. We fitted a linear regression for each explanation method and calculated the Pearson correlation ($r$). The dotted vertical lines represent the proportion of special tokens in the evidence-span annotations.
  • Figure 5: The PLM-CA architecture we used in our experiments.
  • ...and 1 more figure
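
The statistic described in the Figure 4 caption can be sketched as follows. The sketch computes the share of the top five highest-scoring tokens that contain no alphanumeric characters, plus a Pearson correlation between two lists of per-seed averages. The helper names and the toy tokens are hypothetical; the authors' exact tokenization and aggregation are not shown here and may differ.

```python
import math


def is_special(token):
    """A token is 'special' if it contains no alphanumeric characters."""
    return not any(ch.isalnum() for ch in token)


def special_share_top_k(tokens, scores, k=5):
    """Proportion of the k highest-scoring tokens that are special tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    if not ranked:
        return 0.0
    return sum(is_special(tokens[i]) for i in ranked) / len(ranked)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy document: the comma scores lowest and falls outside the top five.
tokens = ["chest", ".", "pain", ",", ":", "the"]
scores = [0.9, 0.8, 0.7, 0.1, 0.6, 0.5]
share = special_share_top_k(tokens, scores, k=5)
```

Here `share` is the per-document value. Averaging it over a test set for each seed, and pairing it with that seed's explanation quality, would give the points to which `pearson_r` is applied.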