
WeNLEX: Weakly Supervised Natural Language Explanations for Multilabel Chest X-ray Classification

Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira

Abstract

Natural language explanations provide an inherently human-understandable way to explain black-box models, closely reflecting how radiologists convey their diagnoses in textual reports. Most works explicitly supervise the explanation generation process using datasets annotated with explanations. Thus, though plausible, the generated explanations are not faithful to the model's reasoning. In this work, we propose WeNLEX, a weakly supervised model for the generation of natural language explanations for multilabel chest X-ray classification. Faithfulness is ensured by matching images generated from their corresponding natural language explanations with the original images in the black-box model's feature space. Plausibility is maintained via distribution alignment with a small database of clinician-annotated explanations. Through extensive validation on multiple metrics assessing faithfulness, simulatability, diversity, and plausibility, we empirically demonstrate that WeNLEX produces faithful and plausible explanations using as few as five ground-truth explanations per diagnosis. Furthermore, WeNLEX can operate in both post-hoc and in-model settings. In the latter, i.e., when the multilabel classifier is trained together with the rest of the network, WeNLEX improves the classification AUC of the standalone classifier by 2.21%, showing that adding interpretability to the training process can actually improve downstream task performance. Additionally, simply by changing the database, WeNLEX explanations can be adapted to any target audience; we showcase this flexibility by training a layman version of WeNLEX, in which explanations are simplified for non-medical users.

Paper Structure (28 sections, 6 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Architecture of WeNLEX, a weakly supervised model that generates natural language explanations (NLEs) for a multilabel X-ray classifier. For each predicted diagnosis (e.g., atelectasis, edema), it produces an NLE (only the atelectasis NLE is shown). A pretrained, frozen text-only Encoder–Decoder is adapted with soft prompt tuning (Vision-to-Adapter and Soft Prompt) to take as input the image features, the entire prediction vector (including diagnosis and evidence labels), and the textual label of the diagnosis being explained. The NLE Generator Encoder outputs an NLE embedding, which is compared to ground-truth NLE embeddings for that diagnosis to ensure plausibility (Plausibility Loss). Each NLE is also given to a Text Embedding to Image model, which generates an image depicting its content. This image is then processed by the model being explained (MBE) to extract features. To enforce faithfulness, the average of these features across all NLEs for an image is compared with the original image features (Faithfulness Loss: Reconstruction). Finally, each NLE must recover the MBE’s original diagnosis prediction: the MBE’s output for the generated image/NLE is compared against the original prediction (Faithfulness Loss: Classification). Trainable layers/parameters are represented by the fire icon, while frozen blocks are represented by the snowflake.
  • Figure 2: Depiction of the deletion faithfulness metric: an image and a generated NLE are given to CheXagent, which grounds the text in the image. The identified regions are occluded, and the masked image is given to the model being explained (MBE). If the NLE is faithful, occluding these regions should significantly alter the MBE’s prediction.
  • Figure 3: The evidence identified by CheXbert is correct when judged against the evidence predicted by the model being explained (MBE), which is also the evidence used to generate the NLE in the first place, but incorrect when judged against the ground-truth evidence. Since we want NLEs that are faithful to the MBE, the target evidence for computing the CLinical EVidence (CLEV) score should be the MBE’s predicted evidence.
  • Figure 4: Qualitative examples of WeNLEX for three different diagnoses, comparison with ground-truth NLEs, and with the layman version of WeNLEX, in which the generated NLEs are simplified to adapt to a non-medical audience.
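
The faithfulness terms in the Figure 1 caption and the deletion check in the Figure 2 caption can be illustrated with a minimal, dependency-free sketch. Everything named here is an assumption made for illustration: the captions do not state the exact loss forms, so mean squared error stands in for the reconstruction term and binary cross-entropy for the classification term, and `toy_mbe` is a hypothetical stand-in for the model being explained. It is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Image = List[List[float]]
Mask = List[List[bool]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reconstruction_loss(nle_image_feats: Sequence[Vector], original_feats: Vector) -> float:
    """MSE (assumed form) between the averaged MBE features of the images
    generated from all NLEs of one X-ray and the original image features."""
    avg = mean_vector(nle_image_feats)
    return sum((a - b) ** 2 for a, b in zip(avg, original_feats)) / len(original_feats)


def classification_loss(p_generated: float, p_original: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy (assumed form) between the MBE output on the
    generated image and the MBE's original prediction for that diagnosis."""
    p = min(max(p_generated, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p_original * math.log(p) + (1.0 - p_original) * math.log(1.0 - p))


def deletion_drop(image: Image, mask: Mask, mbe: Callable[[Image], float], fill: float = 0.0) -> float:
    """Occlude the regions grounded by the NLE (mask == True) and return how
    much the MBE prediction falls; a larger drop suggests a more faithful NLE."""
    occluded = [
        [fill if hidden else pixel for pixel, hidden in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return mbe(image) - mbe(occluded)


def toy_mbe(image: Image) -> float:
    """Hypothetical MBE: mean pixel intensity used as a fake probability."""
    pixels = [pixel for row in image for pixel in row]
    return sum(pixels) / len(pixels)


# Toy usage: two NLE-generated feature vectors whose mean equals the original.
rec = reconstruction_loss([[1.0, 2.0], [3.0, 4.0]], [2.0, 3.0])
cls = classification_loss(p_generated=0.5, p_original=1.0)
drop = deletion_drop(
    image=[[0.2, 0.8], [0.8, 0.2]],
    mask=[[False, True], [True, False]],  # regions the NLE was grounded to
    mbe=toy_mbe,
)
print(rec, cls, drop)
```

In a real setup the grounding mask would come from a model such as CheXagent and the features and predictions from the actual MBE; the toy values only show how the three quantities relate to the blocks in the figures.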