Does context matter in digital pathology?

Paulina Tomaszewska, Mateusz Sperkowski, Przemysław Biecek

TL;DR

This study investigates whether deep learning vision models in digital pathology rely on contextual tissue information when classifying histopathology patches. By progressively masking the context around the central patch of PatchCamelyon images with a black border, the authors quantify how context size affects the performance of CNN and Transformer-based architectures, and find that recall and overall accuracy degrade as context is reduced. Sensitivity to context varies across architectures and pretraining schemes (e.g., Swin, ViT variants, MoCo, MAE), and some models change their prediction repeatedly as the context size changes. The work highlights the risk of relying on partial context in clinical settings and suggests future directions: explanation methods and collaboration with histopathologists to validate the swinging cases and improve robustness.

Abstract

The development of Artificial Intelligence for healthcare is of great importance. Models can sometimes even outperform human experts; however, they may base their reasoning on spurious features. This is not acceptable to the experts, who expect the models to capture valid patterns in the data that are consistent with domain expertise. In this work, we analyse whether Deep Learning (DL) vision models follow the histopathologists' practice of taking into account the surrounding tissues, which serve as context, when diagnosing a part of a lesion. It turns out that the performance of DL models decreases significantly when the amount of contextual information is limited; contextual information is therefore valuable at prediction time. Moreover, we show that the models sometimes behave unstably: for some images, they change their predictions many times depending on the size of the context. This may suggest that partial contextual information can be misleading.
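
The context-limiting step summarised above (a black border drawn around the central region) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It relies on the PatchCamelyon convention of 96×96 images whose labels refer to the central 32×32 region. The function name `mask_context`, the height-width-channel array layout, and the reading of the context size `s` as the number of context pixels kept on each side of the central region are assumptions made for this sketch.

```python
import numpy as np

PATCH_SIZE = 96   # side of a PatchCamelyon image, in pixels
CENTER_SIZE = 32  # side of the central region that the label refers to


def mask_context(image: np.ndarray, s: int) -> np.ndarray:
    """Keep the central region plus `s` context pixels per side; blacken the rest.

    `image` is assumed to have shape (96, 96, channels).
    `s` is a hypothetical parameterisation of context size:
    s = 32 keeps the full image, s = 0 keeps only the central 32x32 region.
    """
    max_s = (PATCH_SIZE - CENTER_SIZE) // 2
    if not 0 <= s <= max_s:
        raise ValueError(f"s must be between 0 and {max_s}, got {s}")

    border = max_s - s                  # width of the black border
    lo, hi = border, PATCH_SIZE - border

    masked = np.zeros_like(image)       # start from an all-black image
    masked[lo:hi, lo:hi] = image[lo:hi, lo:hi]  # copy the visible window
    return masked


# Example: an all-white dummy image with 10 context pixels kept per side
dummy = np.full((PATCH_SIZE, PATCH_SIZE, 3), 255, dtype=np.uint8)
limited = mask_context(dummy, s=10)
```

Evaluating a trained model on `mask_context(image, s)` for decreasing values of `s` gives one performance value per context size, which can then be compared with the full-context reference.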

Paper Structure (14 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: The scheme of the proposed study. The yellow squares in the center of histopathological images depict the regions that the annotations are based on (the squares are shown only for visualization purposes and are not present in dataset images). The black border is applied to remove some parts of contextual information.
  • Figure 2: A histopathological image padded with a black border, with the dimensions marked. The dimension s denotes the context size.
  • Figure 3: The performance gap when the context size is limited. The plots show the performance metrics of the Deep Learning models minus the respective reference values (obtained when full context is available) for different context sizes. Note that the values on the x-axis are in decreasing order, which makes the experiments easier to interpret. The y-axis is not shared between the subplots so that the variation in results across models is more visible. The markers on the curves for transformer-based models indicate that fewer data points are available than for the convolutional models.
  • Figure 4: The misleading nature of context depending on its size (using DenseNet121 as an example). The probability of the class tumour is shown for two groups: all images whose prediction changed in more than half of all context sizes (top), and a sample of images whose class changed only once (bottom).
  • Figure 5: Number of images whose prediction changes for the first time at a given context size, broken down by the initial and the subsequent model prediction ('swinging images' not included). Note that, for better visibility, the y-axis is not shared between subplots. The bin width is equal to 1. The results are provided for DenseNet121.
  • ...and 2 more figures
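
The prediction-stability analysis behind Figures 4 and 5 can be approximated with a short sketch. It assumes a hypothetical array `probs` of tumour-class probabilities with one row per image and one column per context size (ordered by context size), a decision threshold of 0.5, and one possible reading of 'swinging images': images whose predicted class flips between consecutive context sizes in more than half of the evaluated sizes. The function names and the exact criterion are assumptions for illustration and may differ from the paper's definition.

```python
import numpy as np


def count_prediction_changes(probs, threshold=0.5):
    """Count, per image, how often the predicted class flips between
    consecutive context sizes.

    probs: array-like of shape (n_images, n_context_sizes) holding the
    probability of the tumour class, with columns ordered by context size.
    """
    labels = np.asarray(probs) >= threshold
    # A flip occurs wherever neighbouring columns disagree
    flips = labels[:, 1:] != labels[:, :-1]
    return flips.sum(axis=1)


def is_swinging(probs, threshold=0.5):
    """Flag images whose number of flips exceeds half the number of
    context sizes (an assumed criterion, see the lead-in)."""
    n_sizes = np.asarray(probs).shape[1]
    return count_prediction_changes(probs, threshold) > n_sizes / 2


# Toy probabilities for three images at four context sizes
probs = np.array([
    [0.9, 0.8, 0.7, 0.6],   # stable prediction
    [0.9, 0.1, 0.9, 0.1],   # flips at every step
    [0.9, 0.9, 0.2, 0.1],   # flips once
])
changes = count_prediction_changes(probs)
swinging = is_swinging(probs)
```

Images with exactly one flip correspond to the single-change cases in the bottom panel of Figure 4, while the flagged images would be the candidates for review by histopathologists.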