MamT$^4$: Multi-view Attention Networks for Mammography Cancer Classification

Alisher Ibragimov, Sofya Senotrusova, Arsenii Litvinov, Egor Ushakov, Evgeny Karpulevich, Yury Markin

TL;DR

To the best of the authors' knowledge, this study is the first to achieve a ROC-AUC of $84.0 \pm 1.7$ and an F1 score of $56.0 \pm 1.3$ on the independent test set of the Vietnamese digital mammography dataset (VinDr-Mammo), with images preprocessed by the cropping model.

Abstract

In this study, we introduce a novel method, called MamT$^4$, for the simultaneous analysis of four mammography images. A decision is made based on one image of a breast, with attention also devoted to three additional images: another view of the same breast and two images of the other breast. This approach enables the algorithm to closely replicate the practice of a radiologist, who reviews the entire set of mammograms for a patient. Furthermore, this paper emphasizes image preprocessing, specifically proposing a cropping model (a U-Net based on ResNet-34) that helps the method remove image artifacts and focus on the breast region. To the best of our knowledge, this study is the first to achieve a ROC-AUC of $84.0 \pm 1.7$ and an F1 score of $56.0 \pm 1.3$ on the independent test set of the Vietnamese digital mammography dataset (VinDr-Mammo), with images preprocessed by the cropping model.

Paper Structure

This paper contains 13 sections, 2 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: An overview of the training process of the CNN and classification layer applied to a binary classification problem using a single view. Subsequently, the trained CNN block is employed to derive a feature vector ($z_i$) from the mammography image ($x_i$).
  • Figure 2: A summary of the MamT$^4$ framework used for cancer classification on $x_i^{0}$ images. Here, $x_i^{0}$ represents the primary view, while $x_i^{1}$ is the ipsilateral view corresponding to the primary one. Similarly, $x_i^{2}$ is the bilateral view corresponding to the primary view, whereas $x_i^{3}$ is the ipsilateral view of $x_i^{2}$. The CNN block, which produces the feature vectors ($z_i^0$, $z_i^1$, $z_i^2$, $z_i^3$), each of fixed length 1536, is frozen (not trained) during this stage. Each vector is divided into fixed-size patches, each of which is linearly embedded. After position embeddings are added, the resulting sequence of vectors is fed to a Transformer Encoder. To perform classification, we use the standard approach of adding an extra learnable [class] token to the sequence. The illustration of the Transformer Encoder was inspired by Dosovitskiy et al. [dosovitskiy2020image].
  • Figure 3: The visualization contrasts the predictions of two models: the model without preprocessing and the model that uses cropping. The first two columns show instances where the cropping model made correct predictions on the cropped images, while the model without cropping failed to do so on the original images. The third column presents a less common scenario (34 instances as opposed to 84) where the model without cropping correctly identifies the images as normal and the cropping model errs. The next column shows an example where both models correctly identify cancer. Finally, the last column shows a case where both models make incorrect predictions.
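
The token sequence described in the Figure 2 caption can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: only the feature length (1536) and the number of views (4) come from the caption, while the patch size, the embedding width, the function name `build_token_sequence`, and the random stand-in weights are assumptions chosen for the example. The Transformer Encoder itself is omitted; the sketch stops at the sequence that would be fed to it.

```python
import numpy as np

FEATURE_DIM = 1536   # length of each CNN feature vector (from the caption)
NUM_VIEWS = 4        # primary view plus three auxiliary views (from the caption)
PATCH_SIZE = 64      # assumed value; it must divide FEATURE_DIM
EMBED_DIM = 128      # assumed token width


def build_token_sequence(features, w_embed, b_embed, cls_token, pos_embed):
    """Turn (num_views, feature_dim) features into (1 + num_tokens, embed_dim) tokens."""
    num_views, feature_dim = features.shape
    patch_size = w_embed.shape[0]
    if feature_dim % patch_size != 0:
        raise ValueError("patch size must divide the feature length")
    # Split every view's feature vector into consecutive fixed-size patches.
    num_tokens = num_views * (feature_dim // patch_size)
    patches = features.reshape(num_tokens, patch_size)
    # Apply one shared linear embedding to every patch.
    tokens = patches @ w_embed + b_embed
    # Prepend the [class] token, then add position embeddings.
    sequence = np.concatenate([cls_token[None, :], tokens], axis=0)
    return sequence + pos_embed


# Random stand-ins for the learnable parameters and for z_i^0 .. z_i^3.
rng = np.random.default_rng(0)
num_tokens = NUM_VIEWS * (FEATURE_DIM // PATCH_SIZE)  # 4 * 24 = 96
w_embed = rng.normal(scale=0.02, size=(PATCH_SIZE, EMBED_DIM))
b_embed = np.zeros(EMBED_DIM)
cls_token = np.zeros(EMBED_DIM)
pos_embed = rng.normal(scale=0.02, size=(1 + num_tokens, EMBED_DIM))
features = rng.normal(size=(NUM_VIEWS, FEATURE_DIM))

sequence = build_token_sequence(features, w_embed, b_embed, cls_token, pos_embed)
print(sequence.shape)  # (97, 128)
```

In a full model, the first row of the encoder output (the [class] position) would typically be passed to a classification head, following the standard approach the caption refers to. Whether the four views share one position-embedding table, as assumed here, is not stated in the caption.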