
ManiNeg: Manifestation-guided Multimodal Pretraining for Mammography Classification

Xujun Li, Xin Wei, Jing Jiang, Danxiang Chen, Wei Zhang, Jinpeng Li

TL;DR

The results demonstrate that ManiNeg improves representations in both unimodal and multimodal settings and generalizes across datasets.

Abstract

Breast cancer is a significant threat to human health. Contrastive learning has emerged as an effective method for extracting critical lesion features from mammograms, offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling: selecting appropriate hard negative samples is essential for driving representations to retain detailed information about lesions. Contrastive learning often assumes that features sufficiently capture semantic content and that each minibatch inherently contains ideal hard negative samples. However, the characteristics of breast lumps challenge these assumptions. In response, we introduce ManiNeg, a novel approach that leverages manifestations as proxies to mine hard negative samples. Manifestations, the observable symptoms or signs of a disease, provide a knowledge-driven and robust basis for choosing hard negative samples. Because manifestations are fixed annotations that do not change as the model is optimized, sampling based on them is efficient. To support ManiNeg and future research, we developed the MVKL dataset, which includes multi-view mammograms, corresponding reports, meticulously annotated manifestations, and pathologically confirmed benign-malignant outcomes. We evaluate ManiNeg on the benign-malignant classification task. Our results demonstrate that ManiNeg improves representations in both unimodal and multimodal settings and generalizes across datasets. The MVKL dataset and our code are publicly available at https://github.com/wxwxwwxxx/ManiNeg.
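The central idea in the abstract is to mine hard negatives by comparing manifestations rather than learned features. That idea can be sketched as follows. This is a minimal, hypothetical illustration and not the ManiNeg implementation from the repository. It makes several assumptions:

- Manifestations are binary trait vectors.
- Closeness between samples is measured by Hamming distance.
- Identical manifestations (distance 0) are excluded as negatives.
- The sampler always takes the closest available distance.

All function names are illustrative. Because the annotations never change during training, the distance index is built once up front.

```python
import random
from collections import defaultdict

# Hypothetical representation: one 0/1 entry per annotated trait.
Manifestation = tuple[int, ...]


def hamming(a: Manifestation, b: Manifestation) -> int:
    """Count the traits that differ between two binary manifestation vectors."""
    return sum(x != y for x, y in zip(a, b))


def build_distance_index(manis: list[Manifestation]) -> list[dict[int, list[int]]]:
    """For each anchor i, group every other sample index by its Hamming distance to i.

    Manifestations are fixed annotations, so this index does not depend on the
    encoder and can be built once before training.
    """
    index = []
    for i, anchor in enumerate(manis):
        buckets = defaultdict(list)
        for j, other in enumerate(manis):
            if j != i:
                buckets[hamming(anchor, other)].append(j)
        index.append(dict(buckets))
    return index


def sample_hard_negative(index, anchor: int, min_distance: int = 1, rng=random) -> int:
    """Return a random sample at the smallest Hamming distance >= min_distance.

    Assumption: min_distance=1 excludes samples with an identical manifestation.
    """
    distances = sorted(d for d in index[anchor] if d >= min_distance)
    if not distances:
        raise ValueError("no candidate negative at or beyond min_distance")
    return rng.choice(index[anchor][distances[0]])


if __name__ == "__main__":
    manis = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 1, 1), (0, 1, 1, 0)]
    idx = build_distance_index(manis)
    print(sample_hard_negative(idx, 0))  # 1: the only sample one trait away from sample 0
```

Always taking the minimum distance is the most aggressive choice. The figure captions mention sampling with different values of $\mu$, so a softer variant could draw the target distance around such a value instead.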

Paper Structure (14 sections, 6 equations, 6 figures, 7 tables, 1 algorithm)


Figures (6)

  • Figure 1: Contrastive pretraining scheme for mammography analysis. (a) Unimodal learning with images. (b) Multimodal learning with images and manifestations. $i$ and $j$ denote a pair of views from instance $\bm x_1$, marked in green; $k$ and $l$ denote a pair of views from instance $\bm x_2$, marked in yellow. $f_I(\cdot)$ and $f_M(\cdot)$ are the encoders for images and manifestations, respectively, and $g(\cdot)$ is the shared projector. In the representation sphere, each sample is pulled toward its positive (e.g., $\bm z_i$ and $\bm z_j$ attract each other) and pushed away from its negatives (e.g., $\bm z_i$ and $\bm z_k$ repel each other). Through this attraction and repulsion, the model learns instance features in an unsupervised manner.
  • Figure 2: ManiNeg and uniform sampling. For illustration, we assume that each independent trait in the manifestation has a 50% occurrence probability. In a 4-bit manifestation scenario, the Hamming distance between a negative and the anchor then follows the binomial distribution $B(4,0.5)$. Uniform sampling most frequently yields a Hamming distance of 2 (probability 6/16), favoring it over the ideal hard negatives at a distance of 1 (probability 4/16). ManiNeg samples by Hamming distance and therefore targets these hard negatives directly, improving sample selection for learning.
  • Figure 3: Schematic of a binary manifestation vector. The manifestation is first annotated according to the header listed in Table \ref{tab:mani_header} and then expanded into a 35-dimensional binary vector. The figure shows a binary manifestation vector surrounding its corresponding breast lump. Green: presence; red: absence.
  • Figure 4: ManiNeg concept in probability. Assume each trait within a manifestation occurs independently with probability $p_m = 0.5$. Then the Hamming distance $x$ between the manifestations of an anchor and a negative sample follows a binomial distribution. As the number of traits $n$ increases, this binomial distribution approaches a Gaussian distribution $\mathcal{N}(\mu,\sigma^2)$ (here $\mu = n/2$ and $\sigma^2 = n/4$), whose cumulative distribution function is $\Phi(x;\mu,\sigma^2)$. On this basis, we relate the manifestation size $n$ to the minimum threshold of sampled Hamming distances, defined as $\inf\{x \mid \Phi(x;\mu,\sigma^2) \geq 0.0013\}$, which is approximately $\mu - 3\sigma$. The relationship is direct: as $n$ grows, this threshold for hard negative sampling (the lower bound of the Hamming distance) also grows. Hard negatives therefore become increasingly scarce as the manifestation size expands. As a result, identifying them efficiently becomes harder as the data representation grows more complex.
  • Figure 5: Demonstration of ManiNeg using different values of $\mu$ for sampling. (a) Histogram of the Hamming distances between the anchor and the negative samples. (b) Histogram of the Hamming distances between all negative pairs within the minibatch.
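The probability argument in Figures 2 and 4 can be checked numerically. This sketch assumes, as the captions do, that the $n$ traits occur independently with probability $p_m$. Two independent samples then disagree on a trait with probability $2p_m(1-p_m)$, which equals 0.5 when $p_m = 0.5$. The function names are illustrative. The Gaussian threshold uses the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def p_diff_from_pm(p_m: float) -> float:
    """Probability that one trait differs between two independent samples."""
    return 2 * p_m * (1 - p_m)


def hamming_pmf(n: int, p_diff: float = 0.5) -> list[float]:
    """P(Hamming distance = k) for k = 0..n, i.e. the Binomial(n, p_diff) pmf."""
    return [math.comb(n, k) * p_diff**k * (1 - p_diff) ** (n - k) for k in range(n + 1)]


def hard_negative_threshold(n: int, p_diff: float = 0.5, tail: float = 0.0013) -> float:
    """inf{x | Phi(x; mu, sigma^2) >= tail} under the Gaussian approximation."""
    mu = n * p_diff
    sigma = math.sqrt(n * p_diff * (1 - p_diff))
    return NormalDist(mu, sigma).inv_cdf(tail)


if __name__ == "__main__":
    # Figure 2: with 4 traits, distances 0..4 have probabilities 1/16, 4/16, 6/16, 4/16, 1/16.
    print(hamming_pmf(4))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
    # Figure 4: the practical lower bound on sampled distances grows with n.
    for n in (4, 10, 35):
        print(n, round(hard_negative_threshold(n), 2))
```

Consider $n = 35$, the vector length from Figure 3, under the idealized $p_m = 0.5$ assumption. The threshold comes out to roughly 8.6, close to $\mu - 3\sigma \approx 8.63$. Under the Gaussian approximation, only about 0.13% of uniformly drawn negatives fall below that distance. For $n = 4$ the value is negative, so the Gaussian bound is uninformative at very small $n$.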