Proportion Estimation by Masked Learning from Label Proportion
Takumi Okuo, Kazuya Nishimura, Hiroaki Ito, Kazuhiro Terada, Akihiko Yoshizawa, Ryoma Bise
TL;DR
This paper tackles automatic estimation of the PD-L1 tumor proportion, defined as $r=\frac{N_{\text{PD-L1}^{+}\,\text{tumor}}}{N_{\text{tumor}}}$, from pathological core images with limited cell-level annotations. It introduces a two-stage framework that first detects tumor cells to produce a tumor mask $M$, then estimates $\hat{r}$ from masked tumor maps using $s_p=\sum (F_p\odot M)$, $s_n=\sum (F_n\odot M)$, and $\hat{r}=\frac{s_p}{s_p+s_n}$. Training is supervised by three annotation types (cell positions, tumor regions, and proportion labels) and optimized with a weighted focal proportion loss. This loss extends learning from label proportions (LLP) to handle interval-based proportion labels and data imbalance by weighting the intervals and focusing on hard ones, with a hyperparameter $\gamma$ that can vary by proportion interval. Empirical results on clinical datasets show state-of-the-art accuracy, and the method provides interpretable CAM-like maps that help clinicians understand the PD-L1 estimate while reducing the annotation burden.
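The masked estimation step can be illustrated with a minimal sketch. It assumes that $F_p$ and $F_n$ are per-pixel maps for PD-L1-positive and PD-L1-negative tumor evidence and that $M$ is a binary tumor mask of the same size; this reading of the symbols, the function names `masked_sum` and `estimate_proportion`, and the small constant `eps` that guards against an empty mask are illustrative assumptions, not details taken from the paper.

```python
from typing import Sequence

Map2D = Sequence[Sequence[float]]


def masked_sum(feature_map: Map2D, mask: Map2D) -> float:
    """Sum of the element-wise product of feature_map and mask."""
    return sum(
        f * m
        for f_row, m_row in zip(feature_map, mask)
        for f, m in zip(f_row, m_row)
    )


def estimate_proportion(f_pos: Map2D, f_neg: Map2D, mask: Map2D,
                        eps: float = 1e-8) -> float:
    """Return r_hat = s_p / (s_p + s_n), computed only inside the mask.

    eps is an assumption added here to avoid division by zero when the
    mask selects no tumor pixels.
    """
    s_p = masked_sum(f_pos, mask)  # s_p = sum(F_p * M)
    s_n = masked_sum(f_neg, mask)  # s_n = sum(F_n * M)
    return s_p / (s_p + s_n + eps)


# Toy 2x2 maps: only the top row is marked as tumor by the mask.
f_pos = [[0.9, 0.2], [0.8, 0.7]]
f_neg = [[0.1, 0.8], [0.2, 0.3]]
mask = [[1, 1], [0, 0]]

r_hat = estimate_proportion(f_pos, f_neg, mask)
print(round(r_hat, 4))  # 0.55
```

With an all-ones mask the same maps give about 0.65 instead of 0.55, which shows how responses outside the detected tumor region would otherwise shift the estimate.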
Abstract
The PD-L1 rate, the ratio of the number of PD-L1-positive tumor cells to the total number of tumor cells, is an important metric for immunotherapy. This metric is recorded as diagnostic information alongside pathological images. In this paper, we propose a proportion estimation method that uses a small amount of cell-level annotation together with proportion annotation, which can be easily collected. Since the PD-L1 rate is calculated only from 'tumor cells' and not from 'non-tumor cells', we first detect tumor cells with a detection model. Then, we estimate the PD-L1 proportion by introducing a masking technique into 'learning from label proportions'. In addition, we propose a weighted focal proportion loss to address data imbalance. Experiments on clinical data demonstrate the effectiveness of our method, which achieved the best performance among the compared methods.
