Weakly supervised localisation of prostate cancer using reinforcement learning for bi-parametric MR images
Martynas Pocius, Wen Yan, Dean C. Barratt, Mark Emberton, Matthew J. Clarkson, Yipeng Hu, Shaheer U. Saeed
TL;DR
This paper tackles the localisation of cancerous lesions in bi-parametric MR images of the prostate without localisation annotations, by framing the task as reinforcement learning with weak supervision. A pre-trained object-presence classifier, trained on image-level labels, provides a non-binarised, probability-based reward that is used to train a localisation controller via PPO. On a large clinical dataset, the proposed method outperforms multiple-instance learning and achieves localisation performance comparable to a fully supervised baseline, while potentially reducing labelling bias. The approach offers a scalable, explainable localisation framework that leverages image-level labels and a learned reward signal to identify regions of interest in medical images.
Abstract
In this paper, we propose a reinforcement-learning-based weakly supervised system for localisation. We train a controller function to localise regions of interest within an image by introducing a novel reward definition that utilises the non-binarised classification probability generated by a pre-trained binary classifier, which classifies object presence in images or image crops. The object-presence classifier can then inform the controller of its localisation quality by quantifying the likelihood that the image or crop contains an object. Such an approach allows us to minimise the labelling bias that human annotation can propagate into fully supervised localisation. We evaluate our proposed approach on the task of cancerous lesion localisation, using a large dataset of real clinical bi-parametric MR images of the prostate. Comparisons to the commonly used multiple-instance learning approach to weakly supervised localisation and to a fully supervised baseline show that our proposed method outperforms multiple-instance learning and performs comparably to fully supervised learning, while using only image-level classification labels for training.
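The reward idea described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a 2D image, a rectangular region given as (row, col, height, width), and a hypothetical stand-in `toy_classifier` that maps mean intensity to a probability; in the paper this role is played by a classifier pre-trained on image-level labels. The names `crop`, `probability_reward` and `binarised_reward` are illustrative only, and the thresholded variant is included purely as a contrast to the non-binarised reward.

```python
import numpy as np


def crop(image, box):
    """Return the sub-image for box = (row, col, height, width), clipped to the image."""
    r, c, h, w = box
    r0, c0 = max(0, r), max(0, c)
    r1 = min(image.shape[0], r + h)
    c1 = min(image.shape[1], c + w)
    return image[r0:r1, c0:c1]


def toy_classifier(patch):
    """Hypothetical stand-in for a pre-trained object-presence classifier.

    Returns a probability in [0, 1]; here simply a sigmoid of the mean intensity.
    """
    if patch.size == 0:
        return 0.0
    return float(1.0 / (1.0 + np.exp(-10.0 * (patch.mean() - 0.5))))


def probability_reward(image, box, classifier):
    """Non-binarised reward: the classifier's probability for the proposed crop."""
    return float(classifier(crop(image, box)))


def binarised_reward(image, box, classifier, threshold=0.5):
    """Thresholded reward, shown only for contrast (threshold is an assumption)."""
    return 1.0 if classifier(crop(image, box)) >= threshold else 0.0


# Synthetic example: a bright square "object" on a dark background.
image = np.zeros((64, 64))
image[20:36, 30:46] = 1.0

boxes = {
    "on target": (20, 30, 16, 16),
    "partial overlap": (24, 30, 16, 16),
    "off target": (0, 0, 16, 16),
}
for name, box in boxes.items():
    print(name,
          round(probability_reward(image, box, toy_classifier), 3),
          binarised_reward(image, box, toy_classifier))
```

In this toy setting, the probability-based reward separates a fully overlapping crop from a partially overlapping one, whereas the thresholded reward assigns both the same value. This graded signal is what a controller trained with a policy-gradient method such as PPO could exploit to refine its proposed region.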
