Policy Gradient-Driven Noise Mask

Mehmet Can Yavuz; Yang Yang

TL;DR

This work tackles heterogeneity in multi-modal, multi-organ biomedical imaging with a reinforcement learning-based pretraining pipeline that jointly trains a lightweight policy network and a classifier. The policy network generates image-specific noise masks by sampling from a differentiable Beta distribution $\mathcal{B}(\alpha, \beta)$ and is optimized via policy gradient, using log-probabilities and a cross-entropy reward, so that the masks regularize the classifier during pretraining. The policy network is discarded at inference, and the resulting intermediate (heated) model is fine-tuned for final predictions. Empirical results on RadImageNet show that this gradient policy approach improves classification performance and generalization to unseen concepts, with notable gains on small downstream datasets and when transferring from RadImageNet to MedMNIST-derived tasks. The improvements hold for both large and lightweight backbones, which suggests practical benefits for biomedical imaging pipelines where cross-modality and cross-organ heterogeneity are prevalent, including better few-shot adaptation.

Abstract

Deep learning classifiers face significant challenges when dealing with heterogeneous multi-modal and multi-organ biomedical datasets. Low-level features in such data are distinguishable mainly by imaging modality, which hinders the classifiers' ability to learn high-level semantic relationships and results in sub-optimal performance. To address this issue, image augmentation strategies are employed as regularization techniques. While adding noise to the input during network training is a well-established augmentation-as-regularization method, modern pipelines often favor more robust techniques such as dropout and weight decay. This preference stems from the observation that combining these established techniques with noise input can adversely affect model performance. In this study, we propose a novel pretraining pipeline that learns to generate conditional noise masks specifically tailored to improve performance on multi-modal and multi-organ datasets. Framed as a reinforcement learning algorithm, our approach employs a dual-component system comprising a classifier network and a very lightweight policy network that learns to sample conditional noise using a differentiable Beta distribution. The policy network is trained using the REINFORCE algorithm to generate image-specific noise masks that regularize the classifier during pretraining. A key aspect is that the policy network's role is limited to obtaining an intermediate (or heated) model before fine-tuning. During inference, the policy network is omitted, allowing direct comparison between the baseline and noise-regularized models. We conducted experiments and related analyses on the RadImageNet datasets. Results demonstrate that fine-tuning the intermediate models consistently outperforms conventional training algorithms on both classification and generalization to unseen concepts. Code: https://github.com/convergedmachine/Policy-Gradient-Driven-Noise-Mask
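The policy-gradient step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the clamping constant, the choice of reward as the negative cross-entropy, and the use of a plain score-function (REINFORCE) surrogate without a baseline are all assumptions made here for illustration. In the paper, $\alpha$ and $\beta$ are produced per image by the policy network; in this sketch they are fixed scalars.

```python
import math
import random


def beta_log_prob(x, alpha, beta):
    """Log-density of Beta(alpha, beta) evaluated at x in (0, 1)."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm


def sample_mask(alpha, beta, size, rng):
    """Draw a size x size noise matrix (one Beta sample per cell).

    Returns the matrix and the summed log-probability of all cells.
    """
    mask, log_p = [], 0.0
    for _ in range(size):
        row = []
        for _ in range(size):
            # Clamp away from 0 and 1 so the log-density stays finite
            # (the constant 1e-6 is an arbitrary choice for this sketch).
            v = min(max(rng.betavariate(alpha, beta), 1e-6), 1.0 - 1e-6)
            row.append(v)
            log_p += beta_log_prob(v, alpha, beta)
        mask.append(row)
    return mask, log_p


def cross_entropy(probs, label):
    """Cross-entropy of predicted class probabilities against an integer label."""
    return -math.log(probs[label])


def reinforce_surrogate(log_p, ce_loss):
    """Score-function surrogate loss for the policy.

    Assumption: reward = -cross_entropy. Minimizing the returned value
    lowers the log-probability of a sampled mask in proportion to the
    classifier loss that mask produced.
    """
    reward = -ce_loss
    return -reward * log_p


if __name__ == "__main__":
    rng = random.Random(0)
    mask, log_p = sample_mask(alpha=2.0, beta=5.0, size=4, rng=rng)
    # Hypothetical classifier output for a masked image with true label 1.
    loss = cross_entropy([0.25, 0.75], label=1)
    print(reinforce_surrogate(log_p, loss))
```

In a real training loop the surrogate would be differentiated with respect to the policy network's parameters by an autodiff framework; the pure-Python version above only shows which quantities enter the objective.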

Paper Structure (9 sections, 10 equations, 9 figures, 6 tables, 1 algorithm)

Introduction
Policy Gradient-Driven Noise Mask
Experiments
Results & Discussion
Conclusion
RadImageNet: Artifacts and The Refinement
Artifact Refinement Process
Automated Conversion Tool for RadImageNet
Enhanced MedMNIST

Figures (9)

Figure 1: Heterogeneity in medical imaging datasets (RadImageNet; Mei et al., 2022) across modalities and anatomical regions. (Left) CT scan of the lungs. (Second left) MRI scan of the shoulder. (Second right) Ultrasound image of the ovary. (Right) Pixel intensity histograms, used as an indicator of low-level image features, for the CT (blue), MRI (red), and ultrasound (green) images, illustrating variations in brightness, contrast, and noise levels.
Figure 2: Schematic diagram of reinforcement learning. (Left) An agent takes an action, changes the state of the environment, and receives a reward. (Right) The Beta sampler (policy network) generates a noise matrix, and the classifier, acting as a differentiable environment, computes the log-likelihood and updates the state variables $\alpha$ and $\beta$.
Figure 3: Diagram of the proposed deep learning pipeline, from the original image through stochastic masking, feature extraction, Beta sampling, and classification to a prediction trained with a cross-entropy objective. The parameters shown in blue and green are used to compute the objective function.
Figure 4: (Left) Policy network architecture, showing the flow from the input feature tensor to the weighted tensor output. The network processes the input through a function $h(\cdot)$, projects the feature vector into the Beta distribution parameters $b_1(x)$ and $b_2(x)$, derives the alpha and beta values for the Beta function, and calculates the log-probability (logP). The outputs are visualized over the dashed region as colored circles. (Right) Example of stochastic masking applied to a medical image, showing the transformation from a noise matrix to the final masked image through noise matrix acquisition, upsampling, blurring, and application to the image.
Figure 5: Few-shot adaptability on unseen but related datasets, one per panel: Breast - US, Breast Cancer - US, Brain - MRI, Brain Tumor - MRI, Pneumonia - XR, and OrganA, OrganC, OrganS - Abdominal CT. Curves of different colors correspond to different pretrained models. The orange curves represent the performance of Gradient Policy RadImageNet, the green curves show the results for ImageNet, and the blue curves indicate the performance of ImageNet pretrained models. The evaluation is carried out for 8, 16, 32, 64, 128, and 256 samples, with 10 trials at each sample size. The vertical lines are 95% confidence intervals based on the t-statistic.
...and 4 more figures
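The masking steps named in the Figure 4 caption (noise matrix acquisition, upsampling, blurring, and application to the image) can be sketched in plain Python as below. The captions do not specify the interpolation, the blur kernel, or how the mask is combined with the image, so nearest-neighbor upsampling, a 3x3 box blur, and element-wise multiplication are assumptions chosen for illustration, and all function names are hypothetical.

```python
def upsample_nearest(matrix, factor):
    """Repeat every cell `factor` times along both axes (nearest-neighbor)."""
    out = []
    for row in matrix:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(matrix):
    """3x3 mean filter; border cells average only over in-bounds neighbors."""
    h, w = len(matrix), len(matrix[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                matrix[y][x]
                for y in range(max(0, i - 1), min(h, i + 2))
                for x in range(max(0, j - 1), min(w, j + 2))
            ]
            out[i][j] = sum(vals) / len(vals)
    return out


def apply_mask(image, mask):
    """Element-wise product of image and mask (assumed combination rule)."""
    return [
        [p * m for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def stochastic_mask(image, noise, factor):
    """Upsample the coarse noise matrix to image size, blur it, and apply it."""
    return apply_mask(image, box_blur(upsample_nearest(noise, factor)))


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    noise = [[0.2, 0.9], [0.7, 0.4]]  # e.g. a 2x2 matrix of Beta samples
    for row in stochastic_mask(image, noise, factor=2):
        print(row)
```

Sampling the noise at a coarse resolution and then upsampling and blurring it yields a smooth, spatially coherent mask instead of independent per-pixel noise, which is consistent with the sequence of steps the caption lists.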
