RGMIM: Region-Guided Masked Image Modeling for Learning Meaningful Representations from X-Ray Images

Guang Li; Ren Togo; Takahiro Ogawa; Miki Haseyama

TL;DR

RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection.

Abstract

In this study, we propose a novel method called region-guided masked image modeling (RGMIM) for learning meaningful representations from X-ray images. Our method adopts a new masking strategy that uses organ mask information to identify valid regions for learning more meaningful representations. We conduct quantitative evaluations on an open lung X-ray image dataset, together with masking ratio hyperparameter studies. When using the entire training set, RGMIM outperforms other comparable methods, achieving a lung disease detection accuracy of 0.962. In particular, RGMIM significantly improves performance over other methods when the data volume is small, such as 5% and 10% of the training set. RGMIM can mask more valid regions, facilitating the learning of discriminative representations and the subsequent high-accuracy lung disease detection. RGMIM outperforms other state-of-the-art self-supervised learning methods in experiments, particularly when limited training data is used.
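The masking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a patch counts as "valid" when it contains at least one organ-mask pixel, that masked patches are sampled uniformly from the valid ones, and that the number of masked patches is capped by the number of valid patches. The function names `valid_patch_indices` and `region_guided_mask`, and the parameters `patch_size`, `mask_ratio`, and `seed`, are hypothetical.

```python
import random


def valid_patch_indices(organ_mask, patch_size):
    """Return (indices of patches containing organ pixels, total patch count).

    organ_mask is a 2D list of 0/1 values; patches are indexed row-major.
    Assumption: a patch is "valid" if it holds at least one organ pixel.
    """
    rows = len(organ_mask) // patch_size
    cols = len(organ_mask[0]) // patch_size
    valid = []
    for r in range(rows):
        for c in range(cols):
            has_organ = any(
                organ_mask[y][x]
                for y in range(r * patch_size, (r + 1) * patch_size)
                for x in range(c * patch_size, (c + 1) * patch_size)
            )
            if has_organ:
                valid.append(r * cols + c)
    return valid, rows * cols


def region_guided_mask(organ_mask, patch_size, mask_ratio, seed=None):
    """Pick patches to mask, drawing only from the valid (organ) patches.

    Assumption: the target count is mask_ratio times the total number of
    patches, capped at the number of valid patches.
    """
    valid, total = valid_patch_indices(organ_mask, patch_size)
    target = round(mask_ratio * total)
    count = min(target, len(valid))
    rng = random.Random(seed)
    return sorted(rng.sample(valid, count))
```

In contrast to random masking, which may spend part of its masking budget on background patches, this sketch spends the whole budget inside the organ region whenever enough valid patches exist.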

Paper Structure (10 sections, 5 equations, 3 figures, 3 tables)


Background
Methods
Region-guided masking strategy
ViT encoder and decoder
Pretraining process
Fine-tuning and lung disease classification
Experiments
Dataset and Settings
Experimental Results
Conclusion

Figures (3)

Figure 1: Comparison of the random masking and the proposed region-guided masking.
Figure 2: Overview of RGMIM. The left shows the pipeline and the right shows the structure of the ViT encoder.
Figure 3: Lung disease classification accuracy as the number of fine-tuning epochs increases. All methods use the ViT-Base model.
