Patch Stitching Data Augmentation for Cancer Classification in Pathology Images
Jiamu Wang, Chang-Su Kim, Jin Tae Kwak
TL;DR
The paper addresses data scarcity and class imbalance in computational pathology by proposing Patch Stitching Image Synthesis (PaSS), a region-based image augmentation that creates new patches by stitching together regions taken from multiple images of the same class. It introduces two PaSS variants, PaSS_Rec (rectangular regions) and PaSS_SLIC (superpixel-based regions), and evaluates them on two colorectal cancer datasets with EfficientNet-B0 and ResNet-50. Results show that PaSS improves classification accuracy and F1 score over training on the original data alone, with stronger gains in cross-scanner tests; PaSS_SLIC generally offers better generalization and preserves tissue structures. The method is simple, low-cost, and extensible to other tissues and diseases, suggesting practical potential for mitigating data scarcity and imbalance in computational pathology.
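The stitching step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Images are represented as equal-sized H x W grids of pixel values (plain nested lists). Each region is filled from one image drawn uniformly at random from the same-class pool. The names `grid_labels` and `stitch` are hypothetical. The rectangular case, in the spirit of PaSS_Rec, uses a fixed grid of cells. For a superpixel variant in the spirit of PaSS_SLIC, one could instead pass a region label map produced by a superpixel algorithm such as scikit-image's `skimage.segmentation.slic`. How the paper computes superpixels and shares them across source images is not specified in this summary.

```python
import random
from typing import List, Sequence

Image = List[List[int]]  # assumption: H x W grid of grayscale pixel values


def grid_labels(height: int, width: int, rows: int, cols: int) -> List[List[int]]:
    """Split an H x W patch into rows x cols rectangles and return a region id per pixel."""
    return [
        [(y * rows // height) * cols + (x * cols // width) for x in range(width)]
        for y in range(height)
    ]


def stitch(images: Sequence[Image], labels: List[List[int]], rng: random.Random) -> Image:
    """Build a new patch by copying each labelled region from a randomly chosen
    same-class source image (all images must share the label map's H x W shape)."""
    n_regions = max(max(row) for row in labels) + 1
    # One source image index per region, drawn uniformly at random.
    source = [rng.randrange(len(images)) for _ in range(n_regions)]
    return [
        [images[source[labels[y][x]]][y][x] for x in range(len(labels[0]))]
        for y in range(len(labels))
    ]


if __name__ == "__main__":
    # Three constant 4 x 4 "patches" of the same class, valued 0, 1 and 2.
    patches = [[[k] * 4 for _ in range(4)] for k in range(3)]
    labels = grid_labels(4, 4, rows=2, cols=2)  # 2 x 2 rectangular regions
    for row in stitch(patches, labels, random.Random(0)):
        print(row)
```

Each 2 x 2 cell of the printed patch comes from a single source image. Separating the region map from the copying step lets the rectangular and superpixel variants share the same stitching code.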
Abstract
Computational pathology, which integrates computational methods with digital imaging, has been shown to be effective in advancing disease diagnosis and prognosis. In recent years, advances in machine learning and deep learning have greatly bolstered the power of computational pathology. However, data scarcity and data imbalance remain open issues, and both can adversely affect any computational method. In this paper, we introduce an efficient and effective data augmentation strategy that generates new pathology images from existing ones, enriching datasets without additional data collection or annotation costs. To evaluate the proposed method, we employed two colorectal cancer datasets and obtained improved classification results, suggesting that this simple approach holds potential for alleviating data scarcity and imbalance in computational pathology.
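One way generated images could be used against class imbalance is to top up each minority class until it matches the size of the largest class. The sketch below assumes that policy. The paper's actual sampling ratio is not given in this summary. The function name `balance_by_synthesis` and the `synthesize` callback, which in practice could wrap a patch-stitching routine, are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")  # any patch representation


def balance_by_synthesis(
    by_class: Dict[str, List[T]],
    synthesize: Callable[[Sequence[T], random.Random], T],
    rng: random.Random,
) -> Dict[str, List[T]]:
    """Hypothetical policy: pad every class with synthetic same-class samples
    until it reaches the size of the largest class."""
    target = max(len(items) for items in by_class.values())
    balanced: Dict[str, List[T]] = {}
    for label, items in by_class.items():
        # Keep the originals and append only as many synthetic samples as needed.
        extra = [synthesize(items, rng) for _ in range(target - len(items))]
        balanced[label] = list(items) + extra
    return balanced


if __name__ == "__main__":
    # Toy stand-in for a stitching routine: it just re-samples an existing item.
    toy = {"tumor": ["t1", "t2", "t3"], "stroma": ["s1"]}
    out = balance_by_synthesis(toy, lambda items, r: r.choice(list(items)), random.Random(0))
    print({k: len(v) for k, v in out.items()})
```

Because synthesis only draws from a single class's own samples, the labels of generated images need no extra annotation. This is the property the abstract relies on when it claims no added annotation cost.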
