Could We Generate Cytology Images from Histopathology Images? An Empirical Study
Soumyajyoti Dey, Sukanta Chakraborty, Utso Guha Roy, Nibaran Das
TL;DR
The study addresses data scarcity in breast cancer cytology by exploring the generation of synthetic cytology images from histopathology images using unpaired image-to-image translation. It empirically compares CycleGAN, with mappings $G: A \rightarrow B$ and $F: B \rightarrow A$ and losses $Loss_G$ and $Loss_{cyc}$, against Neural Style Transfer, using the BreakHis histopathology and JUCYT cytology datasets. Results indicate that CycleGAN-generated cytology images lie closer to the real cytology distribution than the source histology images do (lower $FID$ and $KID$), whereas Neural Style Transfer mainly captures staining and texture style rather than nuclear morphology; some generated samples fail to preserve benign/malignant semantics. The work offers practical insights into data augmentation for medical imaging, highlights limitations such as the finite number of synthetic samples and the risk of mislabeling, and suggests transfer-learning-based generative approaches for improved cross-domain synthesis.
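To make the role of $Loss_{cyc}$ concrete, the following is a minimal sketch of a cycle-consistency term, assuming the standard CycleGAN L1 formulation (mean absolute reconstruction error over both cycles). The function name `cycle_consistency_loss` and the toy linear stand-ins for $G$ and $F$ are illustrative assumptions, not the networks or code used in the study.

```python
import numpy as np

def cycle_consistency_loss(G, F, batch_a, batch_b):
    """Mean L1 reconstruction error over both translation cycles.

    G maps domain A (e.g. histopathology) to domain B (e.g. cytology);
    F maps B back to A. Both are assumed to be callables on image batches.
    """
    forward_cycle = np.mean(np.abs(F(G(batch_a)) - batch_a))   # A -> B -> A
    backward_cycle = np.mean(np.abs(G(F(batch_b)) - batch_b))  # B -> A -> B
    return forward_cycle + backward_cycle

# Toy stand-ins (assumption): simple scalings instead of trained generators.
G = lambda x: 2.0 * x
F = lambda y: 0.5 * y          # exact inverse of G
F_bad = lambda y: 0.4 * y      # imperfect inverse of G

rng = np.random.default_rng(0)
batch_a = rng.random((4, 8, 8, 3))  # hypothetical batch: 4 images of 8x8x3
batch_b = rng.random((4, 8, 8, 3))

loss_perfect = cycle_consistency_loss(G, F, batch_a, batch_b)
loss_imperfect = cycle_consistency_loss(G, F_bad, batch_a, batch_b)
```

When $F$ exactly inverts $G$, both cycles reconstruct their inputs and the loss vanishes; any mismatch between the two mappings yields a positive penalty, which is what pushes the generators toward mutually consistent translations.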
Abstract
Automation in medical imaging is challenging because annotated datasets are scarce and domain experts are few. In recent years, deep learning techniques have solved several complex medical imaging tasks, such as disease classification, object localization, and segmentation. However, most of these tasks require a large amount of annotated data to be implemented successfully. To mitigate this shortage, various generative models have been proposed for data augmentation; they synthesize additional medical images to enlarge the dataset and can thereby boost classification performance. Among them, unpaired image-to-image translation models map images from a source domain to a target domain. In breast malignancy identification, fine-needle aspiration cytology (FNAC) is a low-cost, minimally invasive modality commonly used by medical practitioners, but very few public datasets are available in this domain, even though automating the analysis of cytology images requires a large amount of annotated data. We therefore generate synthetic cytology images by translating publicly available breast histopathology samples. In this study, we explore two established image-to-image translation approaches, CycleGAN and Neural Style Transfer. FID and KID scores indicate that the generated cytology images are quite similar to real breast cytology samples.
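As a rough illustration of how an FID-style score compares two image sets, the sketch below computes the Fréchet distance between Gaussians fitted to two feature matrices. It assumes the features have already been extracted (FID conventionally uses Inception-v3 activations); the random arrays here are placeholders, and the function name `frechet_distance` is hypothetical rather than taken from the study.

```python
import numpy as np

def frechet_distance(feats_real, feats_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features).
    d^2 = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)

    # Tr((C_r C_f)^{1/2}) equals the sum of square roots of the eigenvalues
    # of C_r C_f, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_f) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)

# Placeholder features (assumption): 200 samples with 5 feature dimensions.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(200, 5))
shifted_feats = real_feats + 1.0   # same covariance, mean shifted by 1 per dim

d_same = frechet_distance(real_feats, real_feats)
d_shifted = frechet_distance(real_feats, shifted_feats)
```

Identical feature sets give a distance of approximately zero, while shifting every feature by 1 leaves the covariance unchanged and contributes only the squared mean difference. Lower values therefore indicate that the generated images are statistically closer to the real ones in feature space.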
