Classification with 2-D Convolutional Neural Networks for breast cancer diagnosis
Anuraganand Sharma, Dinesh Kumar
TL;DR
This work aims to extend CNN applicability to non-image, 1-D clinical breast cancer data by proposing three data-wrangling methods that map 1-D vectors to 2-D images for CNN processing. Using a VGG16-like CNN, the authors demonstrate on the Wisconsin Diagnostic (WDBC) and Wisconsin Original (WBC) datasets that the transformed images enable competitive, and in some cases near-perfect, classification performance, outperforming 1-D CNN baselines. The key contributions are the Equidistant Bar Graphs, the Normalized Distance Matrix, and a combined 3-layer image representation (Type-3), along with a systematic evaluation showing the strongest results when using Type-3 with px1. This approach broadens CNN applicability to non-image, non-time-series clinical data and opens avenues for richer encodings and optimization to further improve performance across diverse datasets.
Abstract
Breast cancer is the most common cancer in women. Classifying patients as cancer or non-cancer from clinical records requires high sensitivity and specificity for an acceptable diagnostic test. The state-of-the-art classification model, the Convolutional Neural Network (CNN), however, cannot be used with clinical data that are represented in 1-D format. CNNs are designed to work on sets of 2-D matrices whose elements show some correlation with neighboring elements, as in image data. Conversely, data examples represented as 1-D vectors, apart from time-series data, cannot be used with a CNN, but only with other classification models such as Artificial Neural Networks or Random Forest. We propose novel data-wrangling preprocessing methods that transform a 1-D data vector into a 2-D graphical image with appropriate correlations among the fields, so that it can be processed by a CNN. We tested our methods on the Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets. To our knowledge, this work is novel in transforming non-image data to image data for non-time-series data. The transformed data processed with a CNN using VGGnet-16 shows competitive results for the WBC dataset and outperforms other known methods for the WDBC dataset.
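To make the idea of mapping a 1-D record to a 2-D matrix concrete, the following is a minimal sketch of one plausible distance-matrix style encoding. It is not taken from the paper: the function names `min_max_scale` and `distance_matrix_image`, the per-record min-max scaling, and the use of absolute differences are all assumptions chosen for illustration, and the authors' Normalized Distance Matrix may be defined differently.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def distance_matrix_image(record):
    """Map a 1-D feature vector of length n to an n x n matrix.

    Entry (i, j) is the absolute difference between scaled features i and j,
    so each cell relates two fields of the record (assumed encoding).
    """
    scaled = min_max_scale(record)
    return [[abs(a - b) for b in scaled] for a in scaled]


# Hypothetical record with four numeric features (not real WBC/WDBC data).
image = distance_matrix_image([1.0, 3.0, 5.0, 9.0])
for row in image:
    print(row)
```

The resulting matrix is symmetric with a zero diagonal and can be treated as a single-channel grayscale image. In practice one would more likely scale each feature across the whole dataset rather than within a record; the per-record scaling here only keeps the sketch self-contained.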
