Classification with 2-D Convolutional Neural Networks for breast cancer diagnosis

Anuraganand Sharma, Dinesh Kumar

TL;DR

This work aims to extend CNN applicability to non-image, 1-D clinical breast cancer data by proposing three data-wrangling methods that map 1-D feature vectors to 2-D images for CNN processing. Using a VGG16-like CNN on the Wisconsin Diagnostic (WDBC) and Wisconsin Original (WBC) datasets, the authors show that the transformed images yield competitive, and in some cases near-perfect, classification performance, outperforming 1-D CNN baselines. The key contributions are the Equidistant Bar Graph and Normalized Distance Matrix encodings, a combined 3-layer image representation (Type-3), and a systematic evaluation in which Type-3 with px1 gives the strongest results. The approach broadens CNN applicability to non-image, non-time-series clinical data and leaves room for richer encodings and further optimization across other datasets.
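A minimal Python sketch of the Equidistant Bar Graph idea, under assumptions this summary does not state: each feature column is min-max scaled to [0, 1], every feature gets a bar of the same width, bars are separated by equal gaps, and each bar's height is proportional to the scaled value. The names `min_max_normalize` and `bar_graph_image` and the defaults for `height`, `bar_width` and `gap` are hypothetical and are not the authors' code.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column (feature) of a 2-D array X to [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant features map to 0 instead of dividing by zero
    return (X - col_min) / col_range


def bar_graph_image(x, height=32, bar_width=2, gap=1):
    """Render one normalized feature vector (values in [0, 1]) as a bar graph image.

    Hypothetical layout: equal-width bars with equal gaps before, between and
    after them; each bar rises from the bottom row in proportion to its value.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    n = x.size
    width = n * bar_width + (n + 1) * gap
    img = np.zeros((height, width), dtype=np.float32)
    for i, v in enumerate(x):
        left = gap + i * (bar_width + gap)  # equidistant bar positions
        bar_h = int(round(v * height))
        if bar_h > 0:
            img[height - bar_h:, left:left + bar_width] = 1.0
    return img


# Usage: three records with three features each (toy values, not WBC/WDBC data).
X = np.array([[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [2.0, 15.0, 5.0]])
Xn = min_max_normalize(X)
img = bar_graph_image(Xn[1], height=8)
print(img.shape)  # (8, 10)
```

With three features, `bar_width=2` and `gap=1`, the image is 2·3 + 4 = 10 pixels wide. In a real pipeline the column minima and maxima would be computed on the training split only, so that test records are scaled with the same constants.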

Abstract

Breast cancer is the most common cancer in women. Classifying patients as cancer or non-cancer from clinical records requires high sensitivity and specificity for an acceptable diagnostic test. The state-of-the-art classification model, the Convolutional Neural Network (CNN), cannot, however, be used directly with clinical data represented in 1-D format. CNNs are designed to work on sets of 2-D matrices whose elements are correlated with neighboring elements, as in image data. By contrast, data examples represented as sets of 1-D vectors, apart from time-series data, cannot be used with a CNN and are instead handled by other classification models such as Artificial Neural Networks or Random Forest. We propose novel data-wrangling preprocessing methods that transform a 1-D data vector into a 2-D graphical image with appropriate correlations among the fields, so that it can be processed by a CNN. We tested our methods on the Wisconsin Original Breast Cancer (WBC) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets. To our knowledge, this is the first work on non-image to image data transformation for non-time-series data. The transformed data processed with a CNN based on VGGnet-16 shows competitive results on the WBC dataset and outperforms other known methods on the WDBC dataset.
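One way to give the fields of a 1-D record the neighbor correlations a CNN expects is a pairwise distance matrix. The sketch below assumes entry (i, j) is the absolute difference between features i and j after min-max normalization, so every entry lies in [0, 1]; the paper's exact normalization may differ, and `normalized_distance_matrix` is a hypothetical name. A record with n features becomes an n × n single-channel image.

```python
import numpy as np


def normalized_distance_matrix(x):
    """Map a 1-D feature vector to a 2-D matrix of pairwise feature distances.

    Assumes x was already min-max normalized to [0, 1], so every entry
    |x[i] - x[j]| also lies in [0, 1]. The result is symmetric with a zero
    diagonal; row i describes how feature i relates to every other feature.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1:
        raise ValueError("expected a 1-D feature vector")
    if x.min() < 0.0 or x.max() > 1.0:
        raise ValueError("features must be min-max normalized to [0, 1]")
    # Broadcasting an (n, 1) column against a (1, n) row gives the n x n table.
    return np.abs(x[:, None] - x[None, :]).astype(np.float32)


# Usage: a toy three-feature record that is already normalized.
m = normalized_distance_matrix([0.0, 0.25, 1.0])
print(m.shape)  # (3, 3)
```

Rejecting unnormalized input, rather than rescaling each matrix by its own maximum, keeps distances comparable across patients: the same feature gap produces the same pixel intensity in every image.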

Paper Structure

This paper contains 21 sections, 12 figures, 8 tables, 2 algorithms.

Figures (12)

  • Figure 1: Snapshot of the data file for the Breast Cancer dataset WBC from dheeru_uci_2019
  • Figure 2: A general architecture of CNN – taken from saha_comprehensive_2018.
  • Figure 3: Bar graph for some data examples of the WDBC dataset.
  • Figure 4: Features learned by the first convolutional layer for the Breast Cancer dataset.
  • Figure 5: The normalized distance matrix for some data examples of the WDBC dataset.
  • ...and 7 more figures
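The Type-3 representation mentioned in the TL;DR combines three image layers. This summary does not say which encodings fill the layers or how they are sized, so the sketch below takes any three 2-D arrays, resizes them to a common size with nearest-neighbor sampling (a hypothetical choice), and stacks them as channels, giving the (H, W, 3) shape that a VGG16-style network takes as input. `resize_nearest` and `stack_three_layers` are illustrative names, not the authors' code.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2-D array to shape (out_h, out_w)."""
    img = np.asarray(img, dtype=np.float32)
    rows = (np.arange(out_h) * img.shape[0]) // out_h  # source row for each output row
    cols = (np.arange(out_w) * img.shape[1]) // out_w  # source column for each output column
    return img[rows[:, None], cols[None, :]]


def stack_three_layers(layers, size=(32, 32)):
    """Combine three 2-D encodings into one (H, W, 3) image, one encoding per channel."""
    if len(layers) != 3:
        raise ValueError("expected exactly three 2-D layers")
    out_h, out_w = size
    return np.stack([resize_nearest(layer, out_h, out_w) for layer in layers], axis=-1)


# Usage with toy layers of different shapes.
a = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.ones((3, 3))
c = np.zeros((4, 2))
image = stack_three_layers([a, b, c], size=(4, 4))
print(image.shape)  # (4, 4, 3)
```

Stacking the encodings as channels, rather than tiling them side by side, lets each filter in the first convolutional layer see all three encodings at the same spatial position.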