LM-IGTD: a 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks

Vanesa Gómez-Martínez, Francisco J. Lara-Abelenda, Pablo Peiro-Corbacho, David Chushig-Muzo, Conceicao Granja, Cristina Soguero-Ruiz

TL;DR

This work tackles the challenge of leveraging CNNs for tabular data that are often low-dimensional and mixed-type by introducing LM-IGTD, a tabular-to-image pipeline that augments data with stochastic noise and adapts IGTD to preserve feature relationships in 2D images. It provides an end-to-end mapping from original features to image regions and employs Grad-CAM for post-hoc interpretability. The approach is evaluated on 12 real-world, low-dimensional mixed-type datasets, showing improvements over traditional tabular ML baselines in several binary and multiclass tasks, and offering interpretable visualizations of which features drive predictions. Overall, LM-IGTD demonstrates that noise-augmented tabular-to-image representations can enable CNNs to outperform or match traditional models while providing actionable interpretability for mixed-type data.

Abstract

Tabular data have been extensively used in different knowledge domains. Convolutional neural networks (CNNs) have been successfully used in many applications where important information about the data is embedded in the order of features, as in images, outperforming the predictive results of traditional models. Recently, several researchers have proposed transforming tabular data into images to leverage the potential of CNNs and achieve strong results in predictive tasks such as classification and regression. In this paper, we present a novel and effective approach for transforming tabular data into images, addressing the inherent limitations associated with low-dimensional and mixed-type datasets. Our method, named Low Mixed-Image Generator for Tabular Data (LM-IGTD), integrates a stochastic feature generation process and a modified version of IGTD. We introduce an automatic and interpretable end-to-end pipeline that enables the creation of images from tabular data. A mapping between the original features and the generated images is established, and post hoc interpretability methods are employed to identify crucial areas of these images, enhancing interpretability for predictive tasks. We extensively evaluate the proposed tabular-to-image generation approach on 12 low-dimensional and mixed-type datasets, covering binary and multi-class classification scenarios. In particular, a CNN trained on images generated with LM-IGTD outperformed all traditional ML models trained on tabular data in five out of twelve datasets. In the remaining datasets, the CNN trained on LM-IGTD images consistently surpassed three out of four traditional ML models and achieved results similar to the fourth.
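To make the idea of stochastic feature generation more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a low-dimensional table is padded with noisy copies of its numeric columns until there are enough features to fill an image grid, and that the origin of every generated column is recorded so that pixels can later be traced back to original features. The function name `augment_with_noisy_features`, the `noise_scale` parameter, and the choice of Gaussian noise scaled by each column's standard deviation are all assumptions made for illustration; the handling of categorical features and the IGTD pixel assignment are not covered.

```python
import numpy as np


def augment_with_noisy_features(X, n_target, noise_scale=0.1, seed=0):
    """Pad a numeric table with noisy copies of its columns (illustrative only).

    Returns the augmented array with n_target columns and an index array
    giving, for each column, the original feature it was derived from.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    if n_target < n_features:
        raise ValueError("n_target must be at least the number of features")

    n_extra = n_target - n_features
    # Cycle through the original columns so each is copied about equally often.
    source = np.arange(n_extra) % n_features
    # Assumption: noise spread is proportional to each source column's spread.
    sigma = noise_scale * X[:, source].std(axis=0)
    noise = rng.normal(0.0, 1.0, size=(n_samples, n_extra)) * sigma

    X_aug = np.hstack([X, X[:, source] + noise])
    origin = np.concatenate([np.arange(n_features), source])
    return X_aug, origin


# Usage: 5 original features padded to 16 so each sample can fill a 4x4 grid.
X = np.random.default_rng(1).normal(size=(50, 5))
X_aug, origin = augment_with_noisy_features(X, n_target=16)
first_image = X_aug[0].reshape(4, 4)  # naive layout, not the IGTD assignment
```

The plain `reshape` at the end only shows that the augmented row has enough values to form an image. In the approach described above, the placement of features in the grid is decided by the modified IGTD step, and the `origin` array plays the role of the feature-to-pixel mapping used for interpretation.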

Paper Structure (14 sections, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Workflow of proposed methodology.
  • Figure 2: Correlation matrices between original variables and noisy variables with HoNG and HeNG in Ionos, Hepatitis and Tae datasets.
  • Figure 3: Mean±std of AUCROC values obtained using tabular data and LM-IGTD images and several ML and DL models for 5 test subsets of binary datasets.
  • Figure 4: Mean±std of AUCROC values obtained using tabular data and LM-IGTD images and several ML and DL models for 5 test subsets of multiclass datasets.
  • Figure 5: Interpretability analysis of the Hepatitis dataset: original image, feature-mapped image, and image with post-hoc Grad-CAM method.
  • ...and 1 more figures