LM-IGTD: a 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks
Vanesa Gómez-Martínez, Francisco J. Lara-Abelenda, Pablo Peiro-Corbacho, David Chushig-Muzo, Conceicao Granja, Cristina Soguero-Ruiz
TL;DR
This work tackles the challenge of leveraging CNNs for tabular data that are often low-dimensional and mixed-type by introducing LM-IGTD, a tabular-to-image pipeline that augments data with stochastic noise and adapts IGTD to preserve feature relationships in 2D images. It provides an end-to-end mapping from original features to image regions and employs Grad-CAM for post hoc interpretability. The approach is evaluated on 12 real-world, low-dimensional, mixed-type datasets, showing improvements over traditional tabular ML baselines in several binary and multi-class tasks and offering interpretable visualizations of which features drive predictions. Overall, LM-IGTD demonstrates that noise-augmented tabular-to-image representations can enable CNNs to outperform or match traditional models while providing actionable interpretability for mixed-type data.
Abstract
Tabular data are used extensively across many knowledge domains. Convolutional neural networks (CNNs) have been successfully applied in many settings where important information is embedded in the spatial arrangement of features, as in images, where they outperform traditional models. Recently, several researchers have proposed transforming tabular data into images to leverage the potential of CNNs and achieve strong performance in predictive tasks such as classification and regression. In this paper, we present a novel and effective approach for transforming tabular data into images that addresses the inherent limitations associated with low-dimensional and mixed-type datasets. Our method, named Low Mixed-Image Generator for Tabular Data (LM-IGTD), integrates a stochastic feature generation process and a modified version of the Image Generator for Tabular Data (IGTD) algorithm. We introduce an automatic and interpretable end-to-end pipeline that creates images from tabular data. A mapping between the original features and the generated images is established, and post hoc interpretability methods are employed to identify the most relevant areas of these images, enhancing interpretability for predictive tasks. We extensively evaluate the proposed tabular-to-image generation approach on 12 low-dimensional and mixed-type datasets, covering binary and multi-class classification scenarios. In particular, CNNs trained on images generated with LM-IGTD outperformed all traditional ML models trained on tabular data in five out of twelve datasets. In the remaining datasets, CNNs trained on LM-IGTD images consistently surpassed three out of four traditional ML models and achieved results similar to the fourth.
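To make the idea of noise-based feature generation and feature-to-pixel mapping concrete, the following is a minimal sketch and not the LM-IGTD implementation. It assumes purely numeric features, Gaussian noise scaled by each feature's standard deviation, and a simple row-major placement of features in the image in place of the optimized assignment that IGTD performs. The function names (`augment_with_noise`, `to_images`) and the parameters `n_target` and `noise_scale` are hypothetical and chosen only for illustration.

```python
import numpy as np


def augment_with_noise(X, n_target, noise_scale=0.05, seed=0):
    """Pad X (n_samples, n_features) to n_target columns with noisy copies.

    Assumption: new features are Gaussian-perturbed copies of original
    columns; the noise model used by LM-IGTD may differ.
    Returns the augmented matrix and, for every column, the index of the
    original feature it was derived from.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    if n_target < n_features:
        raise ValueError("n_target must be >= number of original features")
    n_extra = n_target - n_features
    # Cycle through the original columns so each one is reused evenly.
    sources = np.arange(n_extra) % n_features
    noise = rng.normal(0.0, noise_scale, size=(n_samples, n_extra))
    extra = X[:, sources] + noise * X.std(axis=0)[sources]
    X_aug = np.hstack([X, extra])
    origin = np.concatenate([np.arange(n_features), sources])
    return X_aug, origin


def to_images(X_aug, height, width):
    """Min-max scale each column to [0, 1] and reshape rows into images.

    Assumption: row-major placement; IGTD would instead optimize which
    feature is assigned to which pixel.
    """
    lo = X_aug.min(axis=0)
    hi = X_aug.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    scaled = (X_aug - lo) / span
    return scaled.reshape(-1, height, width)


# Toy data: 100 samples with 5 features, expanded to 4 x 4 images.
X = np.random.default_rng(1).normal(size=(100, 5))
X_aug, origin = augment_with_noise(X, n_target=16)
images = to_images(X_aug, height=4, width=4)
origin_map = origin.reshape(4, 4)  # original feature index behind each pixel
```

Here `origin_map` plays the role of the mapping between original features and image regions: a saliency map over the 4 x 4 image (for example, one produced by Grad-CAM) could be aggregated per original feature by grouping pixels that share the same index.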
