Vortex Feature Positioning: Bridging Tabular IIoT Data and Image-Based Deep Learning
Jong-Ik Park, Sihoon Seong, JunKyu Lee, Cheol-Ho Hong
TL;DR
Vortex Feature Positioning (VFP) is a correlation-driven method that converts high-dimensional IIoT tabular data into images tailored for CNNs, addressing the overfitting and computational inefficiency of fixed-size image representations. VFP embeds features with the convolution operation in mind and arranges them in a center-out vortex according to their Pearson correlation coefficient (PCC), so the image size scales with the attribute count and generalization improves. A theoretical analysis links this structure to favorable optimization properties, including a convergence rate for SGD on VFP-generated data. On seven datasets, VFP outperforms traditional tree-based methods and existing image-conversion approaches, supporting its use for scalable IIoT analytics.
Abstract
Tabular data from Industrial Internet of Things (IIoT) devices are typically analyzed using decision tree-based machine learning techniques, which struggle with high-dimensional, numeric data. To overcome these limitations, techniques that convert tabular data into images have been developed, leveraging the strengths of image-based deep learning approaches such as Convolutional Neural Networks (CNNs). These methods cluster similar features into distinct areas of an image whose size is fixed regardless of the number of features, so that the result resembles an actual photograph. However, clustering similar features increases the risk of overfitting; in careful feature selection on tabular data, such similar features are often discarded precisely to prevent it. In addition, a fixed image size wastes pixels when there are few features, which makes computation inefficient. We introduce Vortex Feature Positioning (VFP) to address these issues. VFP arranges features according to their correlation, spacing similar ones along a vortex pattern that starts at the image center, and sets the image size from the attribute count. In tests across seven datasets with varying real-valued attributes, VFP outperforms traditional machine learning methods and existing conversion techniques.
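To make the center-out arrangement concrete, the following is a minimal Python sketch of a correlation-ordered spiral layout. It is an illustration under stated assumptions, not the authors' implementation: the ranking rule (mean absolute PCC with all other features, highest at the center), the spiral direction, the zero padding of unused pixels, and the names `spiral_coords`, `vortex_layout`, and `to_images` are all hypothetical choices made for this example. The abstract does not specify VFP's exact ordering or spacing rule, and the convolution-aware embedding is not modeled here.

```python
import numpy as np


def spiral_coords(n):
    """Return n (row, col) offsets that spiral outward from (0, 0)."""
    coords = [(0, 0)]
    r, c = 0, 0
    dr, dc = 0, 1  # first leg moves right
    step = 1
    while len(coords) < n:
        for _ in range(2):  # two legs share each step length
            for _ in range(step):
                r, c = r + dr, c + dc
                coords.append((r, c))
                if len(coords) == n:
                    return coords
            dr, dc = dc, -dr  # turn: right -> down -> left -> up
        step += 1
    return coords


def vortex_layout(X):
    """Rank features by correlation and assign each one a pixel.

    ASSUMPTION: features are ranked by mean absolute PCC with all other
    features; this stands in for the paper's ordering rule, which the
    abstract does not spell out.
    """
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    corr = np.atleast_2d(np.corrcoef(X, rowvar=False))
    corr = np.nan_to_num(corr)  # constant columns produce NaN entries
    np.fill_diagonal(corr, 0.0)  # ignore self-correlation
    score = np.abs(corr).sum(axis=0) / max(n_features - 1, 1)
    order = np.argsort(-score, kind="stable")  # highest score first
    coords = np.array(spiral_coords(n_features))
    coords -= coords.min(axis=0)  # shift offsets to non-negative indices
    height, width = coords.max(axis=0) + 1  # size follows attribute count
    return order, coords, (int(height), int(width))


def to_images(X, order, coords, shape):
    """Write each sample's features into its own single-channel image."""
    X = np.asarray(X, dtype=float)
    images = np.zeros((X.shape[0], *shape))  # unused pixels stay zero
    images[:, coords[:, 0], coords[:, 1]] = X[:, order]
    return images


# Usage: 50 synthetic samples with 10 real-valued attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
order, coords, shape = vortex_layout(X)
images = to_images(X, order, coords, shape)
```

Because the canvas is derived from the spiral's bounding box, its size grows with the number of attributes rather than being fixed in advance, which is the property the abstract attributes to VFP. In practice the layout would be computed once on the training split and reused unchanged for validation and test data.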
