Imputation-free Learning of Tabular Data with Missing Values using Incremental Feature Partitions in Transformer

Manar D. Samad, Kazi Fuad B. Akhter, Shourav B. Rabbani, Ibna Kowsar

TL;DR

The paper tackles learning from tabular data with missing values without resorting to imputation. It introduces IFIAL, an imputation-free incremental attention learning framework that uses two attention masks to exclude missing values from attention scoring and incremental feature partitions to manage missing-rate heterogeneity. Across 17 diverse OpenML datasets and multiple missing-value types (MCAR, MNAR, natural), IFIAL with partition size $K=\frac{d}{2}$, where $d$ is the number of features, consistently outperforms 11 imputation-based and imputation-free baselines in AUC, while also reducing computational overhead by avoiding imputation. The approach preserves data integrity, is robust to high missing rates, and has practical implications for healthcare and other data-rich domains where missing data are prevalent. Limitations include potential edge cases, at very low missing rates and on very large datasets, where imputation-based methods may be more efficient.

Abstract

Tabular data sets with varying missing values are prepared for machine learning using an arbitrary imputation strategy. Synthetic values generated by imputation models often raise concerns regarding data quality and the reliability of data-driven outcomes. To address these concerns, this article proposes an imputation-free incremental attention learning (IFIAL) method for tabular data with missing values. A pair of attention masks is derived and retrofitted to a transformer to directly streamline tabular data without imputing or initializing missing values. The proposed method incrementally learns partitions of overlapping and fixed-size feature sets to enhance the performance of the transformer. The average classification performance rank order across 17 diverse tabular data sets highlights the superiority of IFIAL over 11 state-of-the-art learning methods with or without missing value imputations. Additional experiments corroborate the robustness of IFIAL to varying types and proportions of missing data, demonstrating its superiority over methods that rely on explicit imputations. A feature partition size equal to one-half the original feature space yields the best trade-off between computational efficiency and predictive performance. IFIAL is one of the first solutions that enables deep attention models to learn directly from tabular data, eliminating the need to impute missing values. The source code for this paper is publicly available.
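The idea of fixed-size, overlapping feature partitions can be illustrated with a minimal sketch. The abstract does not say how the partitions are constructed, so the sliding-window scheme, the function name `overlapping_partitions`, and the `stride` parameter below are assumptions made purely for illustration; only the partition size of one-half the feature count comes from the text.

```python
def overlapping_partitions(d, k, stride):
    """Return index lists of fixed-size (k) feature windows over d features.

    Assumption (not from the paper): consecutive windows are shifted by
    `stride` <= k, and a final window is anchored at the end of the feature
    list so that every feature appears in at least one partition.
    """
    if not (0 < k <= d):
        raise ValueError("k must satisfy 0 < k <= d")
    if not (0 < stride <= k):
        raise ValueError("stride must satisfy 0 < stride <= k")
    starts = list(range(0, d - k + 1, stride))
    if starts[-1] != d - k:
        starts.append(d - k)  # make sure the last features are covered
    return [list(range(s, s + k)) for s in starts]


d = 8        # example feature count
k = d // 2   # partition size of one-half the feature space, as in the abstract
parts = overlapping_partitions(d, k, stride=2)
for p in parts:
    print(p)
```

With a stride smaller than `k`, neighboring partitions share features; an incremental training loop would then visit the partitions one after another, which is the part this sketch leaves out.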

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Three strategies for handling missing values in tabular data. Top: Standalone imputation of missing values before classification. Middle: Joint learning of complete data representation and classification. Bottom: (Proposed) - no imputation or initialization of missing values needed to streamline only the observed values for classification.
  • Figure 2: Imputation-free Incremental Attention Learning (IFIAL) algorithm uses $P$ fixed-sized overlapping feature partitions to train a Feature-Tokenized Transformer (FTT) incrementally. Attention masks: $M_1$ operates as the attention column mask and $M_2$ as the attention row mask to exclude missing feature values from attention scoring.
  • Figure 3: Win matrix. Values are the fraction of experimental scenarios in which the row methods outperform the methods in the columns in terms of AUC scores.
  • Figure 4: Effect of increasing missing value rates. The average percentage of the reference AUC score is obtained across 13 data sets.
  • Figure 5: Effects of partition size (k) on average AUC scores obtained across varying missing value rates. The total numbers of features ($d$) for the three data sets are: Kc2 ($d = 21$), Diabetes ($d = 8$), and Dresses Sales ($d = 12$).
  • ...and 1 more figure
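The Figure 2 caption describes a column mask $M_1$ and a row mask $M_2$ that keep missing feature values out of attention scoring. A rough single-head NumPy sketch of one way such masks could act is given below. The exact mask definitions are not stated in this summary, so the interpretation here is an assumption: the column mask blocks missing features as keys before the softmax, and the row mask zeroes the attention rows of missing features as queries. The function name `masked_attention` and the toy data are hypothetical.

```python
import numpy as np


def masked_attention(Q, K, V, observed):
    """Single-head attention over d feature tokens that ignores missing ones.

    Q, K, V: float arrays of shape (d, h); observed: boolean array of shape (d,).
    Assumed roles (an interpretation, not the paper's definition):
      col_mask ~ M_1: missing features cannot be attended to (as keys),
      row_mask ~ M_2: missing features produce no output (as queries).
    """
    if not observed.any():
        raise ValueError("at least one feature must be observed")
    d, h = Q.shape
    scores = Q @ K.T / np.sqrt(h)                  # (d, d) raw scores
    col_mask = np.broadcast_to(observed[None, :], (d, d))
    row_mask = observed[:, None]
    scores = np.where(col_mask, scores, -np.inf)   # drop missing keys
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    weights = weights * row_mask                   # drop missing queries
    return weights @ V, weights


rng = np.random.default_rng(0)
d, h = 5, 4
X = rng.normal(size=(d, h))                        # toy token embeddings
observed = np.array([True, False, True, True, False])
out, attn = masked_attention(X, X, X, observed)

# Whatever finite values sit at the missing positions never reach the output.
X_other = X.copy()
X_other[~observed] = 123.0
out_other, _ = masked_attention(X_other, X_other, X_other, observed)
print(np.allclose(out, out_other))
```

Under this reading, the entries at missing positions only need to be finite placeholders: they receive zero attention weight and contribute nothing, which is the sense in which no imputed value influences the result.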