
M$^3$-Impute: Mask-guided Representation Learning for Missing Value Imputation

Zhongyi Yu, Zhenghao Wu, Shuhan Zhong, Weifeng Su, S. -H. Gary Chan, Chul-Ho Lee, Weipeng Zhuo

TL;DR

This work proposes M$^3$-Impute, which explicitly leverages missingness information together with feature and sample correlations through novel masking schemes. On 25 benchmark datasets under three missingness settings, it achieves 20 best and 4 second-best MAE scores on average.

Abstract

Missing values are a common problem that poses significant challenges to data analysis and machine learning. This problem necessitates the development of an effective imputation method to fill in the missing values accurately, thereby enhancing the overall quality and utility of the datasets. Existing imputation methods, however, fall short of explicitly considering the 'missingness' information in the data during the embedding initialization stage and of modeling the entangled feature and sample correlations during the learning process, thus leading to inferior performance. We propose M$^3$-Impute, which aims to explicitly leverage the missingness information and such correlations with novel masking schemes. M$^3$-Impute first models the data as a bipartite graph and uses a graph neural network to learn node embeddings, where the refined embedding initialization process directly incorporates the missingness information. These embeddings are then optimized through M$^3$-Impute's novel feature correlation unit (FCU) and sample correlation unit (SCU), which effectively capture feature and sample correlations for imputation. Experimental results on 25 benchmark datasets under three different missingness settings show the effectiveness of M$^3$-Impute, which achieves 20 best and 4 second-best MAE scores on average.
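The bipartite-graph view of a table mentioned in the abstract can be illustrated with a minimal sketch. It assumes that samples and features each become one node type and that every observed cell becomes an edge carrying the cell value; the function name `table_to_bipartite` and the index layout are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def table_to_bipartite(X):
    """Turn a table with NaNs into bipartite edges plus a missingness mask.

    Assumption (not from the paper): sample nodes are numbered 0..n-1 and
    feature nodes n..n+d-1, so both node types share one index space.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)              # True where a value is present
    rows, cols = np.nonzero(observed)    # one edge per observed cell
    n_samples = X.shape[0]
    edge_index = np.stack([rows, cols + n_samples])
    edge_attr = X[rows, cols]            # observed value carried by each edge
    return edge_index, edge_attr, observed

# Toy table: 2 samples, 3 features, 2 missing cells.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])
edge_index, edge_attr, observed = table_to_bipartite(X)
```

The `observed` mask is the kind of missingness information that a model could feed into embedding initialization; how M$^3$-Impute actually does so is not specified in this summary.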

Paper Structure

This paper contains 25 sections, 12 equations, 5 figures, 21 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the M$^3$-Impute model. The tabular data with missing values is first modeled as a bipartite graph with our refined initialization unit, which incorporates the missingness information in node embedding initialization. The graph is then processed with a GNN to update node embeddings. After that, we apply our novel soft masking schemes on these node embeddings to further encode correlation and missingness information in the learning process, using our novel components of feature correlation unit (FCU) and sample correlation unit (SCU). Eventually, the missing value is predicted with an MLP on the weighted sum of the outputs from FCU and SCU.
  • Figure 2: Sample correlation unit (SCU).
  • Figure 3: Model performance vs. missing ratios. MAE scores are offset by HyperImpute.
  • Figure 4: Pearson correlation coefficients of UCI datasets.
  • Figure 5: Pearson correlation coefficients of 17 additional datasets.
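The MAE metric referred to in the TL;DR and in the Figure 3 caption can be sketched as follows. This is an illustrative sketch only: it assumes that MAE is computed over the held-out (artificially removed) entries and that "offset by HyperImpute" means subtracting the baseline's MAE from the model's MAE. Both the helper `masked_mae` and the toy numbers are hypothetical and do not come from the paper.

```python
import numpy as np

def masked_mae(x_true, x_imputed, eval_mask):
    """Mean absolute error restricted to the entries selected by eval_mask."""
    eval_mask = np.asarray(eval_mask, dtype=bool)
    diff = np.abs(np.asarray(x_true, dtype=float) - np.asarray(x_imputed, dtype=float))
    return float(diff[eval_mask].mean())

# Toy ground truth and two sets of imputations (made-up values).
x_true = np.array([[1.0, 2.0], [3.0, 4.0]])
eval_mask = np.array([[False, True], [True, False]])   # entries that were hidden
model_out = np.array([[1.0, 2.5], [3.5, 4.0]])
baseline_out = np.array([[1.0, 3.0], [4.0, 4.0]])

mae_model = masked_mae(x_true, model_out, eval_mask)
mae_baseline = masked_mae(x_true, baseline_out, eval_mask)

# Assumed sign convention: a negative offset means lower error than the baseline.
mae_offset = mae_model - mae_baseline
```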