Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

TL;DR

Polar encoding introduces a simple, modular baseline for handling missing values in classification by representing missingness without imputation. It unifies categorical and [0,1]-valued attributes as barycentric coordinates, yielding a representation (two features per [0,1]-valued attribute) that preserves missingness information and is compatible with any classifier. Empirical results across 20 real datasets show that polar encoding often outperforms the state-of-the-art imputation methods MICE and MIDAS and is competitive with mean/mode imputation with missing-indicators, while remaining interpretable through its connection to one-hot encoding and to MIA for decision trees. The approach is practical and easy to implement, and it lets the classifier learn from the data how to treat missing values, with potential extensions to differently scaled numerical attributes.
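To make the encoding concrete, here is a minimal Python sketch. It assumes, in line with the barycentric reading above, that a non-missing [0,1] value x becomes the pair (x, 1 - x), a non-missing category becomes its one-hot vector, and a missing value becomes the all-zero vector. The function names `polar_encode_numeric` and `polar_encode_categorical` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Optional, Sequence


def polar_encode_numeric(x: Optional[float]) -> tuple[float, float]:
    """Encode a [0, 1]-valued value as (x, 1 - x); missing (None or NaN) -> (0, 0).

    Assumption: the attribute has already been scaled to [0, 1].
    """
    if x is None or math.isnan(x):
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError(f"value {x} is outside [0, 1]")
    return (x, 1.0 - x)


def polar_encode_categorical(value: Optional[str], categories: Sequence[str]) -> tuple[float, ...]:
    """One-hot encode a category; a missing value (None) becomes the all-zero vector."""
    if value is None:
        return tuple(0.0 for _ in categories)
    if value not in categories:
        raise ValueError(f"unknown category {value!r}")
    return tuple(1.0 if c == value else 0.0 for c in categories)


# Non-missing encodings are barycentric coordinates: their components sum to 1.
print(polar_encode_numeric(0.25))                        # (0.25, 0.75)
print(polar_encode_numeric(None))                        # (0.0, 0.0)
print(polar_encode_categorical("b", ["a", "b", "c"]))    # (0.0, 1.0, 0.0)
print(polar_encode_categorical(None, ["a", "b", "c"]))   # (0.0, 0.0, 0.0)
```

Under this reading, a [0,1]-valued attribute behaves like a two-category attribute with fractional membership, which is the sense in which polar encoding is a fuzzified one-hot encoding.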

Abstract

We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators.
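The equidistance claim can be checked numerically. This sketch assumes that polar encoding maps x to (x, 1 - x) and a missing value to (0, 0), and it models mean imputation with a missing-indicator as the pair (imputed value, indicator). Both models are illustrative assumptions. Under the Manhattan (p = 1) distance, the missing point is at distance x + (1 - x) = 1 from every non-missing point. Under the Euclidean (p = 2) distance this no longer holds, so the choice of p matters.

```python
def minkowski(u, v, p):
    """Minkowski p-distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def polar(x):
    """Polar encoding of a [0, 1]-valued x (assumed); None means missing -> (0, 0)."""
    return (0.0, 0.0) if x is None else (x, 1.0 - x)


def indicator(x, mean):
    """Mean imputation plus missing-indicator: (value or mean, 1.0 if missing else 0.0)."""
    return (mean, 1.0) if x is None else (x, 0.0)


values = [0.0, 0.25, 0.5, 0.75, 1.0]
mean = sum(values) / len(values)  # 0.5

for x in values:
    d_polar_l1 = minkowski(polar(None), polar(x), p=1)
    d_polar_l2 = minkowski(polar(None), polar(x), p=2)
    d_ind_l1 = minkowski(indicator(None, mean), indicator(x, mean), p=1)
    print(f"x={x:.2f}  polar L1={d_polar_l1:.3f}  polar L2={d_polar_l2:.3f}  "
          f"indicator L1={d_ind_l1:.3f}")
```

In this run, the polar L1 distance is 1.000 for every x. The indicator-based L1 distance, 1 + |x - mean|, ranges from 1.0 to 1.5, so the imputed missing value sits closer to some non-missing values than to others.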

Paper Structure (16 sections, 4 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Illustrative example of a $[0, 1]$-valued attribute for height with a missing value, encoded with a missing-indicator and with polar encoding.
  • Figure 2: Minkowski $p$-norm unit circles for various values of $p$.
  • Figure 3: Illustrative example of equivalent splits on a polar-encoded attribute, with missing values on either side.
  • Figure 4: Example illustrating the correspondence between crisp partitions and categorical attributes of a dataset. Rows correspond to the records, columns to the partition classes and categories. The values 1 and 0 indicate membership and non-membership, respectively.
  • Figure 5: Example of a ternary plot: distribution of GDP over economic sectors of countries and territories (cia22gdp).
  • ...and 2 more figures
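The idea behind Figure 3 can be sketched without a tree library: a threshold split on either polar feature can partition the non-missing values identically while placing the missing values on either side. The sketch assumes the (x, 1 - x) / (0, 0) encoding and the `feature <= threshold` goes-left rule that scikit-learn's decision trees use. The data and threshold are made up for illustration.

```python
data = [0.1, 0.3, None, 0.6, 0.9]  # toy [0, 1]-valued attribute; None = missing
t = 0.45                           # illustrative threshold


def polar(x):
    """Assumed polar encoding: (x, 1 - x), or (0, 0) when x is missing."""
    return (0.0, 0.0) if x is None else (x, 1.0 - x)


def groups(goes_left):
    """Split the records into (left, right) according to a boolean mask."""
    left = [x for x, g in zip(data, goes_left) if g]
    right = [x for x, g in zip(data, goes_left) if not g]
    return left, right


# Split on the first feature: x <= t goes left; missing (0 <= t) joins the low values.
split_plus = [polar(x)[0] <= t for x in data]
# Split on the second feature: 1 - x <= 1 - t, i.e. x >= t, goes left;
# missing (0 <= 1 - t) now joins the high values.
split_minus = [polar(x)[1] <= 1.0 - t for x in data]

print(groups(split_plus))   # ([0.1, 0.3, None], [0.6, 0.9])
print(groups(split_minus))  # ([None, 0.6, 0.9], [0.1, 0.3])
```

Both splits separate {0.1, 0.3} from {0.6, 0.9}. The tree learner therefore effectively chooses the side for missing values by choosing which feature to split on, which is the MIA behaviour described in the abstract.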

Theorems & Definitions (6)

  • Definition 1
  • Remark 1
  • Definition 2
  • Remark 2
  • Definition 3
  • Remark 3