Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

Oliver Urs Lenz; Daniel Peralta; Chris Cornelis

TL;DR

Polar encoding is a simple, modular baseline for handling missing values in classification: it represents missingness directly, without imputation. It treats categorical and $[0,1]$-valued attributes as special cases of a single attribute type described by barycentric coordinates, encoding each $[0,1]$-valued attribute as two features in a way that preserves missingness information and works with any classification algorithm. In an experiment on twenty real-life datasets with missing values, polar encoding performs better than the state-of-the-art imputation strategies MICE and MIDAS and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators. It remains interpretable as a fuzzified form of one-hot encoding and, for decision trees, as a practical realisation of "missingness incorporated in attributes" (MIA). The approach is easy to implement, lets the classifier learn from the data how to treat missing values, and could potentially be extended to differently scaled numerical attributes.

Abstract

We propose polar encoding, a representation of categorical and numerical $[0,1]$-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and $[0,1]$-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and -- depending on the classifier -- about as well as or better than mean/mode imputation with missing-indicators.
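
The abstract's claims about equidistance and tree splits can be illustrated with a minimal Python sketch. It rests on assumptions that are not spelled out on this page: a $[0,1]$-valued value $x$ is taken to be encoded as the pair $(1 - x, x)$, a missing value as $(0, 0)$, and a categorical value as its one-hot vector with a missing value as the all-zero vector. The function names, the order of the two features, and the use of `None` to mark missing values are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def polar_encode_unit(x):
    """Encode a [0, 1]-valued attribute as two features (assumed form)."""
    if x is None:
        # Assumption: a missing value maps to the origin.
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("expected a value in [0, 1]")
    return (1.0 - x, x)


def polar_encode_categorical(value, categories):
    """One-hot encode a categorical value; missing (None) gives all zeros."""
    if value is not None and value not in categories:
        raise ValueError("unknown category")
    return tuple(1.0 if value == c else 0.0 for c in categories)


def boscovich(u, v):
    """Boscovich (Manhattan, 1-norm) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Under this encoding, a missing value lies at Boscovich distance
# |1 - x| + |x| = 1 from every non-missing value x in [0, 1].
missing = polar_encode_unit(None)
distances = [boscovich(missing, polar_encode_unit(x)) for x in (0.0, 0.25, 0.5, 1.0)]


# Two threshold rules that partition the non-missing values identically
# but send the missing value to opposite branches (True = left branch).
def left_by_high(encoded, t):
    return encoded[1] <= t


def left_by_low(encoded, t):
    return encoded[0] >= 1.0 - t


t = 0.5
for x in (0.2, 0.9):
    enc = polar_encode_unit(x)
    assert left_by_high(enc, t) == left_by_low(enc, t)

missing_goes_left = (left_by_high(missing, t), left_by_low(missing, t))
```

In this sketch, `distances` contains only values equal to 1, and `missing_goes_left` shows the missing value landing on the left branch under one rule and on the right branch under the other. A tree learner that may split on either feature can therefore pick whichever placement of missing values fits the training data better, which is the behaviour the abstract attributes to MIA.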

Paper Structure (16 sections, 4 equations, 7 figures, 4 tables)

Introduction
Polar encoding as a good baseline approach
Polar encoding and distance-based classifiers
Boscovich distance
Euclidean distance
Polar encoding and decision tree classifiers
Polar encoding as representation of barycentric attributes
Numerical and categorical attributes
Barycentric attributes
Barycentric attributes as fuzzified categorical attributes
$[0, 1]$-valued attributes as barycentric attributes
Representing missing values
Experimental evaluation
Setup
Results
...and 1 more section

Figures (7)

Figure 1: Illustrative example of a $[0, 1]$-valued attribute for height with missing value, with missing-indicator and polar encoding.
Figure 2: Minkowski $p$-norm unit circles for various values of $p$.
Figure 3: Illustrative example of equivalent splits on a polar-encoded attribute, with missing values on either side.
Figure 4: Example illustrating the correspondence between crisp partitions and categorical attributes of a dataset. Rows correspond to the records, columns to the partition classes and categories. The values 1 and 0 indicate membership and non-membership, respectively.
Figure 5: Example of a ternary plot: distribution of GDP over economic sectors of countries and territories [cia22gdp].
...and 2 more figures

Theorems & Definitions (6)

Definition 1
Remark 1
Definition 2
Remark 2
Definition 3
Remark 3
