DTization: A New Method for Supervised Feature Scaling

Niful Islam

TL;DR

The paper tackles the problem that traditional feature scaling methods are unsupervised and fail to leverage the dependent variable. It proposes DTization, which uses a decision tree to rank feature importance and applies a multiplicative factor to the output of a robust scaler for each feature, with overall time and space complexity $O(n d)$. The authors evaluate DTization on ten datasets spanning classification and regression, comparing against min-max, standard, log, and robust scalers, and report notable gains on several tasks. The work demonstrates a practical, universal supervised feature-scaling approach and provides open-source code to enable replication and use in real-world pipelines.

Abstract

Artificial intelligence is currently a dominant force in shaping various aspects of the world, and machine learning is one of its sub-fields. Feature scaling is a data pre-processing technique that improves the performance of machine learning algorithms. Traditional feature scaling techniques are unsupervised: the dependent variable has no influence on the scaling process. In this paper, we present a novel feature scaling technique named DTization that employs a decision tree and the robust scaler for supervised feature scaling. The proposed method uses a decision tree to measure feature importance and, based on that importance, scales different features differently with the robust scaler algorithm. The proposed method has been extensively evaluated on ten classification and regression datasets using various evaluation metrics, and the results show a noteworthy performance improvement over traditional feature scaling methods.
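
To make the idea concrete, the following is a minimal sketch of importance-weighted robust scaling, not the authors' implementation. It assumes the per-feature importances have already been obtained from a fitted decision tree and are passed in as a plain list. It also assumes, purely for illustration, that the multiplicative factor is the importance value itself; this summary does not give the paper's exact factor. The function names `robust_scale` and `importance_weighted_scale` are hypothetical.

```python
from statistics import median, quantiles


def robust_scale(column):
    """Robust scaling of one feature: (x - median) / IQR."""
    q1, _, q3 = quantiles(column, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # constant column: avoid division by zero
    med = median(column)
    return [(x - med) / iqr for x in column]


def importance_weighted_scale(rows, importances):
    """Robust-scale every feature, then multiply it by a per-feature factor.

    ASSUMPTION: `importances` comes from a decision tree fitted on (X, y),
    and the factor used here is the raw importance. The paper's actual
    factor may be defined differently.
    """
    columns = list(zip(*rows))  # row-major -> column-major
    if len(columns) != len(importances):
        raise ValueError("need exactly one importance per feature")
    scaled_columns = [
        [weight * value for value in robust_scale(list(column))]
        for column, weight in zip(columns, importances)
    ]
    return [list(row) for row in zip(*scaled_columns)]  # back to row-major


# Toy data: two features; the first is assumed to be the more important one.
rows = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
importances = [0.75, 0.25]  # hypothetical decision-tree importances
scaled = importance_weighted_scale(rows, importances)
```

In this toy example both features have the same shape after plain robust scaling, so the only difference in the output comes from the importance factor: the more important feature keeps a wider range. Each column is processed in a single pass over its values apart from the quartile computation, which sorts the column.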

Paper Structure (12 sections, 15 equations, 1 figure, 5 tables, 2 algorithms)