Constructing Variables Using Classifiers as an Aid to Regression: An Empirical Assessment

Colin Troisemaine, Vincent Lemaire

TL;DR

The proposed enrichment method works as a pre-processing step: the continuous values of the variable to be regressed are discretized into a set of intervals, whose bounds are then used as value thresholds.

Abstract

This paper proposes a method for automatically creating variables, in the regression setting, that complement the information contained in the initial input vector. The method works as a pre-processing step in which the continuous values of the variable to be regressed are discretized into a set of intervals, whose bounds are then used as value thresholds. Classifiers are then trained to predict whether the value to be regressed is less than or equal to each of these thresholds. The outputs of the classifiers are concatenated into an additional vector of variables that enriches the initial input vector of the regression problem. The implemented system can thus be regarded as a generic pre-processing tool. We tested the proposed enrichment method with 5 types of regressors and evaluated it on 33 regression datasets. Our experimental results confirm the value of the approach.
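
The enrichment step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the equal-frequency discretization, the k-nearest-neighbour classifier used as a stand-in, the choice to exclude a training point from its own neighbours, the toy data, and all function names are hypothetical.

```python
import random
import statistics


def equal_frequency_thresholds(y, n_thresholds):
    """Return n_thresholds cut points splitting y into equal-frequency intervals.

    Assumption: equal-frequency binning; other discretizations are possible.
    """
    return statistics.quantiles(y, n=n_thresholds + 1)


def knn_threshold_probs(X_train, y_train, x, thresholds, k=5, skip_index=None):
    """Estimate P(y <= t) for each threshold t from the k nearest training points.

    Assumption: a k-NN vote stands in for whatever classifier is actually used.
    """
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), i)
        for i, row in enumerate(X_train)
        if i != skip_index
    )
    neighbours = [y_train[i] for _, i in dists[:k]]
    return [sum(v <= t for v in neighbours) / len(neighbours) for t in thresholds]


def enrich(X_train, y_train, X, thresholds, k=5, is_train=False):
    """Append one classifier output per threshold to every row of X."""
    enriched = []
    for i, x in enumerate(X):
        # Assumption: a training point does not vote for itself.
        skip = i if is_train else None
        probs = knn_threshold_probs(X_train, y_train, x, thresholds, k, skip)
        enriched.append(list(x) + probs)
    return enriched


# Toy data (hypothetical): y is roughly 2 * x plus Gaussian noise.
random.seed(0)
X_train = [[random.uniform(0, 10)] for _ in range(200)]
y_train = [2.0 * x[0] + random.gauss(0, 1) for x in X_train]

thresholds = equal_frequency_thresholds(y_train, n_thresholds=4)
X_train_plus = enrich(X_train, y_train, X_train, thresholds, is_train=True)
X_new_plus = enrich(X_train, y_train, [[1.0], [9.0]], thresholds)
```

Each enriched row holds the original inputs followed by one estimated probability per threshold. Any regressor can then be trained on `X_train_plus` in place of `X_train`, which is what makes the step usable as generic pre-processing.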

Paper Structure (11 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Example of thresholds.
  • Figure 2: Data set extension.
  • Figure 3: Diagram of the general principle of the method.
  • Figure 4: $S$ (horizontal axis) versus RMSE (vertical axis). The dotted lines represent the initial performance of the regressor in blue for the test set and in orange for the training set. Solid lines represent the regressor's performance with the proposed method, using the same color code.
  • Figure 5: Critical diagram: on the left, the regressors without the proposed method, in the center, the regressors with and without the proposed method, and on the right, the regressors only with the proposed method (indicated by a '+').