A Visualization for Comparative Analysis of Regression Models

Nassime Mountasir; Baptiste Lafabregue; Bruno Albert; Nicolas Lachiche

Abstract

As regression is a widely studied problem, many methods have been proposed to solve it, each often requiring its own hyper-parameters to be set. Selecting the proper method for a given application can therefore be difficult, and it relies on comparing the methods' performance. Performance is usually measured using metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), or R-squared ($R^2$). These metrics provide a numerical summary of predictive accuracy by quantifying the difference between predicted and actual values. However, while these metrics are widely used in the literature to summarize model performance and are useful for distinguishing models that perform poorly from those that perform well, they often aggregate too much information. This article addresses these limitations by introducing a novel visualization approach that highlights key aspects of regression model performance. The proposed method builds upon three main contributions: (1) representing the residuals in a 2D space, which allows the errors of two models to be evaluated simultaneously, (2) leveraging the Mahalanobis distance to account for correlations and differences in scale within the data, and (3) employing a colormap to visualize the percentile-based distribution of errors, making it easier to identify dense regions and outliers. By graphically representing the distribution of errors and their correlations, this approach provides a more detailed and comprehensive view of model performance, enabling users to uncover patterns that traditional aggregate metrics may obscure. The proposed visualization method facilitates a deeper understanding of differences in regression model performance and of error distributions, enhancing the evaluation and comparison process.
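The contrast drawn in the abstract between aggregate metrics and a 2D error space can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (`aggregate_metrics`, `error_space`, `mahalanobis_to_median`, `percentile_ranks`) are hypothetical, centering the distance on the coordinate-wise median and using the ordinary sample covariance are assumptions, and all data are synthetic. The first part builds two models with identical MAE, RMSE and $R^2$ whose errors nonetheless behave differently (one always under-estimates, the other alternates between over- and under-estimation). The second part maps each individual's pair of residuals to a Mahalanobis distance and then to a percentile rank, one plausible way to obtain the values a percentile-based colormap would display.

```python
import numpy as np


def aggregate_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for one model's predictions."""
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return mae, rmse, r2


def error_space(y_true, pred_a, pred_b):
    """One 2D point per individual: (residual of model A, residual of model B)."""
    return np.column_stack((y_true - pred_a, y_true - pred_b))


def mahalanobis_to_median(points):
    """Mahalanobis distance of each point to the coordinate-wise median.

    Assumption (hypothetical choice): the covariance is the ordinary sample
    covariance of the points; a pseudo-inverse is used so that perfectly
    collinear or constant residuals do not raise an error.
    """
    center = np.median(points, axis=0)
    inv_cov = np.linalg.pinv(np.cov(points, rowvar=False))
    diff = points - center
    # Quadratic form diff_i^T * inv_cov * diff_i, computed for every row i at once.
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def percentile_ranks(distances):
    """Percentile rank in [0, 100] of each distance (ties broken by position)."""
    ranks = np.argsort(np.argsort(distances, kind="stable"))
    return 100.0 * ranks / (len(distances) - 1)


# Two models with identical aggregate metrics but different error behaviour:
# model A always under-estimates by 1, model B alternates over- and under-estimation.
y = np.arange(10.0)
pred_a = y - 1.0
pred_b = y + np.tile([1.0, -1.0], 5)
print(aggregate_metrics(y, pred_a))
print(aggregate_metrics(y, pred_b))  # same MAE, RMSE and R^2 as model A

# Error space on synthetic data whose residuals are correlated across models.
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
shared = rng.normal(scale=0.5, size=200)  # error component common to both models
pred_1 = y_true + shared + rng.normal(scale=0.1, size=200)
pred_2 = y_true + 2.0 * shared + rng.normal(scale=0.3, size=200)
points = error_space(y_true, pred_1, pred_2)
colors = percentile_ranks(mahalanobis_to_median(points))
print("most atypical individual:", int(np.argmax(colors)))
```

Passing `points[:, 0]`, `points[:, 1]` and `colors` to any scatter-plot routine with a sequential colormap gives a picture in the spirit of the one described. The Mahalanobis distance, unlike a Euclidean one, discounts the direction along which the two models' residuals are strongly correlated, so individuals that break the shared error pattern stand out.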

Paper Structure

This paper contains 18 sections, 7 figures, and 1 table.

Introduction
Limits of usual metrics and visualizations
Metrics
Moderate vs. Extreme errors
Under and over estimations
Similar errors on different individuals
Visualizations
State of the art
Graphical comparison of the errors of regression models
1D Comparison
2D Error Space
Density and proximity with median
Distances
Case Study
Experimental Setup
...and 3 more sections

Figures (7)

Figure 1: Overview of the three main limitations of aggregate error metrics
Figure 2: Comparison of the predictions and the real values for Dataset A
Figure 3: Comparison of error distributions (left) and predicted vs. real values (right).
Figure 4: Comparison of the errors of Model A1 and Model A10
Figure 5: Representation of the proximity of points with different methods
...and 2 more figures