ProbSAINT: Probabilistic Tabular Regression for Used Car Pricing

Kiran Madhusudhanan; Gunnar Behrens; Maximilian Stubbemann; Lars Schmidt-Thieme

TL;DR

ProbSAINT extends SAINT to probabilistic tabular regression for used-car pricing: it is trained with a distributional loss and outputs a predictive distribution $\hat{p}(y|x)$ characterized by $(\mu,\sigma)$. It combines per-feature encoding, self-attention, and inter-sample attention to leverage large-scale tabular data, achieving state-of-the-art uncertainty quantification on a real-world 2M-record dataset while maintaining competitive point predictions. The approach is validated against baselines including MC-Dropout, NGBoost, CatBoostUn, and ProbMLP, showing lower negative log-likelihood (NLL) and calibrated uncertainty across confidence levels, as well as the ability to perform probabilistic dynamic forecasting over varying offer durations. The work also discusses deployment considerations and outlines future directions such as pre-training to exploit additional data, reinforcing ProbSAINT’s potential for trustworthy, automated pricing in industrial settings.
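To make the idea of a distributional loss over $(\mu,\sigma)$ concrete, the following is a minimal sketch of a mean negative log-likelihood, assuming the predictive distribution is Gaussian. The Gaussian form and the function name `gaussian_nll` are assumptions made for illustration; the summary above states only that the output is characterized by $(\mu,\sigma)$.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of targets y under N(mu_i, sigma_i^2).

    Assumption: a Gaussian predictive distribution. The source only says the
    output is characterized by (mu, sigma).
    """
    if not (len(y) == len(mu) == len(sigma)) or len(y) == 0:
        raise ValueError("y, mu and sigma must be non-empty and equally long")
    total = 0.0
    for y_i, mu_i, sigma_i in zip(y, mu, sigma):
        if sigma_i <= 0:
            raise ValueError("sigma must be strictly positive")
        # log-normalizer plus squared error scaled by the predicted variance
        total += 0.5 * math.log(2.0 * math.pi * sigma_i ** 2)
        total += (y_i - mu_i) ** 2 / (2.0 * sigma_i ** 2)
    return total / len(y)


# An overconfident wrong prediction is penalized far more than a cautious one.
overconfident = gaussian_nll([1.0], [0.0], [0.1])
cautious = gaussian_nll([1.0], [0.0], [1.0])
```

Under this loss a model cannot lower its score by shrinking $\sigma$ everywhere: a small $\sigma$ pays off only when the error is also small, which is what makes the predicted spread usable as an uncertainty signal.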

Abstract

Used car pricing is a critical aspect of the automotive industry, influenced by many economic factors and market dynamics. With the recent surge in online marketplaces and increased demand for used cars, accurate pricing would benefit both buyers and sellers by ensuring fair transactions. However, the transition towards automated pricing algorithms using machine learning requires an understanding of model uncertainty, specifically the ability to flag predictions that the model is unsure about. Although recent literature proposes boosting algorithms or nearest-neighbor-based approaches for swift and precise price predictions, capturing model uncertainty with such algorithms is a complex challenge. We introduce ProbSAINT, a model that offers a principled approach to uncertainty quantification for its price predictions, along with accurate point predictions that are comparable to state-of-the-art boosting techniques. Furthermore, acknowledging that the business prefers pricing used cars based on the number of days the vehicle was listed for sale, we show how ProbSAINT can be used as a dynamic forecasting model that predicts price probabilities for different expected offer durations. Our experiments further indicate that ProbSAINT is especially accurate on instances where it is highly certain. This demonstrates the applicability of its probabilistic predictions in real-world scenarios where trustworthiness is crucial.
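The claim that accuracy is higher on high-certainty instances can be checked with a simple filtering experiment, sketched below. This is an illustrative sketch, not the paper's evaluation code: it assumes the predicted $\sigma$ is used directly to rank instances by certainty, and the names `mape` and `mape_at_confidence` as well as the toy numbers are hypothetical.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    errors = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(errors) / len(errors)


def mape_at_confidence(y_true, mu, sigma, keep_fraction):
    """MAPE on the keep_fraction of instances with the smallest predicted sigma.

    Assumption: a smaller sigma means a more certain prediction; how the paper
    defines its confidence score is not stated in this summary.
    """
    order = sorted(range(len(sigma)), key=lambda i: sigma[i])
    k = max(1, int(round(keep_fraction * len(order))))
    kept = order[:k]
    return mape([y_true[i] for i in kept], [mu[i] for i in kept])


# Toy data (hypothetical): the two most certain predictions are also the most accurate.
prices = [100.0, 200.0, 400.0, 800.0]
pred_mu = [101.0, 198.0, 440.0, 600.0]
pred_sigma = [1.0, 2.0, 30.0, 150.0]

mape_confident_half = mape_at_confidence(prices, pred_mu, pred_sigma, 0.5)
mape_all = mape_at_confidence(prices, pred_mu, pred_sigma, 1.0)
```

If the uncertainty estimates are informative, the error on the retained subset should fall as `keep_fraction` shrinks, which is the pattern the abstract describes.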

Paper Structure (24 sections, 13 equations, 4 figures, 4 tables, 1 algorithm)


Introduction
Related Work
Problem Definition
Methodology: ProbSAINT
Encoding
Self Attention and Inter-Sample Attention
Distributional Output
Experimental Setting
Dataset
Data Preprocessing
Training and Evaluation
Probabilistic Price Prediction
Baselines
Analysis
Probabilistic Prediction Quality at Multiple Confidence Levels
...and 9 more sections

Figures (4)

Figure 1: Architecture of the proposed ProbSAINT model. Every block contains layer normalization following each attention and feed-forward layer.
Figure 2: Qualitative comparison of ProbSAINT with the second-best baseline, CatBoostUn. The x-axis denotes the confidence score and the y-axis the corresponding MAPE.
Figure 3: Qualitative analysis of ProbSAINT and the CatBoostUn baseline. The x-axis denotes the indices of the vehicles, sorted in decreasing order of their true selling price, and the y-axis denotes the price.
Figure 4: Probabilistic prediction of price with respect to changes in offer duration. The x-axis denotes the expected offer duration for a used car, and the y-axis denotes the normalized price prediction from ProbSAINT for varying offer durations. The '+' data point denotes the "true selling price" at the "true offer duration", and 'o' denotes predictions. Comparing three different model instances shows that ProbSAINT learns varying market dynamics for the different models.
