Uncertainty quantification in automated valuation models with spatially weighted conformal prediction
Anders Hjort, Gudmund Horn Hermansen, Johan Pensar, Jonathan P. Williams
TL;DR
The paper tackles calibrated uncertainty quantification for automated valuation models (AVMs) in real estate, where spatial heterogeneity in house prices undermines standard conformal prediction (CP). It proposes and evaluates spatially weighted CP variants, including Mondrian-style and local weighting, across several non-conformity measures: Standard, Normalized 1, Normalized 2, and conformalized quantile regression (CQR). Using simulations and an Oslo housing dataset, the study shows that spatial weighting yields more uniformly calibrated coverage across regions. CQR often delivers the most efficient and robust intervals, while local calibration helps when the non-conformity scores miss spatial structure. The findings bear on risk management and decision-making in real estate markets and offer guidance on which CP variants to deploy under spatially varying uncertainty. The CP framework is stated formally, with notation for the confidence set $C_{1-\alpha}$, the quantile $\hat{q}_{1-\alpha}$, and weighting schemes such as $w_i=\exp(-d_{i,N+1}^2/\eta)$.
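To make the local weighting idea concrete, the following is a minimal sketch of a spatially weighted split-conformal interval using the weights $w_i=\exp(-d_{i,N+1}^2/\eta)$. It is an illustration under stated assumptions, not the authors' implementation: $d_{i,N+1}$ is taken to be the Euclidean distance between calibration point $i$ and the test point, the non-conformity score is the absolute residual, and the test point receives weight 1 (distance zero) with its mass placed at $+\infty$. The function names are hypothetical.

```python
import numpy as np

def spatial_weights(calib_coords, test_coord, eta):
    """Weights w_i = exp(-d_i^2 / eta), with d_i assumed Euclidean."""
    d2 = np.sum((np.asarray(calib_coords) - np.asarray(test_coord)) ** 2, axis=1)
    return np.exp(-d2 / eta)

def weighted_quantile(scores, weights, alpha):
    """(1 - alpha) quantile of the weighted score distribution.

    Assumption: the test point carries weight 1 and its unknown score is
    treated as +inf, so the result is inf when the calibration mass
    alone cannot reach level 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    # Normalise over calibration weights plus the test point's weight of 1.
    cum = np.cumsum(w / (w.sum() + 1.0))
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return s[idx] if idx < len(s) else np.inf

def spatial_cp_interval(y_hat, calib_scores, calib_coords, test_coord, eta, alpha):
    """Interval y_hat +/- q, where q is the spatially weighted quantile
    of absolute-residual scores |y_i - y_hat_i| on the calibration set."""
    w = spatial_weights(calib_coords, test_coord, eta)
    q = weighted_quantile(calib_scores, w, alpha)
    return y_hat - q, y_hat + q

# Toy usage: two clusters of calibration points with different error levels.
coords = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
scores = np.array([1.0, 1.0, 5.0, 5.0])
near_low = spatial_cp_interval(100.0, scores, coords, [0.0, 0.0], eta=1.0, alpha=0.5)
near_high = spatial_cp_interval(100.0, scores, coords, [10.0, 10.0], eta=1.0, alpha=0.5)
```

In this toy setup the interval near the low-error cluster is narrower than the one near the high-error cluster, which is the behaviour local weighting is meant to produce. A small $\eta$ concentrates the weight on nearby calibration points; as $\eta$ grows, the weights approach 1 and the quantile approaches the unweighted split-conformal quantile.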
Abstract
Non-parametric machine learning models, such as random forests and gradient boosted trees, are frequently used to estimate house prices due to their predictive accuracy, but a main drawback of such methods is their limited ability to quantify prediction uncertainty. Conformal prediction (CP) is a model-agnostic framework for constructing confidence sets around predictions of machine learning models with minimal assumptions. However, due to the spatial dependencies observed in house prices, direct application of CP leads to confidence sets that are not calibrated everywhere, i.e., the confidence sets will be too large in certain geographical regions and too small in others. We survey various approaches to adjust the CP confidence set to account for this and demonstrate their performance on a data set from the housing market in Oslo, Norway. Our findings indicate that calibrating the confidence sets on a spatially weighted version of the non-conformity scores makes the coverage more consistently calibrated across geographical regions. We also perform a simulation study on synthetically generated sale prices to empirically explore the performance of CP on housing market data under idealized conditions with known data-generating mechanisms.
