On the Performance of LLMs for Real Estate Appraisal
Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt
TL;DR
The paper addresses information asymmetry in real estate valuation and investigates whether Large Language Models (LLMs) can democratize access to price insights through optimized In-Context Learning (ICL). It systematically evaluates multiple LLMs across four international datasets using diverse prompting strategies and market-context augmentation, compares them against MAPE-minimizing baselines (kNN) and state-of-the-art models (LGBM), and uses Conformal Prediction to construct price intervals. Key findings: in-context examples chosen by geographic proximity and hedonic similarity, together with market reports in dynamic markets, yield competitive price estimates, and the LLMs' explanations align with SHAP-based feature importance. However, LLMs struggle with spatial and temporal reasoning and with interval calibration. The study offers practical guidelines for deploying LLM-based valuation tools and highlights future directions, such as retrieval-augmented generation and improved temporal generalization, to increase trustworthiness and reliability.
Abstract
The real estate market is vital to global economies but suffers from significant information asymmetry. This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates through optimized In-Context Learning (ICL) strategies. We systematically evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market-report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates. While traditional machine learning models remain strong for pure predictive accuracy, LLMs offer a more accessible, interactive, and interpretable alternative. Although self-explanations require cautious interpretation, we find that LLMs explain their predictions in agreement with state-of-the-art models, confirming their trustworthiness. Carefully selected in-context examples, based on feature similarity and geographic proximity, significantly enhance LLM performance, yet LLMs struggle with overconfidence in price intervals and limited spatial reasoning. We offer practical guidance for structured prediction tasks through prompt optimization. Our findings highlight LLMs' potential to improve transparency in real estate appraisal and provide actionable insights for stakeholders.
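To make the idea of selecting in-context examples by feature similarity and geographic proximity concrete, the following is a minimal sketch, not the authors' implementation. The field names (`lat`, `lon`, `area_m2`, `bedrooms`, `price`), the equal weighting of the two distances, the range normalization, and the prompt wording are all assumptions made for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def select_examples(target, pool, k=3, geo_weight=0.5,
                    hedonic_keys=("area_m2", "bedrooms")):
    """Return the k pool listings closest to the target.

    The score mixes a normalized geographic distance with a normalized
    hedonic (feature) distance. The weighting scheme is an assumption.
    """
    geo = [haversine_km(target["lat"], target["lon"], p["lat"], p["lon"]) for p in pool]
    max_geo = max(geo) or 1.0  # avoid division by zero when all distances are 0

    # Range of each hedonic feature over the pool, used for normalization.
    ranges = {}
    for key in hedonic_keys:
        values = [p[key] for p in pool]
        ranges[key] = (max(values) - min(values)) or 1.0

    scored = []
    for listing, g in zip(pool, geo):
        hedonic = sum(abs(listing[key] - target[key]) / ranges[key]
                      for key in hedonic_keys) / len(hedonic_keys)
        score = geo_weight * (g / max_geo) + (1 - geo_weight) * hedonic
        scored.append((score, listing))

    scored.sort(key=lambda pair: pair[0])
    return [listing for _, listing in scored[:k]]


def build_prompt(target, examples, hedonic_keys=("area_m2", "bedrooms")):
    """Format the selected examples and the target as a few-shot prompt."""
    def describe(listing):
        feats = ", ".join(f"{key}={listing[key]}" for key in hedonic_keys)
        return f"location=({listing['lat']}, {listing['lon']}), {feats}"

    lines = ["Estimate the sale price of the last property."]
    for ex in examples:
        lines.append(f"Property: {describe(ex)}\nPrice: {ex['price']}")
    lines.append(f"Property: {describe(target)}\nPrice:")
    return "\n\n".join(lines)
```

The resulting prompt string would then be sent to an LLM of choice. Setting `geo_weight` to 1.0 or 0.0 reduces the selection to purely geographic or purely hedonic neighbours, which is one simple way to compare the two criteria.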
