Table of Contents
Fetching ...

On the Performance of LLMs for Real Estate Appraisal

Margot Geerts, Manon Reusens, Bart Baesens, Seppe vanden Broucke, Jochen De Weerdt

TL;DR

The paper addresses information asymmetry in real estate valuation and investigates whether Large Language Models can democratize access to price insights through optimized In-Context Learning (ICL). It systematically evaluates multiple LLMs across four international datasets using diverse prompting strategies, market-context augmentation, and comparisons to $MAPE$-minimizing baselines (kNN) and SOTA models (LGBM), with Conformal Prediction used for intervals. Key findings show that carefully chosen in-context examples based on geographic proximity and hedonic similarity, along with market reports in dynamic markets, yield competitive price estimates and explanations that align with SHAP-based feature importance, while LLMs struggle with spatial/temporal reasoning and interval calibration. The study offers practical guidelines for deploying LLM-based valuation tools and highlights future directions such as retrieval-augmented generation and advanced temporal-generalization to improve trustworthiness and reliability.

Abstract

The real estate market is vital to global economies but suffers from significant information asymmetry. This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates through optimized In-Context Learning (ICL) strategies. We systematically evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates. While traditional machine learning models remain strong for pure predictive accuracy, LLMs offer a more accessible, interactive and interpretable alternative. Although self-explanations require cautious interpretation, we find that LLMs explain their predictions in agreement with state-of-the-art models, confirming their trustworthiness. Carefully selected in-context examples based on feature similarity and geographic proximity, significantly enhance LLM performance, yet LLMs struggle with overconfidence in price intervals and limited spatial reasoning. We offer practical guidance for structured prediction tasks through prompt optimization. Our findings highlight LLMs' potential to improve transparency in real estate appraisal and provide actionable insights for stakeholders.

On the Performance of LLMs for Real Estate Appraisal

TL;DR

The paper addresses information asymmetry in real estate valuation and investigates whether Large Language Models can democratize access to price insights through optimized In-Context Learning (ICL). It systematically evaluates multiple LLMs across four international datasets using diverse prompting strategies, market-context augmentation, and comparisons to -minimizing baselines (kNN) and SOTA models (LGBM), with Conformal Prediction used for intervals. Key findings show that carefully chosen in-context examples based on geographic proximity and hedonic similarity, along with market reports in dynamic markets, yield competitive price estimates and explanations that align with SHAP-based feature importance, while LLMs struggle with spatial/temporal reasoning and interval calibration. The study offers practical guidelines for deploying LLM-based valuation tools and highlights future directions such as retrieval-augmented generation and advanced temporal-generalization to improve trustworthiness and reliability.

Abstract

The real estate market is vital to global economies but suffers from significant information asymmetry. This study examines how Large Language Models (LLMs) can democratize access to real estate insights by generating competitive and interpretable house price estimates through optimized In-Context Learning (ICL) strategies. We systematically evaluate leading LLMs on diverse international housing datasets, comparing zero-shot, few-shot, market report-enhanced, and hybrid prompting techniques. Our results show that LLMs effectively leverage hedonic variables, such as property size and amenities, to produce meaningful estimates. While traditional machine learning models remain strong for pure predictive accuracy, LLMs offer a more accessible, interactive and interpretable alternative. Although self-explanations require cautious interpretation, we find that LLMs explain their predictions in agreement with state-of-the-art models, confirming their trustworthiness. Carefully selected in-context examples based on feature similarity and geographic proximity, significantly enhance LLM performance, yet LLMs struggle with overconfidence in price intervals and limited spatial reasoning. We offer practical guidance for structured prediction tasks through prompt optimization. Our findings highlight LLMs' potential to improve transparency in real estate appraisal and provide actionable insights for stakeholders.

Paper Structure

This paper contains 22 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of the LLM prompting methodology for house price prediction. Step 1: The model receives a structured prompt containing the task definition, an optional market report, optional ICL examples, and details of the target property-forming the basis for prompt optimization. It then predicts the property price. Step 2: The model generates a 90% prediction interval. Step 3: The model identifies the five most important features. This approach enables price estimation, uncertainty quantification, and interpretability in real estate appraisal.
  • Figure 2: Including 10 mixed examples (geographic and hedonic similarity) provides the best results overall and zero-shot prompting the worst. GPT-4o-mini generally outperforms the other models. This figure shows the results for the twelve different prompting strategies across all four datasets.
  • Figure 3: LLMs generally align with LGBM on the importance of hedonic variables. Comparison of top five features between GPT-4o-mini and LGBM.
  • Figure 4: LLMs generally align with LGBM on the importance of hedonic variables. Comparison of top five features between Llama3.1:70b and LGBM.