Multimodal Machine Learning for Real Estate Appraisal: A Comprehensive Survey
Chenya Huang, Zhidong Li, Fang Chen, Bin Liang
TL;DR
Real estate valuation has shifted from manual appraisal to automated valuation models. This paper surveys how multimodal ML, which integrates attributes, market data, text, images, and GIS, can improve predictive accuracy and interpretability. It provides a taxonomy of modalities and organizes the literature around two research questions, model performance and modality fusion, detailing fusion strategies (early, late, hybrid) and evaluation baselines using $R^2$, MAE, and RMSE. The review finds that multimodal models consistently outperform single-modality approaches, with ablation studies confirming complementary effects among modalities, and notes growing use of Transformer-based and GNN architectures. The findings underscore the potential for scalable, data-rich mass valuation while calling for clearer modality attribution and adoption of up-to-date multimodal technologies.
Abstract
Real estate appraisal has undergone a significant transition from manual to automated valuation and is entering a new phase of evolution. With growing attention to diverse data sources, a novel approach to automated valuation, multimodal machine learning, has taken shape. This approach integrates multimodal data to explore in depth the diverse factors that influence housing prices. Furthermore, multimodal machine learning significantly outperforms approaches that use a single modality or fewer modalities in terms of prediction accuracy, while also offering enhanced interpretability. However, a systematic and comprehensive survey of its application in the real estate domain is still lacking. In this survey, we aim to bridge this gap by reviewing the existing research efforts. We begin by reviewing the background of real estate appraisal and propose two research questions, from the perspectives of performance and fusion, aimed at improving the accuracy of appraisal results. Subsequently, we explain the concept of multimodal machine learning and provide, for the first time, a comprehensive classification and definition of the modalities used in real estate appraisal. To ensure clarity, we explore works related to data and techniques, along with their evaluation methods, under the framework of these two research questions. Furthermore, specific application domains are summarized. Finally, we present insights into future research directions, including multimodal complementarity, technology, and modality contribution.
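To make the performance and fusion perspectives concrete, the following is a minimal sketch of the three evaluation metrics named in the TL;DR ($R^2$, MAE, RMSE), together with toy versions of early fusion (feature concatenation) and late fusion (averaging per-modality predictions). It is an illustration under stated assumptions, not the method of any surveyed work: the function names, the equal-weight averaging rule, and all numbers are hypothetical and chosen only for readability.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted prices."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def early_fusion(*modality_features):
    """Early fusion (sketch): concatenate per-modality feature vectors
    into a single input vector for one downstream model."""
    fused = []
    for feats in modality_features:
        fused.extend(feats)
    return fused


def late_fusion(per_modality_preds, weights=None):
    """Late fusion (sketch): weighted average of predictions produced by
    separately trained per-modality models. Equal weights are an assumption."""
    n = len(per_modality_preds)
    weights = weights or [1.0 / n] * n
    return [
        sum(w * p for w, p in zip(weights, preds))
        for preds in zip(*per_modality_preds)
    ]


# Hypothetical toy data (prices in thousands); not taken from any study.
y_true = [300.0, 450.0, 500.0]
tabular_preds = [320.0, 430.0, 520.0]  # assumed output of an attribute-only model
image_preds = [280.0, 470.0, 500.0]    # assumed output of an image-only model

fused_preds = late_fusion([tabular_preds, image_preds])
fused_input = early_fusion([3, 120.5], [0.12, 0.88, 0.40])

print("late-fusion predictions:", fused_preds)
print("MAE:", mae(y_true, fused_preds))
print("RMSE:", rmse(y_true, fused_preds))
print("R^2:", r2(y_true, fused_preds))
print("early-fusion input vector:", fused_input)
```

Hybrid fusion, in this simplified picture, would combine the two: some modalities are concatenated at the feature level while others are merged at the prediction level.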
