A Multi-Modal Deep Learning Based Approach for House Price Prediction

Md Hasebul Hasan, Md Abid Jahan, Mohammed Eunus Ali, Yuan-Fang Li, Timos Sellis

TL;DR

The paper tackles the challenge of house price prediction by integrating diverse data modalities from real estate listings. It introduces the Multi-Modal House Price Predictor (MHPP), which learns joint embeddings from geo-spatial context (GSNE), textual descriptions (SBERT), and house images (CLIP) alongside raw features, then feeds a downstream regressor to predict prices. Experimental results on a Melbourne dataset show that incorporating text and image embeddings with geo-spatial and raw features yields substantial accuracy gains, with the best setup achieving notable reductions in MAE and RMSE across several regression models. The work demonstrates the practical value of multi-modal representations for real estate analytics and provides a publicly available codebase and dataset for reproducibility.

Abstract

Accurate prediction of house prices, a vital aspect of the residential real estate sector, is of substantial interest to a wide range of stakeholders. However, predicting house prices is a complex task due to the significant variability influenced by factors such as house features, location, neighborhood, and many others. Despite numerous attempts utilizing a wide array of algorithms, including recent deep learning techniques, to predict house prices accurately, existing approaches have fallen short of considering a wide range of factors such as textual and visual features. This paper addresses this gap by comprehensively incorporating the attributes typically showcased in real estate listings, such as features, textual descriptions, geo-spatial neighborhood, and house images, into a house price prediction system. Specifically, we propose a multi-modal deep learning approach that leverages different types of data to learn a more accurate representation of the house. In particular, we learn a joint embedding of raw house attributes, geo-spatial neighborhood, and, most importantly, the textual description and images representing the house, and finally use a downstream regression model to predict the house price from this jointly learned embedding vector. Our experimental results with a real-world dataset show that the text embedding of the house advertisement description and the image embedding of the house pictures, in addition to raw attributes and geo-spatial embedding, can significantly improve the house price prediction accuracy. The relevant source code and dataset are publicly accessible at the following URL: https://github.com/4P0N/mhpp
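
The pipeline described above (per-modality embeddings combined into one vector, then a downstream regressor scored with MAE and RMSE) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the four input arrays are random stand-ins rather than real GSNE, SBERT, or CLIP outputs, the embedding sizes are arbitrary, fusion by plain concatenation is assumed, the prices are synthetic, and a closed-form ridge regression stands in for whichever regression models the paper evaluates. The helper names `fuse`, `fit_ridge`, `mae`, and `rmse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_houses = 200

# Placeholder inputs: random stand-ins, NOT real GSNE / SBERT / CLIP outputs.
# All dimensions below are arbitrary choices for illustration.
raw_features = rng.normal(size=(n_houses, 8))      # tabular listing attributes
geo_embedding = rng.normal(size=(n_houses, 16))    # stand-in for a geo-spatial embedding
text_embedding = rng.normal(size=(n_houses, 32))   # stand-in for a description embedding
image_embedding = rng.normal(size=(n_houses, 32))  # stand-in for an image embedding


def fuse(*blocks):
    """Concatenate each modality's per-house vectors into one joint vector (assumed fusion)."""
    return np.concatenate(blocks, axis=1)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


joint = fuse(raw_features, geo_embedding, text_embedding, image_embedding)

# Synthetic target so the sketch runs end to end; these are not real prices.
true_w = rng.normal(size=joint.shape[1])
price = joint @ true_w + rng.normal(scale=0.1, size=n_houses)

# Simple hold-out split: first 150 houses for training, the rest for evaluation.
train, test = slice(0, 150), slice(150, None)
w, b = fit_ridge(joint[train], price[train])
pred = joint[test] @ w + b

print("MAE :", mae(price[test], pred))
print("RMSE:", rmse(price[test], pred))
```

Because the data here is synthetic, the printed errors say nothing about real performance; the sketch only shows how the pieces connect. To compare modality subsets in the same spirit, one could pass fewer arrays to `fuse` and rerun the fit.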

Paper Structure

This paper contains 22 sections, 10 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: A simple house flyer and its contents
  • Figure 2: MHPP joint embedding, incorporating all relevant house features.
  • Figure 3: GSNE architecture overview.
  • Figure 4: (a) A complete breakdown of a typical house flyer into different parts indicating our multi-dimensional dataset. (b) Different features of the house. (c) Geo-spatial network centered around the house where the edges are given with the nearby schools, shops and bus stations with their distance measured. (d) Textual description of the house in the advertisement describing the house. (e) The images of the house with both interior and exterior views.
  • Figure 5: SBERT architecture for deriving fixed length sentence embedding.
  • ...and 2 more figures