ARED: Argentina Real Estate Dataset
Iván Belenky
TL;DR
ARED addresses the lack of multimodal, time-snapshot data for the Argentinian real estate market by introducing ARED0, a 44-day snapshot covering January–February 2024. The dataset is multimodal: each listing's descriptive features are linked to a variable-length set of $40 \times 40$ RGB images. It covers 26 property types with detailed geographic and temporal metadata, collected via automated scraping starting around 11 January 2024. An initial analysis reveals market-wide time dependence and cohesive dynamics across property types, evidenced by stable Wasserstein distances $d_W$ between price distributions. The work provides a baseline resource for price-prediction research in a volatile market and outlines a roadmap for quarterly updates and the incorporation of historical data.
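For readers unfamiliar with $d_W$: between two one-dimensional distributions with CDFs $F_P$ and $F_Q$, the 1-Wasserstein distance is $d_W(P, Q) = \int_{-\infty}^{\infty} |F_P(t) - F_Q(t)|\,dt$, the area between the two CDFs. The pure-Python sketch below computes it for empirical price samples and applies it to consecutive snapshots. It assumes the first-order ($p = 1$) distance on raw prices. The exact pairing used in this work (across days, across property types, or both) is not specified here, and the helper names `wasserstein_1d` and `consecutive_distances` are hypothetical. For 1-D samples, `scipy.stats.wasserstein_distance` computes the same quantity.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """1-Wasserstein distance between two empirical 1-D distributions.

    Computed as the area between the two empirical CDFs, which are step
    functions that only change value at observed sample points.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    us, vs = sorted(u), sorted(v)
    breakpoints = sorted(set(us) | set(vs))
    terms = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at `left`.
        f_u = bisect_right(us, left) / len(us)
        f_v = bisect_right(vs, left) / len(vs)
        terms.append(abs(f_u - f_v) * (right - left))
    return math.fsum(terms)  # fsum limits floating-point accumulation error


def consecutive_distances(snapshots: List[Sequence[float]]) -> List[float]:
    """Hypothetical helper: d_W between each pair of consecutive price snapshots."""
    return [wasserstein_1d(a, b) for a, b in zip(snapshots, snapshots[1:])]


if __name__ == "__main__":
    # Toy prices (not ARED data): every price rises by 2 from one day to the next.
    days = [[100, 120, 150], [102, 122, 152], [104, 124, 154]]
    print(consecutive_distances(days))  # each value is approximately 2.0
```

A uniform shift of the whole distribution yields a distance equal to the shift. Under this reading, a roughly constant $d_W$ over time is consistent with prices moving together rather than individual segments diverging.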
Abstract
The Argentinian real estate market presents a unique case study, characterized by unstable and rapidly shifting macroeconomic circumstances over the past decades. Although a few datasets for price prediction exist, there is a lack of mixed-modality datasets focused specifically on Argentina. This paper introduces the first edition of ARED (ARED0), a comprehensive series of real estate price prediction datasets designed for the Argentinian market. This edition contains data solely for January–February 2024. Despite the short time range captured by this zeroth edition (44 days), time-dependent phenomena were found to occur mostly at the market level (i.e., across the market as a whole). Nevertheless, future editions of this dataset will most likely contain historical data. Each listing in ARED comprises descriptive features and a variable-length set of images.
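To make the listing structure concrete, the following minimal sketch shows one way a record with descriptive features and a variable-length set of $40 \times 40$ RGB images could be represented and batched. The schema is an assumption: the field names (`listing_id`, `property_type`, `price`, `scraped_on`, `location`) and the helpers `is_valid_image` and `image_mask` are hypothetical illustrations and do not reflect ARED's actual column names or file format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple

# A 40x40 RGB image as nested lists: rows -> pixels -> (r, g, b), each in 0..255.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

IMG_SIZE = 40  # images are described as 40 x 40 RGB


def is_valid_image(img: Image) -> bool:
    """Check that an image is IMG_SIZE x IMG_SIZE with 3-channel pixels in 0..255."""
    return (
        len(img) == IMG_SIZE
        and all(len(row) == IMG_SIZE for row in img)
        and all(
            len(px) == 3 and all(0 <= c <= 255 for c in px)
            for row in img
            for px in row
        )
    )


@dataclass
class Listing:
    # Hypothetical schema: field names are illustrative, not ARED's actual columns.
    listing_id: str
    property_type: str        # one of the dataset's 26 property types
    price: float
    scraped_on: date          # temporal metadata
    location: Dict[str, str]  # geographic metadata, e.g. province or city
    images: List[Image] = field(default_factory=list)  # variable-length set

    def __post_init__(self) -> None:
        # Reject records whose images do not match the expected 40x40 RGB shape.
        for img in self.images:
            if not is_valid_image(img):
                raise ValueError(f"listing {self.listing_id}: image is not 40x40 RGB")


def image_mask(listings: List[Listing]) -> List[List[bool]]:
    """Padding mask for batching variable-length image sets (True = real image)."""
    max_n = max((len(x.images) for x in listings), default=0)
    return [[i < len(x.images) for i in range(max_n)] for x in listings]


if __name__ == "__main__":
    blank = [[(0, 0, 0)] * IMG_SIZE for _ in range(IMG_SIZE)]
    a = Listing("a", "apartment", 1.0, date(2024, 1, 11), {"province": "Buenos Aires"}, [blank, blank])
    b = Listing("b", "house", 2.0, date(2024, 1, 12), {"province": "Córdoba"}, [blank])
    print(image_mask([a, b]))  # [[True, True], [True, False]]
```

Padding each batch to its largest image set and carrying a boolean mask is one common way to feed variable-length sets to a model. It is shown here only as an illustration, not as the method used in this work.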
