ARED: Argentina Real Estate Dataset

Iván Belenky

TL;DR

ARED addresses the lack of multimodal, time-snapshot data for the Argentinian real estate market by introducing ARED0, a 44-day Jan–Feb 2024 snapshot. The dataset is multimodal, linking per-listing descriptive features to a variable-length set of $40 \times 40$ RGB images, and includes 26 property types with detailed geographic and temporal metadata, collected via automated scraping starting around $11^{\text{th}}$ January 2024. The initial analysis reveals market-wide time dependence and cohesive dynamics across property types, evidenced by stable Wasserstein distance $d_W$ between price distributions. The work lays a baseline resource for price-prediction research in a volatile market and outlines a roadmap for quarterly updates and historical data incorporation.

Abstract

The Argentinian real estate market presents a unique case study, characterized by unstable and rapidly shifting macroeconomic circumstances over the past decades. Although a few datasets for price prediction exist, mixed-modality datasets focused specifically on Argentina are lacking. This paper introduces the first edition of ARED, a comprehensive real estate price prediction dataset series designed for the Argentinian market. This edition contains information solely for Jan-Feb 2024. Despite the short time range captured by this zeroth edition (44 days), time-dependent phenomena were found to occur mostly at the market level (the market as a whole). Future editions of this dataset will most likely contain historical data. Each listing in ARED comprises descriptive features and a variable-length set of images.

Paper Structure (4 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Price per square meter for houses and apartments across Argentina's territory, with 25th-75th quantile ranges and CPI-adjusted median values.
  • Figure 2: Price for houses and apartments across Argentina's territory, with 25th-75th quantile ranges and CPI-adjusted median values. Discount rates are overlaid on the time series.
  • Figure 3: Relative statistics between property type groups: houses vs. apartments, and 1-room apartments vs. $>$1-room apartments.
  • Figure 4: Evolution of the Wasserstein distance between the house and apartment price distributions. Both the house and the apartment data were constructed from all of their respective subcategories, such as House Duplex, Apartment Loft, etc.
  • Figure 5: ARED0 time range shown relative to past historical data.
  • ...and 1 more figure
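
The quantity tracked in Figure 4, the Wasserstein distance between two price distributions, can be illustrated with a minimal sketch. It assumes the first-order (earth mover's) distance between empirical one-dimensional samples; the summary does not say which order or implementation the paper uses. The function names and the per-day dictionary layout are hypothetical and are not taken from ARED.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First-order Wasserstein distance between two empirical 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical CDFs of the two samples. (Assumed definition; the paper
    may use a different order or estimator.)
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    # Both CDFs are step functions that only change at observed values,
    # so it is enough to integrate interval by interval on the merged grid.
    grid = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def daily_distances(houses_by_day, apartments_by_day):
    """Distance per snapshot day, for days present in both groups.

    Each argument is a hypothetical mapping {day: [price, price, ...]}.
    """
    days = sorted(set(houses_by_day) & set(apartments_by_day))
    return {
        day: wasserstein_1d(houses_by_day[day], apartments_by_day[day])
        for day in days
    }
```

Under this definition, a curve from `daily_distances` that stays roughly flat over the snapshot window means the two price distributions keep a roughly constant separation over time, which is the kind of cohesive behaviour the TL;DR describes. Where SciPy is available, `scipy.stats.wasserstein_distance` computes the same first-order distance.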