HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset and Benchmark

Shengkun Wang, Yanshen Sun, Fanglan Chen, Linhan Wang, Naren Ramakrishnan, Chang-Tien Lu, Yinlin Chen

TL;DR

HouseTS addresses the need for a large-scale, open, multimodal spatiotemporal housing dataset designed for long-horizon forecasting. It introduces a ZIP-code panel covering over $6{,}000$ ZIP codes in $30$ metropolitan areas from $2012$ to $2023$, aligning housing signals, POIs, census covariates, and time-stamped aerial imagery under a unified preprocessing protocol. The work defines standardized univariate and multivariate forecasting benchmarks across six window configurations and three horizons, and evaluates 16 model families, including classical methods, deep learning models, and pretrained time-series foundation models. Results show that simple linear baselines perform strongly when properly preprocessed, while foundation models offer robustness but do not consistently exceed traditional baselines. Additionally, HouseTS provides image-derived textual change annotations with disagreement-based reliability metadata to enable interpretable multimodal analyses and reliability-aware dataset usage, and it releases the dataset, code, and documentation for reproducible research in spatiotemporal learning.

Abstract

Accurate long-horizon house-price forecasting requires benchmarks that capture temporal dynamics together with time-varying local context. However, existing public resources remain fragmented: many datasets have limited spatial coverage, temporal depth, or multimodal alignment; the robustness of modern deep forecasters and time-series foundation models on housing data is not well characterized; and aerial imagery is rarely leveraged in a time-aware and interpretable manner at scale. To bridge these gaps, we present HouseTS (House Time Series), a multimodal spatiotemporal dataset for ZIP-code-level housing-market analysis, covering monthly signals from March 2012 to December 2023 across over 6,000 ZIP codes in 30 major U.S. metropolitan areas. HouseTS aligns monthly housing-market indicators, monthly POI dynamics, and annual census-based socioeconomic variables under a unified schema, and includes time-stamped annual aerial imagery. Building on HouseTS, we define standardized long-horizon forecasting tasks for univariate and multivariate prediction and benchmark 16 model families spanning statistical methods, classical machine learning, deep neural networks, and time-series foundation models in both zero-shot and fine-tuned modes. We also provide image-derived textual change annotations from multi-year aerial image sequences via a vision-language pipeline with LLM-as-judge and human verification to support scalable interpretability analyses. HouseTS is available on Kaggle, with code and documentation on GitHub.
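
As a rough illustration of how a long-horizon forecasting task over a monthly series can be set up, the sketch below cuts one ZIP-level series into (input window, forecast target) pairs. It is not the authors' pipeline: the function name `make_windows`, the lookback of 12 months, the horizon of 6 months, and the synthetic prices are all assumptions chosen for the example, and the abstract does not state the benchmark's actual window sizes. The log transform is likewise only a plausible preprocessing step.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Cut a 1-D series into overlapping (input, target) pairs.

    Each input holds `lookback` consecutive values and each target holds
    the `horizon` values that immediately follow it.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback/horizon")
    inputs = np.stack([series[i:i + lookback] for i in range(n_samples)])
    targets = np.stack(
        [series[i + lookback:i + lookback + horizon] for i in range(n_samples)]
    )
    return inputs, targets

# March 2012 through December 2023 spans 142 monthly observations.
n_months = 142

# Synthetic prices for one hypothetical ZIP code, log-transformed
# (an assumed preprocessing step, not taken from the released code).
prices = np.log(np.linspace(200_000, 400_000, n_months))

# Hypothetical configuration: 12 months of history, 6 months ahead.
X, y = make_windows(prices, lookback=12, horizon=6)
print(X.shape, y.shape)
```

Changing `lookback` and `horizon` yields other window configurations; a multivariate variant would stack covariates along an extra feature axis.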

Paper Structure

This paper contains 15 sections, 4 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: HouseTS overview: 30 U.S. metropolitan areas and aligned ZIP-level modalities.
  • Figure 2: Distribution of house prices before (left) and after log transformation (right).
  • Figure 3: VLM-based semantic annotation pipeline.
  • Figure 4: Model-specific distributions of self-reported uncertainty scores (1-5).
  • Figure 5: Deviation of each annotator relative to the multi-model consensus (median), measured as mean absolute difference over discrete score fields. Lower deviation indicates closer alignment to consensus.
  • ...and 4 more figures
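
The deviation measure named in the Figure 5 caption (mean absolute difference from the median consensus) can be sketched as follows. This is a minimal illustration under assumptions: the function name `deviation_from_consensus`, the toy score matrix, and the choice to include every annotator's own score when computing the median are hypothetical and are not taken from the paper.

```python
import numpy as np

def deviation_from_consensus(scores):
    """Mean absolute difference of each annotator from the per-item median.

    scores: array of shape (n_annotators, n_items) holding discrete scores.
    Returns an array of shape (n_annotators,).
    """
    scores = np.asarray(scores, dtype=float)
    # Consensus is the median across annotators for each item
    # (assumption: the annotator's own score is included in the median).
    consensus = np.median(scores, axis=0)
    return np.abs(scores - consensus).mean(axis=1)

# Toy scores: 3 hypothetical annotators rating 4 items.
scores = np.array([
    [3, 4, 2, 5],
    [3, 4, 3, 5],
    [1, 4, 2, 2],
])
dev = deviation_from_consensus(scores)
print(dev)
```

A lower value means the annotator sits closer to the consensus; in this toy example the third annotator deviates the most.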