HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset and Benchmark
Shengkun Wang, Yanshen Sun, Fanglan Chen, Linhan Wang, Naren Ramakrishnan, Chang-Tien Lu, Yinlin Chen
TL;DR
HouseTS addresses the need for a large-scale, open, multimodal spatiotemporal housing dataset designed for long-horizon forecasting. It introduces a ZIP-code panel covering over 6,000 ZIP codes in 30 major U.S. metropolitan areas, with monthly data from March 2012 to December 2023, and aligns housing signals, POIs, census covariates, and time-stamped aerial imagery under a unified preprocessing protocol. The work defines standardized univariate and multivariate forecasting benchmarks across six window configurations and three horizons, and evaluates 16 model families, including classical methods, deep learning models, and pretrained time-series foundation models. Results show that simple linear baselines perform strongly when properly preprocessed, while foundation models offer robustness but do not consistently outperform traditional baselines. HouseTS also provides image-derived textual change annotations with disagreement-based reliability metadata, which support interpretable multimodal analyses and reliability-aware use of the dataset. The dataset, code, and documentation are released for reproducible research in spatiotemporal learning.
Abstract
Accurate long-horizon house-price forecasting requires benchmarks that capture temporal dynamics together with time-varying local context. However, existing public resources remain fragmented: many datasets have limited spatial coverage, temporal depth, or multimodal alignment; the robustness of modern deep forecasters and time-series foundation models on housing data is not well characterized; and aerial imagery is rarely leveraged in a time-aware and interpretable manner at scale. To bridge these gaps, we present HouseTS (House Time Series), a multimodal spatiotemporal dataset for ZIP-code-level housing-market analysis, covering monthly signals from March 2012 to December 2023 across over 6,000 ZIP codes in 30 major U.S. metropolitan areas. HouseTS aligns monthly housing-market indicators, monthly POI dynamics, and annual census-based socioeconomic variables under a unified schema, and includes time-stamped annual aerial imagery. Building on HouseTS, we define standardized long-horizon forecasting tasks for univariate and multivariate prediction and benchmark 16 model families spanning statistical methods, classical machine learning, deep neural networks, and time-series foundation models in both zero-shot and fine-tuned modes. We also provide image-derived textual change annotations from multi-year aerial image sequences via a vision-language pipeline with LLM-as-judge and human verification to support scalable interpretability analyses. HouseTS is available on Kaggle, with code and documentation on GitHub.
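
As an illustration of what a long-horizon forecasting task over a monthly ZIP-code series can look like, the sketch below slices one series into (input window, target horizon) pairs. It is a minimal sketch and not the benchmark's actual preprocessing: the function name `make_windows`, the synthetic series, and the lookback of 12 months with a horizon of 6 months are all assumptions chosen for illustration. Only the series length follows from the abstract, since March 2012 through December 2023 spans 142 months.

```python
import numpy as np


def make_windows(series, lookback, horizon):
    """Slice one ZIP code's monthly series into (input, target) pairs.

    `series` has shape (T,) for the univariate case or (T, F) for the
    multivariate case; slicing is always along the time axis.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than lookback + horizon")
    inputs = np.stack([series[i:i + lookback] for i in range(n_samples)])
    targets = np.stack(
        [series[i + lookback:i + lookback + horizon] for i in range(n_samples)]
    )
    return inputs, targets


# March 2012 through December 2023: 10 months in 2012 plus 11 full years.
n_months = 10 + 11 * 12

# Synthetic stand-in for a single ZIP code's monthly price series.
prices = np.arange(n_months, dtype=float)

# Hypothetical configuration: 12 months of history, 6 months ahead.
X, Y = make_windows(prices, lookback=12, horizon=6)
```

Each row of `X` holds 12 consecutive months and the matching row of `Y` holds the 6 months that follow, so no target value leaks into its own input window. Passing an array of shape `(T, F)` yields windows of shape `(n_samples, lookback, F)`, which corresponds to the multivariate setting.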
