Deep hybrid model with satellite imagery: how to combine demand modeling and computer vision for behavior analysis?
Qingyi Wang, Shenhao Wang, Yunhan Zheng, Hongzhou Lin, Xiaohu Zhang, Jinhua Zhao, Joan Walker
TL;DR
The paper addresses a limitation of classical travel demand models, which rely on low-dimensional numeric data, by proposing a deep hybrid model (DHM) that fuses sociodemographics with high-dimensional satellite imagery through a mixing operator into a latent space. A supervised autoencoder-based mixing operator (SAE) and a simple generalized linear behavioral predictor map latent representations to travel outcomes, enabling both aggregate and disaggregate predictions as well as economic interpretation. Empirically, on Chicago data, DHMs outperform traditional models and pure deep learning, reveal a spatially meaningful latent structure, and allow generation of realistic satellite imagery with derived economic measures such as market shares, welfare, and substitution patterns. This framework demonstrates a practical path to leverage imagery alongside numeric data for travel behavior analysis, welfare estimation, and imagery-driven storytelling, while highlighting computational challenges and avenues for future research.
Abstract
Classical demand modeling analyzes travel behavior using only low-dimensional numeric data (i.e., sociodemographics and travel attributes) but not high-dimensional urban imagery. However, travel behavior depends on the factors represented by both numeric data and urban imagery, thus necessitating a synergetic framework to combine them. This study creates a theoretical framework of deep hybrid models with a crossing structure consisting of a mixing operator and a behavioral predictor, thus integrating the numeric and imagery data into a latent space. Empirically, this framework is applied to analyze travel mode choice using the MyDailyTravel Survey from Chicago as the numeric inputs and satellite images as the imagery inputs. We find that, with our supervision-as-mixing design, deep hybrid models outperform both traditional demand models and recent deep learning models in predicting aggregate and disaggregate travel behavior. The latent space in deep hybrid models can be interpreted, because it reveals meaningful spatial and social patterns. The deep hybrid models can also generate new urban images that do not exist in reality and interpret them with economic theory, such as computing substitution patterns and social welfare changes. Overall, the deep hybrid models demonstrate the complementarity between low-dimensional numeric and high-dimensional imagery data and between traditional demand modeling and recent deep learning. The framework generalizes the latent classes and variables in classical hybrid demand models to a latent space, and leverages the computational power of deep learning for imagery while retaining economic interpretability grounded in microeconomic foundations.
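To make the crossing structure concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes the satellite images have already been reduced to feature vectors, uses a single tanh layer as a stand-in mixing operator, and uses a multinomial logit (linear utilities plus softmax) as the behavioral predictor. All names, dimensions, the random parameters, and the loss weighting are hypothetical. The logsum is the standard expected-maximum-utility measure for logit models, reported here in utility units.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: individuals, image-feature dim, sociodemographic dim,
# latent dim, number of travel modes.
n, d_img, d_soc, k, n_modes = 5, 16, 4, 3, 4

# Synthetic stand-ins for the real inputs (assumption: images are pre-embedded).
image_features = rng.normal(size=(n, d_img))
sociodemographics = rng.normal(size=(n, d_soc))
chosen_mode = rng.integers(0, n_modes, size=n)

# Randomly initialised parameters; a real model would learn these by training.
W_img = rng.normal(scale=0.1, size=(d_img, k))
W_soc = rng.normal(scale=0.1, size=(d_soc, k))
W_dec = rng.normal(scale=0.1, size=(k, d_img))
beta = rng.normal(scale=0.1, size=(k, n_modes))


def mixing_operator(img, soc):
    """Map imagery and numeric inputs into one shared latent space."""
    return np.tanh(img @ W_img + soc @ W_soc)


def utilities(latent):
    """Behavioral predictor: utilities are linear in the latent variables."""
    return latent @ beta


def choice_probabilities(u):
    """Multinomial logit probabilities (numerically stable softmax)."""
    shifted = u - u.max(axis=1, keepdims=True)
    exp_u = np.exp(shifted)
    return exp_u / exp_u.sum(axis=1, keepdims=True)


def logsum(u):
    """Expected maximum utility per individual, up to an additive constant."""
    m = u.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(u - m).sum(axis=1, keepdims=True))).ravel()


def supervised_autoencoder_loss(img, latent, probs, chosen, weight=1.0):
    """Reconstruction error plus a supervised (cross-entropy) term."""
    reconstruction = np.mean((latent @ W_dec - img) ** 2)
    cross_entropy = -np.mean(np.log(probs[np.arange(len(chosen)), chosen]))
    return reconstruction + weight * cross_entropy


latent = mixing_operator(image_features, sociodemographics)
u = utilities(latent)
probs = choice_probabilities(u)       # disaggregate predictions
market_shares = probs.mean(axis=0)    # aggregate predictions
welfare = logsum(u)
loss = supervised_autoencoder_loss(image_features, latent, probs, chosen_mode)
```

Under these assumptions, a welfare change for a generated image could be approximated by recomputing `logsum` with the new image features and differencing against the baseline; converting that difference to monetary units would additionally require a marginal utility of income, which this sketch does not model.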
