Statistical inference of probabilistic origin-destination demand using day-to-day traffic data
Wei Ma, Zhen Qian
TL;DR
This paper addresses uncertainty in origin-destination (O-D) demand and network flow by estimating a probabilistic O-D demand distribution $(Q\sim\mathcal{N}(q,\Sigma_q))$ within a generalized statistical traffic assignment framework (GESTA). It introduces an iterative generalized least squares (IGLS) procedure that alternates between estimating the O-D mean $q$ and its covariance $\Sigma_q$, leveraging day-to-day traffic counts and a sparse covariance via Lasso; the variance of link flows is decomposed into O-D demand variation, day-to-day route-choice variation, and measurement error. Theoretical results establish observability considerations and an ${\cal O}(1/n)$ risk bound for the mean estimator, showing the probabilistic ODE mean is no worse than deterministic ODE with sufficient data, while enabling inference on the covariance structure. Empirical tests on two small networks and two large-scale networks (including real-world DC and SR-41 cases) demonstrate fast convergence, accurate recovery of mean and covariance, and practical computational feasibility, while highlighting Lasso-induced bias and data requirements as essential trade-offs for interpretability and efficiency.
Abstract
Recent transportation network studies on uncertainty and reliability call for modeling the probabilistic O-D demand and probabilistic network flow. Making the best use of day-to-day traffic data collected over many years, this paper develops a novel theoretical framework for estimating the mean and variance/covariance matrix of O-D demand considering the day-to-day variation induced by travelers' independent route choices. It also estimates the probability distributions of link/path flow and their travel cost where the variance stems from three sources, O-D demand, route choice and unknown errors. The framework estimates O-D demand mean and variance/covariance matrix iteratively, also known as iterative generalized least squares (IGLS) in statistics. Lasso regularization is employed to obtain sparse covariance matrix for better interpretation and computational efficiency. Though the probabilistic O-D estimation (ODE) works with a much larger solution space than the deterministic ODE, we show that its estimator for O-D demand mean is no worse than the best possible estimator by an error that reduces with the increase in sample size. The probabilistic ODE is examined on two small networks and two real-world large-scale networks. The solution converges quickly under the IGLS framework. In all those experiments, the results of the probabilistic ODE are compelling, satisfactory and computationally plausible. Lasso regularization on the covariance matrix estimation leans to underestimate most of variance/covariance entries. A proper Lasso penalty ensures a good trade-off between bias and variance of the estimation.
