UrbanVGGT: Scalable Sidewalk Width Estimation from Street View Images

Kaizhen Tan; Fan Zhang

Abstract

Sidewalk width is an important indicator of pedestrian accessibility, comfort, and network quality, yet large-scale width data remain scarce in most cities. Existing approaches typically rely on costly field surveys, high-resolution overhead imagery, or simplified geometric assumptions that limit scalability or introduce systematic error. To address this gap, we present UrbanVGGT, a measurement pipeline for estimating metric sidewalk width from a single street-view image. The method combines semantic segmentation, feed-forward 3D reconstruction, adaptive ground-plane fitting, camera-height-based scale calibration, and directional width measurement on the recovered plane. On a ground-truth benchmark from Washington, D.C., UrbanVGGT achieves a mean absolute error of 0.252 m, with 95.5% of estimates within 0.50 m of the reference width. Ablation experiments show that metric scale calibration is the most critical component, and controlled comparisons with alternative geometry backbones support the effectiveness of the overall design. As a feasibility demonstration, we further apply the pipeline to three cities and generate SV-SideWidth, a prototype sidewalk-width dataset covering 527 OpenStreetMap street segments. The results indicate that street-view imagery can support scalable generation of candidate sidewalk-width attributes, while broader cross-city validation and local ground-truth auditing remain necessary before deployment as authoritative planning data.
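The calibration and measurement steps named in the abstract can be illustrated with a small synthetic sketch. It is not the authors' implementation: it swaps in a plain least-squares plane fit for the adaptive fitting, measures a single boundary-point pair rather than the full directional procedure, and uses hypothetical function names and an assumed camera height of 2.5 m. The idea shown is that, with the camera at the origin of an unscaled reconstruction, the ratio of the known camera height to the camera-to-plane distance gives a metric scale factor.

```python
import numpy as np

def fit_plane_svd(points):
    """Least-squares plane through Nx3 points.

    Returns a unit normal n and offset d such that n . x + d = 0.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    d = -float(normal @ centroid)
    return normal, d

def metric_scale(d, cam_height_m):
    """Scale factor mapping reconstruction units to metres.

    Assumes the camera sits at the origin, so |d| is its distance
    to the ground plane in reconstruction units.
    """
    return cam_height_m / abs(d)

def width_on_plane(p_inner, p_outer, normal, scale):
    """Metric distance between two boundary points, measured within the plane."""
    v = p_outer - p_inner
    v_in_plane = v - (v @ normal) * normal  # drop the off-plane component
    return scale * float(np.linalg.norm(v_in_plane))

# Synthetic ground points on the plane y = 1.25 (camera frame, y pointing down).
rng = np.random.default_rng(0)
xz = rng.uniform(-5.0, 5.0, size=(200, 2))
ground = np.column_stack([xz[:, 0], np.full(200, 1.25), xz[:, 1]])

normal, d = fit_plane_svd(ground)
scale = metric_scale(d, cam_height_m=2.5)  # assumed mounting height

# Hypothetical inner/outer sidewalk boundary points, 1.0 unit apart.
p_inner = np.array([0.5, 1.25, 3.0])
p_outer = np.array([1.5, 1.25, 3.0])
width = width_on_plane(p_inner, p_outer, normal, scale)
print(f"scale = {scale:.3f}, width = {width:.3f} m")
```

Because the estimated width is proportional to the scale factor in this sketch, any relative error in the assumed camera height carries over directly to the width estimate.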

Paper Structure (32 sections, 4 equations, 6 figures, 7 tables)

Introduction
Related Work
Sidewalk Measurement from Remote Sensing
Sidewalk Measurement from Street-View Imagery
Monocular Depth and 3D Reconstruction
Methodology
Semantic Segmentation
3D Geometry Reconstruction via VGGT
Ground-Plane Fitting
Metric Scale Calibration
Column-Wise Width Estimation
Experiments
Dataset and Evaluation Protocol
Qualitative Results
Ablation Study
...and 17 more sections

Figures (6)

Figure 1: Sidewalk width data in OpenStreetMap. Grey lines denote all drivable streets; blue lines (if any) denote streets with a sidewalk-width tag. Both New York City (461 streets) and Nairobi (1958 streets) have zero sidewalk-width tags, highlighting the data gap that UrbanVGGT aims to fill.
Figure 2: UrbanVGGT pipeline overview. (a) Input street-view image. (b) Semantic segmentation with inner (yellow) and outer (red) boundary detection. (c) Midline overlap region used to pair boundary points. (d) VGGT-based 3D reconstruction. (e) Ground-plane fitting with semantic point cloud. (f) Width estimation and preliminary dataset construction.
Figure 3: Qualitative measurement examples on the D.C. dataset. Each panel shows the segmentation overlay with detected inner (yellow) and outer (red) boundaries. Predicted width: model estimate; ground-truth width: reference measurement.
Figure 4: Camera height sensitivity: MAE as a function of the assumed camera mounting height $h_{\mathrm{cam}}$.
Figure 5: MAE comparison across all methods. Models are grouped by evaluation category: Category 1 (metric depth with native scale), Category 2 (monocular depth with pinhole unprojection and scale calibration), and Category 3 (single-image point-cloud reconstruction with scale calibration). All methods share the same segmentation, boundary extraction, plane fitting, and outlier filtering; only the 3D geometry backbone differs. The dashed red line indicates the UrbanVGGT MAE (0.252 m).
...and 1 more figure
