Spatialyze: A Geospatial Video Analytics System with Spatial-Aware Optimizations
Chanwut Kittivorawong, Yongming Ge, Yousef Helal, Alvin Cheung
TL;DR
Spatialyze addresses the lack of end-to-end geospatial video analytics systems by introducing S-Flow, a Python-embedded DSL that expresses geospatial workflows in a build-filter-observe paradigm. The system defers expensive ML inference until the observe step and applies four geospatially aware optimizations (Road Visibility Pruner, Object Type Pruner, Geometry-Based 3D Location Estimator, and Exit Frame Sampler) within a streaming execution model made up of a Data Integrator, Video Processor, Movable Objects Query Engine, and Output Composer. Evaluations on real datasets (nuScenes, VIVA, SkyQuery) show speedups of up to $5.3\times$ with up to $97.1\%$ accuracy relative to unoptimized execution, along with favorable ablation results across the optimizations. The work demonstrates that leveraging geospatial metadata and the inherent physical behavior of real-world objects can significantly accelerate end-to-end geospatial video analytics, enabling scalable analysis of large video corpora for journalism, surveillance, and autonomous-vehicle (AV) applications.
Abstract
Videos that are shot using commodity hardware such as phones and surveillance cameras record various metadata such as time and location. We encounter such geospatial videos on a daily basis and such videos have been growing in volume significantly. Yet, we do not have data management systems that allow users to interact with such data effectively. In this paper, we describe Spatialyze, a new framework for end-to-end querying of geospatial videos. Spatialyze comes with a domain-specific language where users can construct geospatial video analytic workflows using a 3-step, declarative, build-filter-observe paradigm. Internally, Spatialyze leverages the declarative nature of such workflows, the temporal-spatial metadata stored with videos, and physical behavior of real-world objects to optimize the execution of workflows. Our results using real-world videos and workflows show that Spatialyze can reduce execution time by up to 5.3x, while maintaining up to 97.1% accuracy compared to unoptimized execution.
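To make the build-filter-observe idea concrete, the following is a minimal, self-contained Python sketch of a lazily evaluated workflow. It is not the S-Flow API: the names `LazyWorkflow`, `Detection`, `add_video`, `filter`, `observe`, and `fake_detector`, as well as the path `clip.mp4`, are hypothetical and chosen only for illustration. The sketch assumes that building and filtering merely record intent, and that the (here simulated) detector runs only when the user observes results.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Detection:
    frame: int
    object_type: str
    distance_m: float  # distance from the camera in metres (illustrative)


@dataclass
class LazyWorkflow:
    """Toy build-filter-observe pipeline (hypothetical, not the S-Flow API)."""

    videos: List[str] = field(default_factory=list)
    predicates: List[Callable[[Detection], bool]] = field(default_factory=list)
    detector_calls: int = 0

    def add_video(self, path: str) -> "LazyWorkflow":
        # Build: register the video; nothing is decoded or analysed yet.
        self.videos.append(path)
        return self

    def filter(self, predicate: Callable[[Detection], bool]) -> "LazyWorkflow":
        # Filter: record the predicate only; no inference is triggered.
        self.predicates.append(predicate)
        return self

    def observe(
        self, detector: Callable[[str], List[Detection]]
    ) -> List[Detection]:
        # Observe: the only step that invokes the detector.
        results = []
        for video in self.videos:
            self.detector_calls += 1
            for det in detector(video):
                if all(pred(det) for pred in self.predicates):
                    results.append(det)
        return results


def fake_detector(video: str) -> List[Detection]:
    """Stand-in for an ML detector; returns hard-coded detections."""
    return [
        Detection(0, "car", 8.0),
        Detection(1, "pedestrian", 25.0),
        Detection(2, "car", 40.0),
    ]


workflow = (
    LazyWorkflow()
    .add_video("clip.mp4")  # hypothetical path; the file is never opened
    .filter(lambda d: d.object_type == "car")
    .filter(lambda d: d.distance_m < 10.0)
)
calls_before_observe = workflow.detector_calls  # detector has not run yet
matches = workflow.observe(fake_detector)
```

Because the filters are known before any inference runs, an optimizer in a real system has the chance to inspect them first and skip work that cannot affect the result; this toy version only demonstrates the deferral itself.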
