SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches
Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong
TL;DR
SketchQL tackles zero-shot video moment querying by letting users sketch target events as object trajectories. It combines a visual query interface with a transformer-based Matcher that embeds and compares per-frame bounding-box trajectories, ranking clips by $\mathrm{sim}(C_Q, C_V) = \cos(\mathbf{emb}(C_Q), \mathbf{emb}(C_V))$, where $C_Q$ is the sketched query and $C_V$ is a candidate video clip. The embeddings are learned from trajectories produced by a simulator so that the model generalizes across datasets. The system operates on pre-extracted bounding-box trajectories and supports single- and multi-object queries, which are shown in end-to-end demonstrations on real video. The work delivers a usable GUI, a generalizable trajectory similarity model, and a practical demonstration of sketch-based video moment querying across datasets.
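The ranking rule above can be illustrated with a minimal sketch. It assumes the embeddings $\mathbf{emb}(C_Q)$ and $\mathbf{emb}(C_V)$ have already been computed and are available as plain lists of floats. The names `cosine_similarity` and `rank_clips`, the clip identifiers, and the toy vectors are hypothetical and are not taken from SketchQL's code.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)

def rank_clips(query_emb, clip_embs, top_k=3):
    """Return up to top_k (clip_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_emb, emb))
              for cid, emb in clip_embs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings (made up for illustration only).
query = [1.0, 0.0, 1.0]
clips = {
    "clip_a": [2.0, 0.0, 2.0],  # same direction as the query
    "clip_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "clip_c": [1.0, 1.0, 0.0],  # partially aligned with the query
}
ranking = rank_clips(query, clips, top_k=2)
```

Because cosine similarity ignores vector magnitude, `clip_a` ranks first even though its embedding is twice as long as the query's.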
Abstract
We present SketchQL, a video database management system (VDBMS) for retrieving video moments through a sketch-based query interface. The interface lets users specify object trajectory events with simple mouse drag-and-drop operations, and users can compose complex events from the trajectories of single objects. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moment retrieval by running a similarity search over the video to identify the clips most similar to the visual query. In this demonstration, we introduce the graphical user interface of SketchQL and detail its functionality and interaction mechanisms. We also demonstrate end-to-end usage of SketchQL, from query composition to video moment retrieval, in real-world scenarios.
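One way to picture a similarity search over a video is a sliding window across an object's trajectory. The sketch below rests on several assumptions that do not come from the paper: candidate clips are fixed-length windows of the same length as the query, each frame is reduced to a box center `(x, y)`, and `toy_embed` (flattened frame-to-frame displacements) is a simple stand-in for the learned trajectory encoder. All function names are hypothetical.

```python
import math

def toy_embed(centers):
    """Stand-in for the learned encoder: flattened per-step displacements."""
    emb = []
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        emb.extend([x1 - x0, y1 - y0])
    return emb

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query_centers, video_centers, stride=1):
    """Score every window of the video trajectory against the query."""
    width = len(query_centers)
    q = toy_embed(query_centers)
    results = []
    for start in range(0, len(video_centers) - width + 1, stride):
        window = video_centers[start:start + width]
        results.append((start, cosine(q, toy_embed(window))))
    # Most similar window first.
    return sorted(results, key=lambda r: r[1], reverse=True)

# Query sketch: an object moving to the right.
query = [(0, 0), (1, 0), (2, 0)]
# Video trajectory: the object moves up, then to the right.
video = [(5, 5), (5, 6), (5, 7), (6, 7), (7, 7)]
hits = search(query, video)
```

Here the best-scoring window starts at frame 2, where the object in the video begins moving to the right. The displacement-based toy embedding is translation-invariant, so the query's position on the canvas does not affect the match.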
