SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches

Renzhi Wu; Pramod Chunduri; Dristi J Shah; Ashmitha Julius Aravind; Ali Payani; Xu Chu; Joy Arulraj; Kexin Rong

TL;DR

SketchQL tackles zero-shot video moment querying by letting users sketch target events as object trajectories. It combines a visual query interface with a transformer-based Matcher that embeds per-frame bounding-box trajectories and ranks clips by $\mathrm{sim}(C_Q, C_V) = \cos(\mathbf{emb}(C_Q), \mathbf{emb}(C_V))$, where $C_Q$ is the sketched query and $C_V$ a candidate video clip; the embeddings are learned from a trajectory simulator so that they generalize across datasets. The system operates on pre-extracted bounding-box trajectories and supports both single- and multi-object queries, which are demonstrated end-to-end on real video. This work delivers a usable GUI, a generalizable trajectory similarity model, and a practical demonstration of sketch-based video moment querying across datasets.
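
The ranking rule above can be illustrated with a minimal sketch. It assumes each trajectory is a list of per-frame boxes (x_center, y_center, width, height) normalized to [0, 1], and it replaces the learned transformer encoder with a hypothetical hand-crafted stand-in, toy_emb, that simply concatenates frame-to-frame displacements of the box centre. The names Box, cosine, toy_emb, and rank_clips are illustrative assumptions, not SketchQL's API; only the cosine-similarity ranking mirrors the formula.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One bounding box per frame: (x_center, y_center, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def toy_emb(traj: Sequence[Box]) -> List[float]:
    """Hypothetical stand-in for the learned encoder emb(.).

    Concatenates frame-to-frame displacements of the box centre, so it ignores
    where the object is and captures only the shape of its motion. The real
    Matcher is a trained transformer; this function is for illustration only.
    """
    feats: List[float] = []
    for (x0, y0, _, _), (x1, y1, _, _) in zip(traj, traj[1:]):
        feats.extend([x1 - x0, y1 - y0])
    return feats


def rank_clips(query: Sequence[Box],
               clips: Dict[str, Sequence[Box]]) -> List[Tuple[str, float]]:
    """Score every clip with sim(C_Q, C_V) = cos(emb(C_Q), emb(C_V)) and sort
    the (clip_id, score) pairs from most to least similar.

    Assumes every clip has the same number of frames as the query, because
    toy_emb yields equal-length vectors only for equal-length inputs.
    """
    q = toy_emb(query)
    scored = [(clip_id, cosine(q, toy_emb(clip))) for clip_id, clip in clips.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Synthetic query sketch: a box moving steadily to the right.
    query = [(0.1, 0.5, 0.05, 0.05), (0.2, 0.5, 0.05, 0.05), (0.3, 0.5, 0.05, 0.05)]
    clips = {
        "left": [(0.7, 0.2, 0.1, 0.1), (0.6, 0.2, 0.1, 0.1), (0.5, 0.2, 0.1, 0.1)],
        "right": [(0.5, 0.2, 0.1, 0.1), (0.6, 0.2, 0.1, 0.1), (0.7, 0.2, 0.1, 0.1)],
    }
    for clip_id, score in rank_clips(query, clips):
        print(clip_id, round(score, 3))
```

With this synthetic data, the "right" clip ranks first with a similarity of about 1.0 even though it sits elsewhere in the frame, and the "left" clip scores about -1.0. A learned embedding is what would let the real system tolerate differences in speed, length, and detector noise that this toy feature cannot.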

Abstract

In this paper, we present SketchQL, a video database management system (VDBMS) for retrieving video moments through a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moment retrieval by performing similarity searches over the video to identify the clips most similar to the visual query. In this demonstration, we introduce the graphical user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL, from query composition to video moment retrieval, on real-world scenarios.
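
The similarity search over the video can be pictured as a sliding window over an object's pre-extracted track. The sketch below is an assumption-laden illustration, not SketchQL's implementation: it scores every window of len(query) frames with cosine similarity on a hypothetical displacement-based embedding (toy_emb) and keeps the top k with heapq.nlargest. The names Box, cosine, toy_emb, search_video, k, and stride are all hypothetical, and the example track is synthetic.

```python
import heapq
import math
from typing import List, Sequence, Tuple

# One bounding box per frame: (x_center, y_center, width, height), normalized to [0, 1].
Box = Tuple[float, float, float, float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0.0 or norm_v == 0.0 else dot / (norm_u * norm_v)


def toy_emb(traj: Sequence[Box]) -> List[float]:
    """Hypothetical stand-in for the learned trajectory encoder:
    concatenated frame-to-frame displacements of the box centre."""
    return [d for (x0, y0, _, _), (x1, y1, _, _) in zip(traj, traj[1:])
            for d in (x1 - x0, y1 - y0)]


def search_video(query: Sequence[Box], track: Sequence[Box],
                 k: int = 3, stride: int = 1) -> List[Tuple[int, float]]:
    """Slide a window of len(query) frames over an object's pre-extracted track
    and return the k (start_frame, score) windows most similar to the query."""
    width = len(query)
    if width < 2 or len(track) < width:
        return []  # need at least one displacement and one full window
    q = toy_emb(query)
    candidates = (
        (start, cosine(q, toy_emb(track[start:start + width])))
        for start in range(0, len(track) - width + 1, stride)
    )
    return heapq.nlargest(k, candidates, key=lambda c: c[1])


if __name__ == "__main__":
    # Synthetic track: an object moves right for five frames, then moves back left.
    xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
    track = [(x, 0.5, 0.1, 0.1) for x in xs]
    # Sketched query: three frames of leftward motion anywhere in the frame.
    query = [(0.9, 0.3, 0.1, 0.1), (0.8, 0.3, 0.1, 0.1), (0.7, 0.3, 0.1, 0.1)]
    for start, score in search_video(query, track, k=3):
        print(f"frames {start}-{start + len(query) - 1}: {score:.3f}")
```

Here every returned window starts at frame 4 or later, where the object is moving left, and scores close to 1.0. A stride larger than 1 would trade recall for fewer encoder calls; how SketchQL actually enumerates candidate clips is not specified in this summary.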

Paper Structure (8 sections, 4 figures)

Introduction
System Overview
Sketcher: Composing visual queries
Matcher: Identifying Similar Clips
Demonstration
End-to-end Demo with Q1
Multi-object Event Query Demo with Q2
Conclusion

Figures (4)

Figure 1: Diverse left-turn behaviors in a real-world traffic surveillance video stream.
Figure 2: User interface of SketchQL.
Figure 3: SketchQL end-to-end usage demonstration
Figure 4: Trajectory panel for multi-object query Q2 after Step 4.
