
SketchQL Demonstration: Zero-shot Video Moment Querying with Sketches

Renzhi Wu, Pramod Chunduri, Dristi J Shah, Ashmitha Julius Aravind, Ali Payani, Xu Chu, Joy Arulraj, Kexin Rong

TL;DR

SketchQL tackles zero-shot video moment querying by letting users sketch target events as object trajectories. It combines a visual query interface with a transformer-based Matcher that embeds and compares per-frame bounding-box trajectories, using $\mathrm{sim}(C_Q, C_V) = \cos(\mathbf{emb}(C_Q), \mathbf{emb}(C_V))$ to rank clips; the embeddings are learned from a trajectory simulator to generalize across datasets. The system operates on pre-extracted bounding-box trajectories and supports single- and multi-object queries via end-to-end demonstrations on real video. This work delivers a usable GUI, a generalizable trajectory similarity model, and a practical demonstration of sketch-based video moment querying across datasets.
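The ranking step in the formula above can be illustrated with a minimal sketch. It assumes the embeddings $\mathbf{emb}(C_Q)$ and $\mathbf{emb}(C_V)$ have already been computed; the learned Matcher itself is not reproduced here. The function names (`cosine`, `rank_clips`), the clip identifiers, and the three-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors score 0
    return dot / (norm_u * norm_v)


def rank_clips(query_emb, clip_embs):
    """Return (clip_id, score) pairs sorted by descending similarity."""
    scored = [(cid, cosine(query_emb, emb)) for cid, emb in clip_embs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, hand-made embeddings (not produced by any real model).
query_emb = [1.0, 0.0, 1.0]
clip_embs = {
    "clip_a": [2.0, 0.0, 2.0],  # same direction as the query
    "clip_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "clip_c": [1.0, 1.0, 0.0],  # partially aligned
}

ranking = rank_clips(query_emb, clip_embs)
for clip_id, score in ranking:
    print(clip_id, round(score, 3))
```

Because cosine similarity ignores vector magnitude, `clip_a` ranks first even though its embedding is twice as long as the query's; only the direction in embedding space matters.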

Abstract

In this paper, we present SketchQL, a video database management system (VDBMS) for retrieving video moments with a sketch-based query interface. This novel interface allows users to specify object trajectory events with simple mouse drag-and-drop operations. Users can use trajectories of single objects as building blocks to compose complex events. Using a pre-trained model that encodes trajectory similarity, SketchQL achieves zero-shot video moment retrieval by performing similarity searches over the video to identify the clips most similar to the visual query. In this demonstration, we introduce the graphical user interface of SketchQL and detail its functionalities and interaction mechanisms. We also demonstrate the end-to-end usage of SketchQL, from query composition to video moment retrieval, using real-world scenarios.
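The similarity search over a video can be pictured as scoring candidate clips and keeping the best ones. The following is a minimal sketch under several assumptions, not the system's actual procedure: candidate clips are fixed-length sliding windows, an object's trajectory is reduced to one (x, y) box center per frame, and `toy_score` is a hand-written stand-in for the pre-trained similarity model. All names here (`windows`, `top_k_moments`, `toy_score`) are hypothetical.

```python
import math


def windows(num_frames, length, stride):
    """Yield (start, end) frame ranges, end exclusive, that fit in the video."""
    for start in range(0, num_frames - length + 1, stride):
        yield start, start + length


def top_k_moments(track, length, stride, k, score_fn):
    """Score every window of the track and return the k best (range, score) pairs."""
    scored = [
        ((start, end), score_fn(track[start:end]))
        for start, end in windows(len(track), length, stride)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def toy_score(query):
    """Toy stand-in for a learned matcher: cosine of net displacements."""
    qdx = query[-1][0] - query[0][0]
    qdy = query[-1][1] - query[0][1]

    def score(clip):
        dx = clip[-1][0] - clip[0][0]
        dy = clip[-1][1] - clip[0][1]
        denom = math.hypot(qdx, qdy) * math.hypot(dx, dy)
        return 0.0 if denom == 0 else (qdx * dx + qdy * dy) / denom

    return score


# Hypothetical track of box centers: the object moves along x, then along y.
track = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4)]
# Hypothetical sketched query: a straight movement along y.
query = [(0, 0), (0, 5)]

results = top_k_moments(track, length=4, stride=1, k=2, score_fn=toy_score(query))
for (start, end), score in results:
    print(f"frames [{start}, {end}) score={score:.3f}")
```

Passing the scoring function as a parameter keeps the search loop independent of how similarity is computed, so the toy score could be swapped for a learned model without changing the window enumeration.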

Paper Structure (8 sections, 4 figures)

This paper contains 8 sections, 4 figures.

Figures (4)

  • Figure 1: Diverse left-turn behaviors in a real-world traffic surveillance video stream.
  • Figure 2: User interface of SketchQL.
  • Figure 3: SketchQL end-to-end usage demonstration.
  • Figure 4: Trajectory panel for multi-object query Q2 after Step 4.