Evaluating Keyframe Layouts for Visual Known-Item Search in Homogeneous Collections

Bastian Jäckl, Jiří Kruchina, Lucas Joos, Daniel A. Keim, Ladislav Peška, Jakub Lokoč

TL;DR

This work investigates how keyframe grid layouts influence browsing efficiency and accuracy in Visual Known-Item Search on a large-scale, homogeneous MVK dataset. It compares seven layouts (four ranked, two sorted, one grouped) using a within-subject design with $|C_r|=200$ candidates across $|P|=49$ participants and $1715$ tasks, analyzing efficiency, accuracy, and browsing behaviors like region skipping and overlooks. The study finds that a video-grouped layout (V8) is fastest overall, while a four-column rank-preserving grid (G4 lp) provides the best accuracy; sorted and grouped layouts enable efficient exclusion of large regions but incur higher first-arrival times and overlooks. The findings motivate hybrid designs that preserve top-ranked item positions while sorting or grouping the remainder, with broader implications for grid-based search interfaces beyond video retrieval.

Abstract

Multimodal deep-learning models power interactive video retrieval by ranking keyframes in response to textual queries. Despite these advances, users must still browse ranked candidates manually to locate a target. Keyframe arrangement within the search grid strongly affects browsing effectiveness and user efficiency, yet remains underexplored. We report a study with 49 participants evaluating seven keyframe layouts for the Visual Known-Item Search task. Beyond efficiency and accuracy, we relate browsing phenomena, such as overlooks, to layout characteristics. Our results show that a video-grouped layout is the most efficient, while a four-column, rank-preserving grid achieves the highest accuracy. Sorted grids show both potential and trade-offs: they enable rapid scanning of uninteresting regions but move relevant targets to less prominent positions, which delays first arrival times and increases overlooks. These findings motivate hybrid designs that preserve positions of top-ranked items while sorting or grouping the remainder, and offer guidance for searching in grids beyond video retrieval.

Paper Structure

This paper contains 34 sections, 16 figures, 8 tables.

Figures (16)

  • Figure 1: Motivation and study overview. Left: In real use, people try to find a specific video from memory and rely on iterative, visually driven browsing from partial visual cues. Right: We model this need as a controlled known-item search task: participants receive a target prompt and browse a large collection of keyframes ($|C_r|=200$) rendered with seven keyframe layouts, so we can isolate how layout design supports fast and precise re-identification during browsing.
  • Figure 2: Example of G4. Keyframes are gradually displayed top-to-bottom, left-to-right in four columns.
  • Figure 3: Example of G8 using eight columns. Compared to G4, fewer rows are needed, but keyframes are smaller.
  • Figure 4: Example of C8. The most relevant keyframes are displayed in the four middle columns, where eye-gaze attention is highest (Joos et al., 2024).
  • Figure 5: Example of G4 lp. Due to a side panel, keyframes are displayed smaller.
  • ...and 11 more figures
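
The G4 and G8 captions above describe rank-ordered grids that differ only in column count. As a rough illustration (not the authors' implementation), the sketch below assumes a row-major fill, where each row is filled left to right before the next row starts, and uses the candidate count $|C_r|=200$ from the study. The function names `grid_position` and `rows_needed` are hypothetical.

```python
import math


def grid_position(rank: int, columns: int) -> tuple[int, int]:
    """Map a 0-based rank to (row, column), assuming a row-major fill.

    Assumption: rows are filled left to right, then top to bottom.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if columns <= 0:
        raise ValueError("columns must be positive")
    return divmod(rank, columns)


def rows_needed(n_items: int, columns: int) -> int:
    """Number of rows required to display n_items in a grid with `columns` columns."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    return math.ceil(n_items / columns)


if __name__ == "__main__":
    n_candidates = 200  # |C_r| in the study
    for name, cols in (("G4", 4), ("G8", 8)):
        print(name, "rows:", rows_needed(n_candidates, cols),
              "position of rank 5:", grid_position(5, cols))
```

Under this assumption, 200 candidates occupy 50 rows in a four-column grid and 25 rows in an eight-column grid, which matches the caption's remark that G8 needs fewer rows at the cost of smaller keyframes.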