Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval

Aneeshan Sain; Pinaki Nath Chowdhury; Subhadeep Koley; Ayan Kumar Bhunia; Yi-Zhe Song

TL;DR

This work addresses view-awareness in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR): a sketch query can be drawn from a different viewpoint than the gallery target, which degrades standard single-view FG-SBIR. The authors propose a view-aware framework that (i) uses sketch-independent multi-view 2D projections of 3D objects to inject view semantics and (ii) trains a disentangled cross-modal encoder producing content ($f_c$) and view ($f_v$) features. A single model then supports both view-agnostic retrieval (using $f_c$) and view-specific retrieval (using $f_c+f_v$), and is trained with a final objective $L_{trn}$. Key contributions include the multi-view projection strategy, a cross-modal disentanglement approach enabling a simple view-switch, and comprehensive experiments on chairs and lamps showing improvements over baselines. The results demonstrate flexible, user-controllable retrieval in FG-SBIR and illustrate how 2D projections can sensitize cross-modal sketch features to 3D view variation without full 3D representations, with potential impact on practical sketch-based search systems.
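The view-switch described above can be illustrated with a toy sketch. This is not the authors' implementation: the feature values, the names `query_feature` and `retrieve`, the use of Euclidean distance, and the reading of $f_c+f_v$ as element-wise addition are all assumptions made for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def query_feature(f_c, f_v, view_specific):
    """Hypothetical view switch: content only, or content plus view.

    Assumption: "f_c + f_v" is read as element-wise addition.
    """
    if view_specific:
        return [c + v for c, v in zip(f_c, f_v)]
    return list(f_c)


def retrieve(sketch, gallery, view_specific):
    """Rank gallery items by distance to the sketch in the chosen mode."""
    q = query_feature(sketch["f_c"], sketch["f_v"], view_specific)
    scored = []
    for name, feats in gallery.items():
        g = query_feature(feats["f_c"], feats["f_v"], view_specific)
        scored.append((l2_distance(q, g), name))
    return [name for _, name in sorted(scored)]


# Toy, hand-made features (not learned): f_c encodes identity, f_v the view.
sketch = {"f_c": [1.0, 0.0], "f_v": [0.0, 1.0]}  # chair A, drawn from the side
gallery = {
    "chair_A_front": {"f_c": [1.0, 0.0], "f_v": [1.0, 0.0]},
    "chair_A_side": {"f_c": [1.0, 0.0], "f_v": [0.0, 1.0]},
    "chair_B_side": {"f_c": [0.0, 1.0], "f_v": [0.0, 1.0]},
}

agnostic = retrieve(sketch, gallery, view_specific=False)
specific = retrieve(sketch, gallery, view_specific=True)
print("view-agnostic:", agnostic)
print("view-specific:", specific)
```

In this toy setup, the view-agnostic mode treats both photos of chair A as equally good matches, while the view-specific mode ranks the side-view photo of chair A first. The same stored features serve both modes; only the combination step changes.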

Abstract

In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employing a pre-trained FG-SBIR model, highlights the system's struggle when query-sketches differ in viewpoint from target instances. Interestingly, however, a questionnaire shows that users desire autonomy, with a significant percentage favouring view-specific retrieval. To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks. Overcoming dataset limitations, our first contribution leverages multi-view 2D projections of 3D objects, instilling cross-modal view awareness. The second contribution introduces a customisable cross-modal feature through disentanglement, allowing effortless mode switching. Extensive experiments on standard datasets validate the effectiveness of our method.

Paper Structure (13 sections, 7 equations, 6 figures, 2 tables)

Introduction
Related Works
Problem and Analysis
Background on FG-SBIR
Problem Definition
Proposed Methodology
Learning Objectives
Experiments
Competitors
Performance Analysis
Ablative Study
Limitations and Future Works
Conclusion

Figures (6)

Figure 1: Our framework. We aim to handle both view-agnostic and specific retrieval using one model.
Figure 2: Our model disentangles an input into its view and content semantics. Sketch-photo pairs from FG-SBIR datasets are used to learn cross-modal discriminative knowledge, whereas multi-view 2D projections from unlabelled 3D models help condition the encoder with view-aware knowledge. Once trained, the content and view features are used for view-agnostic and view-specific retrieval as shown.
Figure 3: Varying training data-size ($\mathcal{D}_\text{CM}$).
Figure 4: Qualitative Results of View-Agnostic FG-SBIR
Figure 5: Qualitative Results of View-Specific FG-SBIR
...and 1 more figure
