Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval

Yang Liu; Jiale Du; Xinbo Gao; Jungong Han

TL;DR

This paper tackles zero-shot sketch-based image retrieval (ZS-SBIR) with RAMLN, a memory-augmented meta-learning framework that learns an adaptive margin for a relation-aware quadruplet loss. The loss combines inter-modal and intra-modal constraints, using two negatives from different modalities to separate classes and align sketches with photos. The margin $\mathcal{R}(x)$ is meta-learned and assigned using feature information held in external memory, which captures rare but discriminative features across seen classes and supports generalization to unseen categories. An auxiliary cross-entropy objective stabilizes training. Experiments on Sketchy Extended and TU-Berlin Extended show improvements over state-of-the-art methods and validate both the loss design and the margin adaptation mechanism.

Abstract

Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. However, its practical application is limited by its inability to retrieve classes absent from the training set. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), where model performance is evaluated on unseen categories. Traditional SBIR primarily focuses on narrowing the domain gap between the photo and sketch modalities. In the zero-shot setting, the model must not only address this cross-modal discrepancy but also generalize well enough to transfer knowledge to unseen categories. To this end, we propose a novel framework for ZS-SBIR that employs a pair-based relation-aware quadruplet loss to bridge feature gaps. By incorporating two negative samples from different modalities, the approach prevents positive features from becoming disproportionately distant from one modality while remaining close to another, thus enhancing inter-class separability. We also propose a Relation-Aware Meta-Learning Network (RAMLN) to obtain the margin, a hyper-parameter of the cross-modal quadruplet loss, to improve the generalization ability of the model. RAMLN leverages external memory to store feature information, which it uses to assign optimal margin values. Experimental results on the extended Sketchy and TU-Berlin datasets show a marked improvement over existing state-of-the-art methods in ZS-SBIR.
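
To make the idea of a quadruplet loss with two negatives from different modalities concrete, the following is a minimal sketch in plain Python. It is not the paper's exact formulation, which this page does not give. The function names (`euclidean`, `quadruplet_loss`), the choice of a sketch as anchor, the Euclidean distance, and the sum of two hinge terms are all assumptions made for illustration. The margin is passed in as a plain number here, whereas in RAMLN it would be produced by the meta-learner.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def quadruplet_loss(anchor_sketch, pos_photo, neg_photo, neg_sketch, margin):
    """Hypothetical cross-modal quadruplet loss (illustrative assumption).

    anchor_sketch: sketch feature used as the anchor
    pos_photo:     photo feature from the same class as the anchor
    neg_photo:     photo feature from a different class (inter-modal negative)
    neg_sketch:    sketch feature from a different class (intra-modal negative)
    margin:        margin value; fixed here, learned adaptively in RAMLN
    """
    d_pos = euclidean(anchor_sketch, pos_photo)
    # Inter-modal term: the positive photo should be closer than the negative photo.
    inter = max(0.0, d_pos - euclidean(anchor_sketch, neg_photo) + margin)
    # Intra-modal term: the positive photo should be closer than the negative sketch.
    intra = max(0.0, d_pos - euclidean(anchor_sketch, neg_sketch) + margin)
    return inter + intra


# Toy 2-D features, chosen only to exercise both hinge terms.
anchor = [0.0, 0.0]
positive = [1.0, 0.0]
negative_photo = [3.0, 0.0]
negative_sketch = [0.0, 1.5]

loss_large_margin = quadruplet_loss(anchor, positive, negative_photo, negative_sketch, 1.0)
loss_small_margin = quadruplet_loss(anchor, positive, negative_photo, negative_sketch, 0.2)
```

In this toy case the negative photo is already far enough away under both margins, while the negative sketch violates the constraint only when the margin is 1.0. This is the kind of sensitivity to the margin value that motivates learning it rather than fixing it by hand.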
