A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification

Yunpeng Gong; Yongjie Hou; Jiangming Shi; Kim Long Diep; Min Jiang

TL;DR

KTCAA introduces a theory-guided meta-learning framework for few-shot cross-modal sketch-to-RGB person re-identification, addressing the RGB-to-sketch transfer gap via two modules: Alignment Augmentation (AA) to reduce domain discrepancy and Knowledge Transfer Catalyst (KTC) to enhance perturbation invariance. Grounded in a generalization bound that includes a discrepancy term $\tfrac{1}{2} d_{\mathcal{H} \Delta \mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T)$ and a perturbation term $L \cdot \gamma$, the framework jointly optimizes AA, KTC, and a contrastive loss $L_C$ under a meta-learning regime to transfer RGB knowledge to sketch scenarios. Empirical results on PKU-Sketch and Market-Sketch-1K show state-of-the-art performance under data scarcity, with significant gains from the two modules and strong cross-domain generalization without relying on extensive target-domain labels. Overall, KTCAA offers a principled approach to cross-modal transfer in low-data settings, delivering practical improvements for sketch-based recognition systems.
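
As a rough guide to how the two terms enter, a bound of this kind can be sketched in the style of the classical domain adaptation bound. The symbols $\epsilon_T(h)$, $\epsilon_S(h)$, and $\lambda$ (target risk, source risk, and the joint error of the best shared hypothesis) are assumptions borrowed from that classical form, as is reading $L$ as a Lipschitz constant and $\gamma$ as a perturbation radius; the exact statement and constants in the paper may differ.

$$
\epsilon_T(h) \;\le\; \epsilon_S(h) \;+\; \tfrac{1}{2}\, d_{\mathcal{H} \Delta \mathcal{H}}(\mathcal{D}_S, \mathcal{D}_T) \;+\; L \cdot \gamma \;+\; \lambda
$$

Read this way, AA targets the discrepancy term and KTC targets the perturbation term, which matches the roles the summary assigns to the two modules.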

Abstract

Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to significant modality gaps and limited annotated data. To address this, we introduce KTCAA, a theoretically grounded framework for few-shot cross-modal generalization. Motivated by generalization theory, we identify two key factors influencing target-domain risk: (1) domain discrepancy, which quantifies the alignment difficulty between source and target distributions; and (2) perturbation invariance, which evaluates the model's robustness to modality shifts. Based on these insights, we propose two components: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and facilitate progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances invariance by introducing worst-case perturbations and enforcing consistency. These modules are jointly optimized under a meta-learning paradigm that transfers alignment knowledge from data-rich RGB domains to sketch-based scenarios. Experiments on multiple benchmarks demonstrate that KTCAA achieves state-of-the-art performance, particularly in data-scarce conditions.
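
The two components described above can be illustrated with a toy sketch. This is not the authors' implementation: `localized_sketch_patch` is a hypothetical stand-in for AA that redraws one random patch of an RGB image as inverted edge strength, and `worst_case_perturbation` with `consistency_loss` is a hypothetical stand-in for KTC that assumes a linear embedding `f(x) = W x`, for which the worst-case perturbation can be approximated by power iteration. All function names, the patch fraction, and the linear model are assumptions made for illustration.

```python
import random

def localized_sketch_patch(img, rng, patch_frac=0.5):
    """Toy AA stand-in. img is an H x W grid of (r, g, b) tuples in [0, 1].
    One random patch is replaced by a sketch-like rendering; the rest is kept."""
    h, w = len(img), len(img[0])
    ph, pw = max(2, int(h * patch_frac)), max(2, int(w * patch_frac))
    top = rng.randint(0, h - ph)    # randint bounds are inclusive
    left = rng.randint(0, w - pw)
    gray = [[sum(img[top + i][left + j]) / 3.0 for j in range(pw)]
            for i in range(ph)]
    # Forward-difference gradient magnitude, clamped at the patch border.
    edges = [[0.0] * pw for _ in range(ph)]
    for i in range(ph):
        for j in range(pw):
            gx = gray[i][min(j + 1, pw - 1)] - gray[i][j]
            gy = gray[min(i + 1, ph - 1)][j] - gray[i][j]
            edges[i][j] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in edges) or 1.0  # avoid dividing by zero
    out = [list(row) for row in img]
    for i in range(ph):
        for j in range(pw):
            v = 1.0 - edges[i][j] / peak  # dark strokes on a light background
            out[top + i][left + j] = (v, v, v)
    return out, (top, left, ph, pw)

def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in W]

def worst_case_perturbation(W, gamma, iters=200):
    """Toy KTC stand-in: approximate the perturbation of norm gamma that
    maximizes ||W d|| using power iteration on W^T W (assumes W is nonzero)."""
    n = len(W[0])
    d = [1.0 / n ** 0.5] * n
    Wt = [list(col) for col in zip(*W)]
    for _ in range(iters):
        d = matvec(Wt, matvec(W, d))
        norm = sum(v * v for v in d) ** 0.5
        d = [v / norm for v in d]
    return [gamma * v for v in d]

def consistency_loss(W, x, delta):
    """Squared distance between embeddings of x and of the perturbed x."""
    fx = matvec(W, x)
    fxp = matvec(W, [a + b for a, b in zip(x, delta)])
    return sum((a - b) ** 2 for a, b in zip(fx, fxp))
```

In a deep model the closed-form structure used here is unavailable, so the worst-case perturbation would more plausibly be approximated by a few gradient-ascent steps on the consistency loss; the linear case is used only because it can be checked by hand.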
