OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Mohamad Hassan N C, Divyam Gupta, Mainak Singha, Sai Bhargav Rongali, Ankit Jha, Muhammad Haris Khan, Biplab Banerjee
TL;DR
OSLoPrompt tackles Low-Shot Open-Set Domain Generalization (LSOSDG), which unites low-shot domain generalization with open-set handling in CLIP. It introduces domain-agnostic prompt learning augmented by image-to-attribute cross-attention and learnable visual prompts, and it trains an Unknown class on fine-grained pseudo-open samples synthesized in a controlled way with GPT-4o and Stable Diffusion. The approach is validated on five benchmarks, where it achieves state-of-the-art HScores and significantly outperforms strong baselines. Ablations comparing domain-specific and domain-agnostic prompts, and ablations of the loss terms, show consistent gains. The work provides a practical framework for robust open-world recognition under scarce supervision, with potential extensions to broader open-world and structured prediction tasks.
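The HScore reported above is not defined in this summary. The sketch below assumes the definition commonly used in open-set evaluation: the harmonic mean of accuracy on known-class samples and accuracy on unknown samples. The function names and the `unknown_label` convention are illustrative assumptions, not taken from the paper, and the paper's exact protocol (for example, per-class averaging) may differ.

```python
def h_score(known_acc: float, unknown_acc: float) -> float:
    """Harmonic mean of known-class accuracy and unknown-class accuracy."""
    if known_acc + unknown_acc == 0:
        return 0.0
    return 2 * known_acc * unknown_acc / (known_acc + unknown_acc)


def open_set_h_score(y_true, y_pred, unknown_label=-1):
    """Compute the H-score from label lists.

    Assumption: every open-set sample carries the single label `unknown_label`
    in both the ground truth and the predictions.
    """
    known = [(t, p) for t, p in zip(y_true, y_pred) if t != unknown_label]
    unknown = [(t, p) for t, p in zip(y_true, y_pred) if t == unknown_label]
    # Sample-level accuracy on each subset; an empty subset counts as 0.
    known_acc = sum(t == p for t, p in known) / len(known) if known else 0.0
    unknown_acc = sum(t == p for t, p in unknown) / len(unknown) if unknown else 0.0
    return h_score(known_acc, unknown_acc)
```

Because the harmonic mean is dominated by the smaller of its two inputs, a model that accepts every sample as a known class (or rejects every sample as unknown) scores close to zero, which is why this kind of metric is preferred over plain accuracy in open-set settings.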
Abstract
We introduce Low-Shot Open-Set Domain Generalization (LSOSDG), a novel paradigm unifying low-shot learning with open-set domain generalization (ODG). While prompt-based methods using models like CLIP have advanced DG, they falter in low-data regimes (e.g., 1-shot) and lack precision in detecting open-set samples with fine-grained semantics related to training classes. To address these challenges, we propose OSLoPrompt, an advanced prompt-learning framework for CLIP with two core innovations. First, to manage limited supervision across source domains and improve DG, we introduce a domain-agnostic prompt-learning mechanism that integrates adaptable domain-specific cues and visually guided semantic attributes through a novel cross-attention module. This mechanism is supported by learnable domain- and class-generic visual prompts that enhance cross-modal adaptability. Second, to improve outlier rejection during inference, we classify unfamiliar samples as "unknown" and train specialized prompts with systematically synthesized pseudo-open samples that maintain fine-grained relationships to known classes, generated through a targeted query strategy with off-the-shelf foundation models. This strategy enhances feature learning, enabling our model to detect open samples of varied granularity more effectively. Extensive evaluations across five benchmarks demonstrate that OSLoPrompt establishes a new state of the art in LSOSDG, significantly outperforming existing methods.
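To make the idea of visually guided semantic attributes more concrete, here is a minimal sketch of generic scaled dot-product cross-attention in which an image feature acts as the query and attribute embeddings act as keys and values. This is not the paper's module: the function names are hypothetical, there are no learned projection matrices, and the way the resulting context vector would enter a prompt is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_to_attributes(image_feat, attr_feats):
    """Weight attribute embeddings by their similarity to an image feature.

    image_feat: list of d floats (query; assumed to come from an image encoder).
    attr_feats: list of n lists of d floats (keys and values; assumed to be
                text embeddings of attribute phrases).
    Returns (context, weights): the attention-weighted attribute vector and
    the attention weight assigned to each attribute.
    """
    d = len(image_feat)
    # Scaled dot product between the image query and each attribute key.
    scores = [
        sum(q * k for q, k in zip(image_feat, attr)) / math.sqrt(d)
        for attr in attr_feats
    ]
    weights = softmax(scores)
    # Weighted sum of the attribute values, one output dimension at a time.
    context = [
        sum(w * attr[i] for w, attr in zip(weights, attr_feats))
        for i in range(d)
    ]
    return context, weights
```

With `image_feat = [1.0, 0.0]` and `attr_feats = [[1.0, 0.0], [0.0, 1.0]]`, the first attribute receives the larger weight because it is better aligned with the image feature. In a real model the query, keys, and values would pass through learned linear projections first.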
