Instance-Level Generation for Representation Learning
Yankun Wu, Zakaria Laskar, Giorgos Kordopatis-Zilos, Noa Garcia, Giorgos Tolias
TL;DR
This work tackles the data bottleneck in instance-level recognition (ILR) by introducing ILGen, a fully synthetic pipeline that uses a large language model (LLM) to generate object categories and a generative diffusion model (GDM) to create diverse object instances, backgrounds, and viewpoints. A foundation vision encoder is then trained on the synthetic data with a retrieval-oriented objective (recall@k), which improves cross-domain ILR performance on seven benchmarks and establishes a paradigm in which the only required input is the names of the target domains. The results indicate that synthetic data can outperform real labeled data in multi-domain retrieval, which makes synthetic ILR practical for rapid domain adaptation. The pipeline combines LLMs, GDMs, and background relighting to produce high-variance, instance-level training sets for universal representation learning.
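To make the recall@k notion mentioned above concrete, the following is a minimal sketch of how the evaluation metric is commonly computed for retrieval. It is not the authors' training objective, which would need a differentiable surrogate; the function name `recall_at_k`, the cosine-similarity ranking, and the toy embeddings are all assumptions introduced here for illustration.

```python
import numpy as np


def recall_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=1):
    """Fraction of queries with at least one same-instance gallery item in the top-k.

    Hypothetical helper for illustration; ranking uses cosine similarity.
    """
    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T  # shape: (num_queries, num_gallery)

    # Indices of the k most similar gallery items for each query.
    topk = np.argsort(-sims, axis=1)[:, :k]

    # A query counts as a hit if any of its top-k items shares its instance label.
    hits = (gallery_labels[topk] == query_labels[:, None]).any(axis=1)
    return float(hits.mean())


# Toy data (made up): two queries, three gallery images, 2-D embeddings.
query_emb = np.array([[1.0, 0.0], [0.5, 0.6]])
query_labels = np.array([0, 1])
gallery_emb = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]])
gallery_labels = np.array([0, 1, 2])

r1 = recall_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=1)
r2 = recall_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=2)
```

In this toy setup the second query's nearest gallery item belongs to a different instance, so it only counts as a hit once k is raised to 2. In ILR each label identifies a single object instance rather than a category, which is why the metric is sensitive to fine-grained confusions.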
Abstract
Instance-level recognition (ILR) focuses on identifying individual objects rather than broad categories, offering the highest granularity in image classification. However, this fine-grained nature makes creating large-scale annotated datasets challenging, limiting ILR's real-world applicability across domains. To overcome this, we introduce a novel approach that synthetically generates diverse object instances from multiple domains under varied conditions and backgrounds, forming a large-scale training set. Unlike prior work on automatic data synthesis, our method is the first to address ILR-specific challenges without relying on any real images. Fine-tuning foundation vision models on the generated data significantly improves retrieval performance across seven ILR benchmarks spanning multiple domains. Our approach offers a new, efficient, and effective alternative to extensive data collection and curation, introducing a new ILR paradigm where the only input is the names of the target domains, unlocking a wide range of real-world applications.
