Concept Retrieval -- What and How?
Ori Nizan, Oren Shrout, Ayellet Tal
TL;DR
This work defines and tackles image concept retrieval: retrieving images that share high-level concepts with a query image, rather than images that are merely visually similar to it. It models the query's local embedding neighborhood with a bimodal Gaussian distribution and uses surrogate embeddings to identify concept-specific subsets. Multiple concepts are extracted iteratively via a PCA-based concept subspace, with the embeddings updated to promote diversity. The authors propose four evaluation metrics (Relevance, Consistency, Inner-concept Diversity, and Cross-concept Diversity) and conduct a human study to validate concept quality, demonstrating meaningful concept discovery across diverse datasets without supervision. The approach offers a practical framework for semantically rich retrieval, with potential applications in creative industries and domain-specific search, and opens avenues for user-controlled concept extraction and extension to other modalities.
Abstract
A concept may reflect either a concrete or abstract idea. Given an input image, this paper seeks to retrieve other images that share its central concepts, capturing aspects of the underlying narrative. This goes beyond conventional retrieval or clustering methods, which emphasize visual or semantic similarity. We formally define the problem, outline key requirements, and introduce appropriate evaluation metrics. We propose a novel approach grounded in two key observations: (1) While each neighbor in the embedding space typically shares at least one concept with the query, not all neighbors necessarily share the same concept with one another. (2) Modeling this neighborhood with a bimodal Gaussian distribution uncovers meaningful structure that facilitates concept identification. Qualitative, quantitative, and human evaluations confirm the effectiveness of our approach. See the package on PyPI: https://pypi.org/project/coret/
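As a rough illustration of observation (2), the sketch below fits a two-component (bimodal) Gaussian mixture to one-dimensional scores and keeps the neighbors assigned to the higher-scoring mode. It is not the authors' implementation and does not use the coret package: the synthetic scores, the helper names (fit_bimodal_1d, high_mode_members), and the choice to treat the higher mode as the "concept-sharing" subset are all assumptions made for illustration only.

```python
import math
import random


def _pdf(x, m, v):
    """Density of a 1-D Gaussian with mean m and variance v."""
    return math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)


def _var(xs, m):
    """Population variance of xs around the given mean m."""
    return sum((x - m) ** 2 for x in xs) / len(xs)


def fit_bimodal_1d(values, iters=100):
    """Fit a two-component 1-D Gaussian mixture with plain EM (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    # Initialise the two modes from the lower and upper halves of the data.
    lo, hi = xs[: n // 2], xs[n // 2:]
    mu = [sum(lo) / len(lo), sum(hi) / len(hi)]
    var = [max(_var(lo, mu[0]), 1e-6), max(_var(hi, mu[1]), 1e-6)]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each mode for each point.
        resp = []
        for x in xs:
            p = [w[k] * _pdf(x, mu[k], var[k]) for k in range(2)]
            s = (p[0] + p[1]) or 1e-300
            resp.append((p[0] / s, p[1] / s))
        # M-step: re-estimate weight, mean, and variance of each mode.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6
            )
            w[k] = nk / n
    return w, mu, var


def high_mode_members(values, w, mu, var):
    """Indices of points more likely under the higher-mean mode (hypothetical helper)."""
    k_hi = 0 if mu[0] > mu[1] else 1
    members = []
    for i, x in enumerate(values):
        p = [w[k] * _pdf(x, mu[k], var[k]) for k in range(2)]
        if p[k_hi] > p[1 - k_hi]:
            members.append(i)
    return members


# Synthetic stand-in for neighbor scores (an assumption, not real data):
# 30 neighbors that do not share the concept, then 20 that do.
rng = random.Random(0)
scores = [rng.gauss(0.2, 0.05) for _ in range(30)]
scores += [rng.gauss(0.8, 0.05) for _ in range(20)]

w, mu, var = fit_bimodal_1d(scores)
members = high_mode_members(scores, w, mu, var)
print("mode means:", sorted(mu))
print("neighbors in the high mode:", len(members))
```

The sketch uses a hand-written EM loop so that it depends only on the standard library; which quantity the paper actually models with the bimodal distribution, and how surrogate embeddings and the PCA-based subspace enter, is described in the paper itself rather than here.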
