Concept Retrieval -- What and How?

Ori Nizan, Oren Shrout, Ayellet Tal

TL;DR

This work defines and tackles image concept retrieval, a task that seeks to retrieve images sharing high-level concepts with a query rather than merely visually similar items. It introduces a bimodal Gaussian model of the local embedding neighborhood and leverages surrogate embeddings to identify concept-specific subsets, iteratively extracting multiple concepts via a PCA-based concept subspace and updating embeddings to promote diversity. The authors propose four evaluation metrics—Relevance, Consistency, Inner-concept Diversity, and Cross-concept Diversity—along with a human study to validate concept quality, demonstrating meaningful concept discovery across diverse datasets without supervision. The approach offers a practical framework for semantically rich retrieval with potential applications in creative industries and domain-specific search, while opening avenues for user-controlled concept extraction and extension to other modalities.

Abstract

A concept may reflect either a concrete or abstract idea. Given an input image, this paper seeks to retrieve other images that share its central concepts, capturing aspects of the underlying narrative. This goes beyond conventional retrieval or clustering methods, which emphasize visual or semantic similarity. We formally define the problem, outline key requirements, and introduce appropriate evaluation metrics. We propose a novel approach grounded in two key observations: (1) While each neighbor in the embedding space typically shares at least one concept with the query, not all neighbors necessarily share the same concept with one another. (2) Modeling this neighborhood with a bimodal Gaussian distribution uncovers meaningful structure that facilitates concept identification. Qualitative, quantitative, and human evaluations confirm the effectiveness of our approach. See the package on PyPI: https://pypi.org/project/coret/
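
Observation (2) can be illustrated with a small sketch. It is not the authors' implementation and does not use the `coret` package API. It assumes a list of similarity scores is already available, fits a two-component 1-D Gaussian mixture with plain EM, and treats the higher-similarity mode as the 'concept' set, as the Figure 1 caption describes. The function names (`fit_two_gaussians`, `concept_indices`, `make_synthetic_scores`), the min/max initialisation, the 0.5 posterior threshold, and the synthetic scores are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, n_iter=200, min_std=1e-3):
    """Fit a two-component 1-D Gaussian mixture with plain EM.

    Returns (weights, means, stds); index 1 is the higher-mean component.
    """
    n = len(scores)
    # Heuristic initialisation (an assumption): start the modes at the extremes.
    means = [min(scores), max(scores)]
    overall_mean = sum(scores) / n
    overall_var = sum((s - overall_mean) ** 2 for s in scores) / n
    stds = [max(math.sqrt(overall_var), min_std)] * 2
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability that each score belongs to component 1.
        resp_high = []
        for s in scores:
            p0 = weights[0] * normal_pdf(s, means[0], stds[0])
            p1 = weights[1] * normal_pdf(s, means[1], stds[1])
            total = p0 + p1
            resp_high.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and std of each component.
        for k in (0, 1):
            resp = [r if k == 1 else 1.0 - r for r in resp_high]
            mass = sum(resp)
            if mass < 1e-12:
                continue  # component collapsed; keep its previous parameters
            weights[k] = mass / n
            means[k] = sum(r * s for r, s in zip(resp, scores)) / mass
            var = sum(r * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / mass
            stds[k] = max(math.sqrt(var), min_std)

    if means[0] > means[1]:  # keep the higher mode at index 1
        weights.reverse()
        means.reverse()
        stds.reverse()
    return weights, means, stds


def concept_indices(scores, threshold=0.5):
    """Indices of scores assigned to the higher-similarity mode."""
    weights, means, stds = fit_two_gaussians(scores)
    picked = []
    for i, s in enumerate(scores):
        p0 = weights[0] * normal_pdf(s, means[0], stds[0])
        p1 = weights[1] * normal_pdf(s, means[1], stds[1])
        total = p0 + p1
        # The s > means[0] guard avoids picking far-left scores when the
        # high component happens to have the wider tail.
        if total > 0 and p1 / total > threshold and s > means[0]:
            picked.append(i)
    return picked


def make_synthetic_scores(seed=0):
    """Made-up scores: 80 'non-concept' neighbors, then 20 'concept' neighbors."""
    rng = random.Random(seed)
    low = [rng.gauss(0.2, 0.05) for _ in range(80)]
    high = [rng.gauss(0.7, 0.05) for _ in range(20)]
    return low + high


if __name__ == "__main__":
    scores = make_synthetic_scores()
    print("concept set indices:", concept_indices(scores))
```

On well-separated synthetic scores like these, the smaller high-similarity mode is recovered as the concept set. Real similarity distributions may overlap far more, in which case the threshold and the initialisation would need care.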

Paper Structure

This paper contains 8 sections, 15 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Similarity score distribution. Bimodal Gaussian distribution of the similarity scores between a surrogate $\mathbf{s}$ and the input's neighbors from Fig. \ref{fig:Teaser}. The smaller (right) mode defines the 'concept' set; the larger (left) mode defines the 'non-concept' set.
  • Figure 2: PCA navigations. Each column represents a direction within the PCA subspace. They display variations in (1) jump height, (2) breed, and (3) distance from the camera. However, they all share the underlying concept of 'a dog jumping for a frisbee.'
  • Figure 3: Qualitative results. Each subfigure shows an input image with three columns, each depicting a distinct extracted concept. All concepts are relevant, consistent, diverse and quite creative. See supplementary material for additional examples.
  • Figure 4: Domain‐specific concepts.