Graph-RISE: Graph-Regularized Image Semantic Embedding
Da-Cheng Juan, Chun-Ta Lu, Zhen Li, Futang Peng, Aleksei Timofeev, Yi-Ting Chen, Yaxi Gao, Tom Duerig, Andrew Tomkins, Sujith Ravi
TL;DR
Graph-RISE tackles the challenge of ultra-fine image semantics by reframing embedding learning as large-scale classification augmented with neural graph regularization. It introduces a graph-regularized training objective and a Graph-RISE graph that encodes image-image relationships via co-click and similar-image signals, trained on roughly 260M images and 40M labels. The approach yields substantial improvements over state-of-the-art models on kNN and triplet evaluations for ImageNet, iNaturalist, and related datasets, and qualitative results show closer alignment with human perception. This work demonstrates that combining massive-scale labeled data with graph-regularized neural networks can produce instance-level embeddings with practical benefits for search and ranking.
Abstract
Learning image representations to capture fine-grained semantics has been a challenging and important task enabling many applications such as image search and clustering. In this paper, we present Graph-Regularized Image Semantic Embedding (Graph-RISE), a large-scale neural graph learning framework that allows us to train embeddings to discriminate an unprecedented O(40M) ultra-fine-grained semantic labels. Graph-RISE outperforms state-of-the-art image embedding algorithms on several evaluation tasks, including image classification and triplet ranking. We provide case studies to demonstrate that, qualitatively, image retrieval based on Graph-RISE effectively captures semantics and, compared to the state-of-the-art, differentiates nuances at levels that are closer to human-perception.
