ADS: Approximate Densest Subgraph for Novel Image Discovery
Shanfeng Hu
TL;DR
The paper tackles the problem of discovering visually novel images in large repositories without requiring training data. It models a collection as a complete graph weighted by perceptual distance and relaxes the NP-hard $K$-densest subgraph problem into a sparse continuous objective $\mathbf{s}^T D \mathbf{s}$, which is solved by SGD with Monte Carlo gradient estimates and sparsity clipping so that the full distance matrix never has to be stored. The resulting method, ADS, is training-free and scalable; the authors report that it is faster and more memory-efficient than state-of-the-art methods while identifying novel images accurately on synthetic data and the Tiny-ImageNet dataset. This suggests a practical, lightweight capability for novelty mining in large image collections, potentially on-device, with possible uses in content management and retrieval tasks.
Abstract
The volume of image repositories continues to grow. Despite the availability of content-based addressing, we still lack a lightweight tool for discovering images with distinct characteristics in a large collection. In this paper, we propose a fast and training-free algorithm for novel image discovery. The key to our algorithm is formulating a collection of images as a perceptual distance-weighted graph, within which our task is to locate the K-densest subgraph, which corresponds to a subset of the most unique images. Because solving this problem is not only NP-hard but also requires computing the full, potentially huge, distance matrix, we propose to relax it into a K-sparse eigenvector problem that we can solve efficiently using stochastic gradient descent (SGD) without explicitly computing the distance matrix. We compare our algorithm against state-of-the-art methods on both synthetic and real datasets, showing that it runs considerably faster with a smaller memory footprint while mining novel images more accurately.
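
To make the relaxation concrete, the following is a minimal sketch of one way to maximise $\mathbf{s}^T D \mathbf{s}$ with stochastic gradient steps and top-K clipping. It is an illustration under stated assumptions, not the paper's implementation: the function name `approx_densest_subgraph`, the use of Euclidean distance on feature vectors as a stand-in for a perceptual distance, the unit-norm non-negativity constraint, and all hyperparameters (`steps`, `batch`, `lr`) are hypothetical choices made for this example.

```python
import numpy as np


def approx_densest_subgraph(features, k, steps=200, batch=32, lr=0.005, seed=0):
    """Sketch: maximise s^T D s over non-negative, unit-norm, k-sparse s.

    D is the pairwise distance matrix of `features`; it is never built in full.
    Euclidean distance is an assumed stand-in for a perceptual distance.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    s = np.full(n, 1.0 / np.sqrt(n))  # dense, unit-norm starting point

    for _ in range(steps):
        # Monte Carlo estimate of D @ s: sample columns j with probability
        # s_j / sum(s), so that sum(s) * E[D[:, j]] equals D @ s.
        total = s.sum()
        cols = rng.choice(n, size=batch, p=s / total)
        diffs = features[:, None, :] - features[cols][None, :, :]
        dists = np.linalg.norm(diffs, axis=2)  # shape (n, batch), not (n, n)
        grad = 2.0 * total * dists.mean(axis=1)

        # Gradient ascent step on s^T D s.
        s = s + lr * grad

        # Sparsity clipping: keep the k largest entries and zero the rest,
        # then renormalise to unit length.
        keep = np.argpartition(s, -k)[-k:]
        clipped = np.zeros(n)
        clipped[keep] = s[keep]
        s = clipped / np.linalg.norm(clipped)

    return s


# Toy usage: 200 tightly clustered points plus 5 far-away "novel" points.
rng = np.random.default_rng(1)
inliers = rng.normal(0.0, 0.1, size=(200, 2))
outliers = np.array(
    [[10.0, 0.0], [-10.0, 0.0], [0.0, 10.0], [0.0, -10.0], [10.0, 10.0]]
)
features = np.vstack([inliers, outliers])

s = approx_densest_subgraph(features, k=5)
novel = np.sort(np.flatnonzero(s))  # indices of the selected subset
```

Each step touches only an `n × batch` block of distances, which is the property that lets this style of solver avoid storing the full matrix. In the toy data the five appended points (indices 200 to 204) are the ones expected to survive clipping.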
