Image Outlier Detection Without Training using RANSAC
Chen-Han Tsai, Yu-Shao Peng
TL;DR
This paper tackles image outlier detection (OD) when the training data may itself be contaminated by outliers. It introduces RANSAC-NN, a training-free algorithm with two stages, Inlier Score Prediction (ISP) and Threshold Sampling (TS), that scores outliers directly from the data distribution by sub-sampling the dataset and comparing cosine similarities in embedding space. The method performs competitively against trained OD models on natural-image benchmarks, remains robust under contamination, and improves existing OD methods when used as a data-cleaning step. The paper also gives guidance on the hyperparameters (m, s, τ, t), shows consistent behavior across different feature extractors, and discusses practical implications for mislabeled-image detection tasks. Overall, RANSAC-NN is a practical option both as a standalone image OD method and as a preprocessing tool for downstream models.
Abstract
Image outlier detection (OD) is an essential tool to ensure the quality of images used in computer vision tasks. Existing algorithms often involve training a model to represent the inlier distribution, and outliers are determined by some deviation measure. Although existing methods have proved effective when trained on strictly inlier samples, their performance remains questionable when undesired outliers are included during training. Because of this limitation, the data must be carefully examined when developing OD models for new domains. In this work, we present a novel image OD algorithm called RANSAC-NN that eliminates the need for data examination and model training altogether. Unlike existing approaches, RANSAC-NN can be applied directly to datasets containing outliers by sampling and comparing subsets of the data. Our algorithm maintains favorable performance compared to existing methods on a range of benchmarks. Furthermore, we show that incorporating RANSAC-NN into the data preparation process can enhance the robustness of existing methods.
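The idea of scoring images by sampling and comparing subsets in embedding space can be illustrated with a short sketch. This is a simplified illustration under stated assumptions, not the authors' RANSAC-NN: the function name `subsample_similarity_scores`, the default values, and the max-then-average aggregation are hypothetical. Treating `m` as the subset size and `s` as the number of sampling rounds is also an assumption about how those hyperparameters are used, and the Threshold Sampling stage is omitted entirely.

```python
import numpy as np


def subsample_similarity_scores(embeddings, m=8, s=50, seed=0):
    """Illustrative inlier-style score (hypothetical, not the paper's exact ISP).

    For each of `s` rounds, draw `m` rows at random and record, for every row,
    its highest cosine similarity to the drawn subset. The score is the mean of
    those per-round maxima; low scores suggest a row resembles few others.
    """
    x = np.asarray(embeddings, dtype=float)
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two embeddings to compare")

    # L2-normalise so that a dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)

    rng = np.random.default_rng(seed)
    k = max(2, min(m, n))  # at least 2 so a sampled row can match another row
    scores = np.zeros(n)
    for _ in range(s):
        idx = rng.choice(n, size=k, replace=False)
        sims = unit @ unit[idx].T  # shape (n, k)
        # A sampled row would trivially match itself, so mask that entry.
        sims[idx, np.arange(k)] = -np.inf
        scores += sims.max(axis=1)
    return scores / s


if __name__ == "__main__":
    # Synthetic stand-in for feature-extractor output: a tight cluster of
    # inliers plus three rows pointing in unrelated directions.
    rng = np.random.default_rng(1)
    inliers = np.eye(16)[0] + 0.05 * rng.standard_normal((40, 16))
    outliers = np.eye(16)[1:4]
    scores = subsample_similarity_scores(np.vstack([inliers, outliers]))
    print("lowest-scoring rows:", np.argsort(scores)[:3])
```

In practice the embeddings would come from a pretrained feature extractor; the synthetic vectors here only keep the example self-contained. Because no model is fitted, outliers in the input cannot corrupt learned parameters, which is the property the abstract emphasizes.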
