Improving Taxonomic Image-based Out-of-distribution Detection With DNA Barcodes
Mikko Impiö, Jenni Raitoharju
TL;DR
This work tackles the challenge of out-of-distribution detection in image-based taxonomic identification, especially for fine-grained taxa. It introduces a post-hoc re-ordering method that leverages DNA barcode proximity as side information to refine OOD rankings produced by standard scoring methods, and demonstrates consistent gains across multiple OOD metrics. The approach relies on DNA distances between barcode sequences to rank potential outliers and re-orders samples accordingly, with the DNA-quantile variant (q=0.4) offering robust improvements. The study uses FinBenthic 2 with a toy COI DNA dataset and discusses practical implications, limitations (single outlier), and directions for extending to multi-outlier scenarios and larger real-world datasets such as BIOSCAN-1M.
Abstract
Image-based species identification could help scale biodiversity monitoring to a global level. Many challenges must still be solved before such systems can be deployed in real-world applications. A reliable image-based monitoring system must detect out-of-distribution (OOD) classes that it has not seen before, which is especially challenging with fine-grained classes. Emerging environmental monitoring techniques, DNA metabarcoding and eDNA, can help by providing information on the OOD classes present in a sample. In this paper, we study whether DNA barcodes can also help find the outlier images, based on the similarity of the outlier DNA sequence to the seen classes. We propose a re-ordering approach that can be easily applied to any pre-trained model and any existing OOD detection method. We experimentally show that the proposed approach improves taxonomic OOD detection compared to all common baselines. We also show that the method works thanks to a correlation between visual similarity and DNA barcode proximity. The code and data are available at https://github.com/mikkoim/dnaimg-ood.
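To make the re-ordering idea concrete, the following is a minimal sketch of one way a DNA-quantile re-ordering could look. It is not taken from the linked repository. The function name `dna_quantile_reorder`, its arguments, and the specific rule are all assumptions made for illustration. The assumed rule is that samples predicted into known classes whose barcode distance to the outlier falls within the q-quantile are ranked ahead of all others, and the original OOD score orders samples within each group. The intuition is that an outlier image tends to be misclassified as a visually similar class, and visual similarity correlates with DNA barcode proximity.

```python
import numpy as np


def dna_quantile_reorder(ood_scores, pred_classes, dna_dist_to_outlier, q=0.4):
    """Hypothetical sketch: re-rank samples using DNA proximity as side information.

    ood_scores:          (n,) scores from any OOD method, higher = more likely OOD
    pred_classes:        (n,) predicted in-distribution class index per sample
    dna_dist_to_outlier: (k,) barcode distance from the outlier to each known class
    q:                   quantile of DNA distances below which a class counts as "near"

    Returns sample indices ordered from most to least likely OOD.
    """
    ood_scores = np.asarray(ood_scores, dtype=float)
    pred_classes = np.asarray(pred_classes, dtype=int)
    dna_dist = np.asarray(dna_dist_to_outlier, dtype=float)

    # Known classes at or below the q-quantile of DNA distance are "near" the outlier.
    threshold = np.quantile(dna_dist, q)
    near = dna_dist[pred_classes] <= threshold

    # np.lexsort uses the LAST key as the primary key:
    # near samples first (0 before 1), then descending OOD score within each group.
    far_flag = (~near).astype(int)
    return np.lexsort((-ood_scores, far_flag))


# Toy example with 4 images and 3 known classes (all values are made up).
scores = [0.9, 0.2, 0.6, 0.4]
preds = [0, 1, 2, 1]
dist = [0.30, 0.05, 0.20]  # class 1 is genetically closest to the outlier
print(dna_quantile_reorder(scores, preds, dist, q=0.4).tolist())  # [3, 1, 0, 2]
```

In the toy example, the two images predicted as class 1 move to the top of the ranking even though their raw OOD scores are low, because class 1 is the only class within the 0.4-quantile of DNA distance to the outlier. Because the rule only consumes scores and predictions, it can sit on top of any pre-trained classifier and scoring method without retraining.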
