Improving Taxonomic Image-based Out-of-distribution Detection With DNA Barcodes

Mikko Impiö, Jenni Raitoharju

TL;DR

This work tackles the challenge of out-of-distribution detection in image-based taxonomic identification, especially for fine-grained taxa. It introduces a post-hoc re-ordering method that leverages DNA barcode proximity as side information to refine OOD rankings produced by standard scoring methods, and demonstrates consistent gains across multiple OOD metrics. The approach relies on DNA distances between barcode sequences to rank potential outliers and re-orders samples accordingly, with the DNA-quantile variant (q=0.4) offering robust improvements. The study uses FinBenthic 2 with a toy COI DNA dataset and discusses practical implications, limitations (single outlier), and directions for extending to multi-outlier scenarios and larger real-world datasets such as BIOSCAN-1M.

Abstract

Image-based species identification could help scale biodiversity monitoring to a global level. Many challenges must still be solved before these systems can be deployed in real-world applications. A reliable image-based monitoring system must detect out-of-distribution (OOD) classes that it has not been presented with before. This is especially challenging with fine-grained classes. Emerging environmental monitoring techniques, DNA metabarcoding and eDNA, can help by providing information on the OOD classes that are present in a sample. In this paper, we study whether DNA barcodes can also help find the outlier images based on the similarity of the outlier DNA sequence to the seen classes. We propose a re-ordering approach that can be easily applied to any pre-trained model and existing OOD detection method. We experimentally show that the proposed approach improves taxonomic OOD detection compared to all common baselines. We also show that the method works thanks to a correlation between visual similarity and DNA barcode proximity. The code and data are available at https://github.com/mikkoim/dnaimg-ood.
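
The summary above does not spell out the re-ordering rule, so the following Python sketch is only one plausible reading of a DNA-quantile re-ordering, not the authors' implementation (which is in the linked repository). It assumes that a higher OOD score means "more likely an outlier", that each image has a predicted seen class, and that a DNA distance from the outlier barcode to every seen class is available. Images predicted as a class within the lowest q-quantile of DNA distances are moved to the front of the ranking. The function name `dna_quantile_reorder`, its arguments, and the hard promotion of the whole DNA-near group are all assumptions made for illustration.

```python
import numpy as np


def dna_quantile_reorder(ood_scores, pred_classes, dna_dist_to_outlier, q=0.4):
    """Return sample indices ordered from most to least likely outlier.

    Hypothetical sketch: images whose predicted class is DNA-near the
    outlier barcode are ranked ahead of all other images; within each
    group the original OOD score (higher = more OOD) decides the order.
    """
    ood_scores = np.asarray(ood_scores, dtype=float)
    pred_classes = np.asarray(pred_classes)
    dna_dist = np.asarray(dna_dist_to_outlier, dtype=float)

    # Seen classes at or below the q-quantile of DNA distance count as "near".
    threshold = np.quantile(dna_dist, q)
    near = dna_dist[pred_classes] <= threshold

    # np.lexsort uses the LAST key as the primary key.
    # Primary: ~near ascending, so DNA-near images come first.
    # Secondary: -ood_scores ascending, i.e. OOD score descending.
    return np.lexsort((-ood_scores, ~near))


# Toy example with three seen classes and four images (made-up numbers).
scores = [0.9, 0.2, 0.5, 0.7]   # baseline OOD scores, e.g. from MaxLogit
preds = [0, 1, 2, 1]            # predicted seen class per image
dists = [0.30, 0.05, 0.20]      # DNA distance of each seen class to the outlier
order = dna_quantile_reorder(scores, preds, dists, q=0.4)
print(order)  # [3 1 0 2]: images predicted as class 1 (the DNA-nearest) lead
```

Because the function only permutes an existing ranking, it can in principle sit on top of any pre-trained classifier and any score-based OOD detector, which matches the post-hoc nature of the approach described above.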

Paper Structure

This paper contains 16 sections, 1 equation, 4 figures, 1 table.

Figures (4)

  • Figure 1: Overview of the proposed OOD detection approach
  • Figure 2: AUROC change of DNA quantile re-ordering compared to OOD baseline (rows) for each model (columns). Blue is better, red worse.
  • Figure 3: DNA quantile sensitivity for parameter $q$ across 39 classes for different metrics. Light area represents the standard deviation. Values were calculated with DNA distance K80 and MaxLogit OOD scoring. For AUROC and AUPRC higher is better, for FPR@95 lower is better.
  • Figure 4: Comparison of the proportion of (false) inlier predictions and DNA-based distance between classes for outlier-inlier class pairs. There is a slight correlation. The outliers where re-ordering works the best (Polycentropus irroratus) and worst (Sphaerium) are highlighted.
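
The caption of Figure 3 mentions a "K80" DNA distance. Assuming this refers to Kimura's 1980 two-parameter model, as the name conventionally does, the distance between two aligned sequences is d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transition and transversion sites. The Python sketch below illustrates that formula only. The function name `k80_distance` is hypothetical, the inputs are assumed to be already aligned to equal length, and sites with gaps or ambiguous bases are simply skipped, which may differ from how the paper's pipeline handles them.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(seq_a, seq_b):
    """Kimura two-parameter (K80) distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption)
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    x = 1.0 - 2.0 * p - q
    y = 1.0 - 2.0 * q
    if x <= 0.0 or y <= 0.0:
        return math.inf  # sequences too divergent for the model
    return -0.5 * math.log(x * math.sqrt(y))


# One transition (A<->G) in ten sites: P = 0.1, Q = 0.
d = k80_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(d, 4))  # 0.1116
```

A class-level distance to the outlier, as used on the horizontal axis of Figure 4, could then be obtained by aggregating such pairwise distances over barcodes, but the aggregation used in the paper is not stated here.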