
Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks

Dennis Vetter, Muhammad Ahsan, Diana Delicado, Thomas A. Neubauer, Thomas Wilke, Gemma Roig

TL;DR

This study tackles the problem of classifying Radomaniola freshwater snails under a small, imbalanced, multi-class setting with subtle visual differences. It introduces a multimodal triplet-network framework that fuses images, shell measurements, and genetic distances, employing offline triplet mining and a dynamic margin to learn meaningful embeddings that support accurate classification. The approach achieves expert-level accuracy (mean >98.5%) and remains effective with limited data, suggesting practical utility for fieldwork and taxonomic workflows. By combining transfer learning, multimodal fusion, and similarity-based learning, the paper demonstrates a scalable tool that can accelerate species identification while preserving biological relevance and potential explainability in collaboration with domain experts.

Abstract

In this paper, we present our first proposal of a machine learning system for the classification of freshwater snails of the genus Radomaniola. We elaborate on the specific challenges encountered during system design and how we tackled them: a small, highly imbalanced dataset with a large number of classes and high visual similarity between classes. We then show how we employed triplet networks and the multiple input modalities of images, measurements, and genetic information to overcome these challenges and reach a performance comparable to that of a trained domain expert.
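
To make the triplet-based training idea more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The summary only states that offline triplet mining and a dynamic margin informed by genetic distances are used; the linear margin formula, the parameter names `base_margin` and `scale`, the use of Euclidean distance, and all numbers below are assumptions made for illustration.

```python
import math
from itertools import permutations

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def mine_triplets(labels):
    """Offline mining: list every (anchor, positive, negative) index triple before training."""
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # anchor and positive must share a class
        for n, label in enumerate(labels):
            if label != labels[a]:  # negative comes from a different class
                triplets.append((a, p, n))
    return triplets

def dynamic_margin(genetic_distance, base_margin=0.2, scale=1.0):
    # Assumption: the margin grows linearly with the genetic distance between
    # the anchor's class and the negative's class. The source does not give the formula.
    return base_margin + scale * genetic_distance

def triplet_loss(anchor, positive, negative, margin):
    """Standard hinge-style triplet loss on embedding distances."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy data: labels, embeddings, and the genetic distance are made up.
labels = ["curta", "curta", "mostarensis"]
embeddings = [[0.0, 0.0], [0.0, 1.0], [0.0, 1.1]]
genetic = {("curta", "mostarensis"): 0.05}

triplets = mine_triplets(labels)
a, p, n = triplets[0]
margin = dynamic_margin(genetic[(labels[a], labels[n])])
loss = triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
print(triplets, margin, loss)
```

Under this assumed formula, genetically distant species are pushed further apart in the embedding space than closely related ones, which is one plausible reading of a margin that "preserves biological relevance".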

Paper Structure (9 sections, 2 equations, 4 figures)


Figures (4)

  • Figure 1: Example specimens from six different Radomaniola species. Left: R. curta, R. mostarensis, R. seminula. Right: R. jovanovskae, R. nachtigallae, R. szarowskae. To an untrained observer the species appear very similar, differing only in minute details.
  • Figure 2: Samples per class in the dataset. The distribution is very imbalanced: the most samples (88) are available for R. mostarensis, the fewest (5) for R. albanica.
  • Figure 3: Architecture of the classification system. A pre-trained CNN extracts image features from the input images. The image features and the measurements are concatenated to create a joint representation. This joint representation is mapped to an embedding, which is then used to classify the input. Genetic information is used only during training, in the joint optimization of embeddings and classification results.
  • Figure 4: Test set classification accuracy for each of the configurations. Inclusion of additional modalities improves system performance, although the improvement from using genetic information with the dynamic margins is not statistically significant.
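
The data flow described in the Figure 3 caption can be sketched as a forward pass in plain Python. This is an illustrative sketch, not the published architecture: the pre-trained CNN is replaced by a precomputed feature vector, the weights are random and untrained, and the layer sizes, the class count of 6, and the helper names (`init_layer`, `forward`) are all placeholders.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_out, n_in, rng):
    """Random placeholder weights and zero biases; a real system would train these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(image_features, measurements, embed_layer, class_layer):
    # 1. Concatenate image features and shell measurements into a joint representation.
    joint = list(image_features) + list(measurements)
    # 2. Map the joint representation to an embedding.
    embedding = linear(joint, *embed_layer)
    # 3. Classify from the embedding.
    probs = softmax(linear(embedding, *class_layer))
    return embedding, probs

rng = random.Random(0)
n_img, n_meas, n_embed, n_classes = 8, 4, 3, 6  # placeholder sizes
embed_layer = init_layer(n_embed, n_img + n_meas, rng)
class_layer = init_layer(n_classes, n_embed, rng)

image_features = [rng.random() for _ in range(n_img)]  # stands in for CNN output
measurements = [rng.random() for _ in range(n_meas)]   # stands in for shell measurements
embedding, probs = forward(image_features, measurements, embed_layer, class_layer)
print(len(embedding), len(probs), sum(probs))
```

Genetic information does not appear in `forward` at all, which mirrors the caption: it would enter only through the training objective, so inference needs just an image and measurements.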