SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam

Tue Vo, Lakshay Sharma, Tuan Dinh, Khuong Dinh, Trang Nguyen, Trung Phan, Minh Do, Duong Vu

TL;DR

Vietnam lacks comprehensive aquatic invertebrate data, limiting biodiversity monitoring and conservation. SuoiAI presents an end-to-end pipeline that combines underwater image collection, targeted annotation strategies, and semi-supervised learning to enable scalable genus- and species-level classification. Key contributions include a field-ready data acquisition plan, bootstrapped labeling via teacher–student cycles, and a hybrid modeling approach that blends on-device detection with cloud analysis and fine-grained taxonomy. The work aims to deliver practical biodiversity tools for Vietnam and other tropical regions, potentially enabling a foundational dataset and model ecosystem for aquatic invertebrates.

Abstract

Understanding and monitoring aquatic biodiversity is critical for ecological health and conservation efforts. This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam and employing machine learning (ML) techniques for species classification. We outline the methods for data collection, annotation, and model training, focusing on reducing annotation effort through semi-supervised learning and leveraging state-of-the-art object detection and classification models. Our approach aims to overcome challenges such as data scarcity, fine-grained classification, and deployment in diverse environmental conditions.

Paper Structure

This paper contains 13 sections and 1 figure.

Figures (1)

  • Figure 1: An end-to-end diagram of our pipeline