Multi-label classification for multi-temporal, multi-spatial coral reef condition monitoring using vision foundation model with adapter learning

Xinlei Shao, Hongruixuan Chen, Fan Zhao, Kirsty Magson, Jundong Chen, Peiran Li, Jiaqi Wang, Jun Sasaki

TL;DR

This study tackles the challenge of multi-label coral reef condition classification under multi-temporal and multi-spatial settings by adapting the vision foundation model DINOv2 with Low-Rank Adaptation (LoRA). Using a Koh Tao dataset of 1,203 underwater images annotated with eight labels, the authors show that DINOv2-LoRA achieves a higher match ratio and F1 score than conventional baselines while reducing the number of trainable parameters from 1,100M to 5.91M. The approach generalizes well across seasons and sites, indicating robust transferability in ecological monitoring tasks. The work provides a practical, open-source tool for efficient coral reef monitoring and decision-making in citizen science and conservation contexts; future directions include continual learning and user-friendly interfaces.
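
Results are reported as match ratio and F1 for an eight-label task in which one image can carry several labels at once (see Figure 3). The sketch below shows one common way to score such predictions. Because this summary does not define the metrics, it assumes that "match ratio" means the exact-match ratio (an image counts only if its whole predicted label set is correct) and that F1 is macro-averaged over labels. It also assumes an independent sigmoid per label with a 0.5 threshold. The label order, the helper names, and the toy logits are hypothetical.

```python
import math
from typing import List, Sequence

# Label abbreviations from Figure 3; this ordering is only for illustration.
LABELS = ["HLC", "CPC", "DDC", "RBL", "CPT", "DSE", "PRD", "PHY"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logits_to_multihot(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """One independent sigmoid per label, so several labels can be on at once."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def exact_match_ratio(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of images whose predicted label set equals the true set exactly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1; a label with no TP, FP, or FN scores 0 here."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [
    [0, 1, 0, 0, 0, 1, 0, 0],  # compromised coral + disease
    [1, 0, 0, 0, 0, 0, 0, 0],  # healthy coral only
]
logits = [
    [-2.0, 1.5, -1.0, -3.0, -2.5, 0.7, -1.2, -2.0],
    [2.2, 0.3, -1.0, -3.0, -2.5, -0.7, -1.2, -2.0],
]
y_pred = [logits_to_multihot(z) for z in logits]
print(exact_match_ratio(y_true, y_pred))  # prints 0.5
print(round(macro_f1(y_true, y_pred), 4))  # prints 0.3333
```

Under these assumptions the second toy image gets healthy coral right but still counts as a miss because of one spurious compromised-coral label, which is why exact-match scores sit well below per-label scores. Labels absent from the toy data score 0 and pull the macro F1 down; a real evaluation has to pick a convention for such labels.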

Abstract

Coral reef ecosystems provide essential ecosystem services, but face significant threats from climate change and human activities. Although advances in deep learning have enabled automatic classification of coral reef conditions, conventional deep models struggle to achieve high performance when processing complex underwater ecological images. Vision foundation models, known for their high accuracy and cross-domain generalizability, offer promising solutions. However, fine-tuning these models requires substantial computational resources and results in high carbon emissions. To address these challenges, adapter learning methods such as Low-Rank Adaptation (LoRA) have emerged as a solution. This study introduces an approach integrating the DINOv2 vision foundation model with the LoRA fine-tuning method. The approach leverages multi-temporal field images collected through underwater surveys at 15 dive sites at Koh Tao, Thailand, with all images labeled according to universal standards used in citizen science-based conservation programs. The experimental results demonstrate that the DINOv2-LoRA model achieved superior accuracy, with a match ratio of 64.77%, compared to 60.34% achieved by the best conventional model. Furthermore, incorporating LoRA reduced the trainable parameters from 1,100M to 5.91M. Transfer learning experiments conducted under different temporal and spatial settings highlight the exceptional generalizability of DINOv2-LoRA across different seasons and sites. This study is the first to explore the efficient adaptation of foundation models for multi-label classification of coral reef conditions under multi-temporal and multi-spatial settings. The proposed method advances the classification of coral reef conditions and provides a tool for monitoring, conserving, and managing coral reef ecosystems.
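
The drop in trainable parameters follows from how LoRA works: a pre-trained weight matrix W (d_out × d_in) stays frozen, and a low-rank update is learned as the product of two small matrices, B (d_out × r) and A (r × d_in). Each adapted layer therefore trains r(d_in + d_out) parameters instead of d_in · d_out. The pure-Python sketch below illustrates this. The class name LoRALinear, the rank r = 8, the alpha / r scaling, and the 1536 × 1536 layer size (the hidden width of DINOv2's ViT-g/14) are illustrative assumptions. The sketch does not attempt to reproduce the paper's 5.91M figure, which depends on which layers the authors adapted, the rank they chose, and the classifier head.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine-tuned (bias ignored)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters of the rank-r update B (d_out x r) @ A (r x d_in)."""
    return r * (d_in + d_out)


class LoRALinear:
    """Hypothetical LoRA-wrapped linear layer: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, r: int, alpha: float) -> None:
        self.weight = weight  # pre-trained W, kept frozen (never updated)
        d_out, d_in = len(weight), len(weight[0])
        self.r = r
        self.scale = alpha / r
        # Real LoRA draws A at random; a fixed pattern keeps this example reproducible.
        self.A = [[0.01 * (i + j + 1) for j in range(d_in)] for i in range(r)]
        # B starts at zero, so the wrapped layer initially matches the frozen layer.
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)              # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # trainable low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return lora_params(len(self.weight), len(self.weight[0]), self.r)


# Parameter arithmetic for one hypothetical 1536 x 1536 projection at rank 8.
full = full_params(1536, 1536)     # 2,359,296
lora = lora_params(1536, 1536, 8)  # 24,576
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")

# With B = 0 the adapted layer reproduces the frozen layer exactly.
layer = LoRALinear([[1.0, 0.0], [0.0, 2.0]], r=1, alpha=2.0)
print(layer.forward([3.0, 4.0]))  # prints [3.0, 8.0]
```

Because W never receives updates, only A and B (plus, per Figure 4, the classifier) would be trained; at r = 8 the adapter for this single hypothetical layer holds about 1% of the layer's weights.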

Paper Structure

This paper contains 26 sections, 10 equations, 11 figures, and 4 tables.

Figures (11)

  • Figure 1: Flowchart illustrating the process of establishing the proposed method in this study.
  • Figure 2: (a) Location of the study area, Koh Tao, Thailand; (b) Surveyed dive sites at Koh Tao (base map sourced from PlanetLabs2024); (c) Typical stressed coral reef images taken through underwater photo transect method.
  • Figure 3: Sample image patches from the dataset, each column displaying two examples per class. (a) Healthy Coral (HLC), (b) Compromised Coral (CPC), (c) Dead Coral (DDC), (d) Rubble (RBL), (e) Competition (CPT), (f) Disease (DSE), (g) Predation (PRD), and (h) Physical Issues (PHY). As this is a multi-label classification task, each image can contain more than one class; for example, the class "compromised coral" often coexists with the class "disease."
  • Figure 4: Framework illustrating the process of developing the foundation model for the multi-label classification task. During fine-tuning, the pre-trained ViT weights are frozen, and only the LoRA layers and the classifier are trained. LoRA thus lets the model exploit the general knowledge already present in the foundation model while adapting it to our task at a very low training cost.
  • Figure 5: Grad-CAM heatmap visualizations for classification using ResNet-101, Swin-Transformer-Base, and DINOv2-LoRA on eight classes. The classes are: (a) Healthy Coral (HLC), (b) Compromised Coral (CPC), (c) Dead Coral (DDC), (d) Rubble (RBL), (e) Competition (CPT), (f) Disease (DSE), (g) Predation (PRD), and (h) Physical Issues (PHY). Targeted features representing the key areas the model should focus on are circled in red.
  • ...and 6 more figures
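
Figure 5 compares models using Grad-CAM heatmaps. Grad-CAM weights each feature map A^k of a chosen layer by the spatial average of the class score's gradient with respect to that map, sums the weighted maps, and applies a ReLU so that only regions that raise the class score remain. The stdlib sketch below implements that combination step for a single class. It takes activations and gradients as plain nested lists, which in practice would come from a framework's autograd, and the toy values are made up. For a ViT backbone such as DINOv2, patch-token features must first be arranged into a spatial grid; how the authors did this is not described here.

```python
from typing import List

Grid = List[List[float]]


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM map for one class.

    activations[k] is feature map A^k and gradients[k] is d(score)/d(A^k),
    both H x W. Gradients are assumed to be supplied by an autograd framework.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # alpha_k: global-average-pooled gradient for channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for alpha, a in zip(alphas, activations):
        for i in range(h):
            for j in range(w):
                cam[i][j] += alpha * a[i][j]
    # ReLU keeps only regions with a positive influence on the class score
    cam = [[max(0.0, v) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    # Normalise to [0, 1] for display as a heatmap (left as-is if all zero)
    return [[v / peak for v in row] for row in cam] if peak > 0 else cam


# Two toy 2 x 2 channels: the first supports the class, the second opposes it.
A = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [0.0, 0.0]]]
G = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, G))  # prints [[1.0, 0.0], [0.0, 1.0]]
```

The ReLU step is what makes the heatmaps highlight evidence for a class rather than against it: in the toy example the second channel's response is suppressed because its gradient is negative.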