Computer vision-based estimation of invertebrate biomass

Mikko Impiö; Philipp M. Rehsen; Jarrett Blair; Cecilie Mielec; Arne J. Beermann; Florian Leese; Toke T. Høye; Jenni Raitoharju

TL;DR

This work presents two approaches for dry mass estimation that do not require additional manual effort apart from imaging the specimens: fitting a linear model with novel predictors, automatically calculated by an imaging device, and training a family of end-to-end deep neural networks for the task, using single-view, multi-view, and metadata-aware architectures.

Abstract

The ability to estimate invertebrate biomass using only images could help scale up quantitative biodiversity monitoring efforts. Computer vision-based methods have the potential to replace the manual, time-consuming, and destructive process of dry weighing specimens. We present two approaches for dry mass estimation that do not require additional manual effort apart from imaging the specimens: fitting a linear model with novel predictors, automatically calculated by an imaging device, and training a family of end-to-end deep neural networks for the task, using single-view, multi-view, and metadata-aware architectures. We propose using area and sinking speed as predictors. Both can be calculated with BIODISCOVER, a dual-camera system that captures image sequences of specimens sinking in an ethanol column. For this study, we collected a large dataset of paired dry mass measurements and image sequences to train and evaluate models. We show that our methods can estimate specimen dry mass even for complex and visually diverse specimen morphologies. Combined with automatic taxonomic classification, our approach is an accurate method for group-level dry mass estimation, with a median percentage error of 10-20% for individuals. We highlight the importance of choosing appropriate evaluation metrics, and encourage reporting both percentage errors and absolute errors, because they measure different properties. We also explore different optimization losses, data augmentation methods, and model architectures for training deep-learning models.
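As a rough illustration of the linear-model approach and of why percentage and absolute errors behave differently, the sketch below fits an ordinary least squares model with area and sinking speed as predictors and then computes both kinds of error. Everything in it is an assumption made for illustration: the data are synthetic, the function names are hypothetical, and the plain untransformed model form is not confirmed by the abstract.

```python
import numpy as np

def design_matrix(area, sinking_speed):
    """Stack an intercept column with the two predictors named in the abstract."""
    area = np.asarray(area, dtype=float)
    sinking_speed = np.asarray(sinking_speed, dtype=float)
    return np.column_stack([np.ones_like(area), area, sinking_speed])

def fit_linear_model(area, sinking_speed, dry_mass):
    """Ordinary least squares fit of dry_mass ~ b0 + b1*area + b2*sinking_speed."""
    X = design_matrix(area, sinking_speed)
    coef, *_ = np.linalg.lstsq(X, np.asarray(dry_mass, dtype=float), rcond=None)
    return coef

def predict(coef, area, sinking_speed):
    return design_matrix(area, sinking_speed) @ coef

def median_absolute_error(y_true, y_pred):
    """Median of |prediction - truth|, in the units of the measurement."""
    return float(np.median(np.abs(y_pred - y_true)))

def median_absolute_percentage_error(y_true, y_pred):
    """Median of |prediction - truth| / truth, in percent (truth must be > 0)."""
    return float(np.median(np.abs(y_pred - y_true) / y_true) * 100.0)

# Synthetic data (assumed units and coefficients, not taken from the paper).
rng = np.random.default_rng(0)
area = rng.uniform(1.0, 20.0, size=200)
sinking_speed = rng.uniform(5.0, 50.0, size=200)
dry_mass = 0.05 + 0.10 * area + 0.02 * sinking_speed + rng.normal(0.0, 0.05, size=200)

coef = fit_linear_model(area, sinking_speed, dry_mass)
pred = predict(coef, area, sinking_speed)
print("coefficients:", coef)
print("median absolute error:", median_absolute_error(dry_mass, pred))
print("median percentage error:", median_absolute_percentage_error(dry_mass, pred))

# The same absolute error is a very different percentage error for a light
# specimen than for a heavy one, which is why both metrics are informative.
light_and_heavy = np.array([1.0, 100.0])
off_by_one = np.array([2.0, 101.0])
print(median_absolute_error(light_and_heavy, off_by_one))
print(median_absolute_percentage_error(light_and_heavy, off_by_one))
```

In the last two lines both specimens are off by one unit, yet the light specimen is wrong by 100% and the heavy one by 1%. A model can therefore look good on one metric and poor on the other, depending on the mass distribution of the specimens.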

Paper Structure (28 sections, 6 equations, 9 figures, 6 tables)

Introduction
Materials and methods
Datasets
Evaluation metrics
Biomass prediction
Convolutional neural networks
Linear models
Experimental setup
CNN model optimization
Biomass estimation
Out-of-domain biomass estimation
Classification and end-to-end estimation
Training setup
Results
CNN model optimization
...and 13 more sections

Figures (9)

Figure 1: Randomly sampled images from both datasets grouped by taxonomic order. There is large variation in size and biomass within groups. S=Species dataset, O=Order dataset. The Species dataset was imaged completely separately, using a different device from the Order dataset, thus providing a test dataset for out-of-distribution generalization.
Figure 2: Overview of the BIODISCOVER imaging device setup. The device images specimens from two angles as they fall through an ethanol-filled cuvette. Useful metadata, such as sinking speed, can be calculated from the image timestamps, locations, and total number of images, as the camera frame rate is fixed.
Figure 3: Histogram of dry mass measurements (a), and relationships between mass measurements and dataset features (b-d).
Figure 4: Area and dry mass ranges of Asellus aquaticus specimens. The x-axis shows samples from five mass quantiles (0-20%, 20%-40%, 40%-60%, 60%-80%, and 80%-100%) and the y-axis shows the minimum, mean, and maximum area within each quantile. Below each image, the mass (m) and area (A) of the specimen shown in that image are given.
Figure 5: Overview of the different CNN models used in this paper. Image inputs $x$ are passed to CNN encoders ($g$), while possible metadata $v$ is passed to a fully connected metadata encoder $\mu$. Encoders output intermediate feature vectors $z$. For the multi-view and metadata-aware architectures, feature vectors are concatenated before passing the features to a fully connected projection head ($h$) with either one or two layers. The output of this network is the final biomass estimate $\hat{y}$.
...and 4 more figures
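The captions of Figures 2 and 5 describe two computations that can be sketched together: deriving sinking speed from an image sequence recorded at a fixed frame rate, and passing two views plus metadata through encoders, concatenation, and a projection head. The sketch below is a minimal NumPy illustration of that data flow only. The CNN encoder $g$ is replaced by a single linear layer with ReLU as a stand-in, the encoder is assumed to be shared between views, the weights are random, and all names, shapes, and units are hypothetical.

```python
import numpy as np

def sinking_speed(y_positions, frame_rate_hz):
    """Mean vertical speed over a sequence, in position units per second.

    Assumes one position per image and a fixed frame rate, so the elapsed
    time follows from the number of images alone.
    """
    y = np.asarray(y_positions, dtype=float)
    if y.size < 2:
        raise ValueError("need at least two images to estimate a speed")
    elapsed_s = (y.size - 1) / frame_rate_hz
    return float((y[-1] - y[0]) / elapsed_s)

def relu(a):
    return np.maximum(a, 0.0)

def forward(views, v, params):
    """Multi-view, metadata-aware forward pass following the Figure 5 caption.

    views:  list of 2D image arrays x (one per camera angle)
    v:      1D metadata vector
    params: dict of weight matrices (random here, learned in practice)
    """
    # g: image encoder applied to each view (linear stand-in for a CNN).
    z_views = [relu(x.reshape(-1) @ params["W_g"]) for x in views]
    # mu: fully connected metadata encoder.
    z_meta = relu(v @ params["W_mu"])
    # Concatenate the intermediate feature vectors z.
    z = np.concatenate(z_views + [z_meta])
    # h: one-layer projection head producing the biomass estimate y_hat.
    return float(z @ params["W_h"])

rng = np.random.default_rng(0)
views = [rng.random((8, 8)), rng.random((8, 8))]  # two toy 8x8 "images"

# Four images, 50 frames per second, positions in pixels (assumed values).
speed = sinking_speed([10.0, 60.0, 110.0, 160.0], frame_rate_hz=50.0)
v = np.array([speed])

params = {
    "W_g": rng.normal(size=(64, 16)),        # flattened 8x8 image -> 16 features
    "W_mu": rng.normal(size=(1, 4)),         # 1 metadata value -> 4 features
    "W_h": rng.normal(size=(2 * 16 + 4,)),   # concatenated features -> scalar
}
y_hat = forward(views, v, params)
print("sinking speed:", speed)
print("untrained estimate:", y_hat)
```

Because the weights are random, the printed estimate has no meaning; the point is only the shape of the computation. Dropping the metadata branch gives the multi-view variant, and passing a single view gives the single-view variant.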
