Floralens: a Deep Learning Model for the Portuguese Native Flora

António Filgueiras, Eduardo R. B. Marques, Luís M. B. Lopes, Miguel Marques, Hugo Silva

TL;DR

Floralens tackles automatic identification of the Portuguese native flora by building a high-quality, publicly shareable dataset from FloraOn and GBIF sources and training a CNN model with Google AutoML Vision. The resulting Floralens model, deployed through the Biolens platform, performs competitively with state-of-the-art Pl@ntNet models, and its accuracy improves when multiple images and geographic context are used. The study provides a transparent methodology, an extensive evaluation across diverse test sets (including PlantCLEF and Wikipedia), and a publicly available dataset and notebooks, enabling reproducibility and further research. In practice, Floralens supports offline-capable identification for field use and contributes to citizen science workflows by offering robust, accessible plant identification for the Portuguese flora. The work also discusses limitations and future directions, such as incorporating Vision Transformers and expanding multi-image and regional data to further improve accuracy and generalization.

Abstract

Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many citizen science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Botânica and added further sampled data from research-grade datasets available from GBIF. We find that, with careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we gather models for other taxa as well. The dataset used to train the model is also publicly available on Zenodo.

Paper Structure

This paper contains 19 sections, 2 equations, 15 figures, and 2 tables.

Figures (15)

  • Figure 1: Detail of the FloraOn web application.
  • Figure 2: Dataset histograms ($x$-axis: number of images; $y$-axis: number of species).
  • Figure 3: Model derivation using GAMLV.
  • Figure 4: GAMLV interface for model training and deployment.
  • Figure 5: Layers of the CNN model (fragment).
  • ...and 10 more figures