Floralens: a Deep Learning Model for the Portuguese Native Flora
António Filgueiras, Eduardo R. B. Marques, Luís M. B. Lopes, Miguel Marques, Hugo Silva
TL;DR
Floralens tackles the automatic identification of the Portuguese native flora by building a high-quality, publicly shareable dataset from FloraOn and GBIF sources and training a convolutional neural network (CNN) with Google AutoML Vision. The resulting Floralens model, deployed through the Biolens platform, performs competitively with the state-of-the-art Pl@ntNet models, and its accuracy improves when multiple images or geographic context are used. The study provides a transparent methodology, an extensive evaluation across diverse test sets (including PlantCLEF and Wikipedia), and a publicly available dataset and notebooks, enabling reproducibility and further research. In practice, Floralens supports offline-capable identification in the field and contributes to citizen science workflows by offering robust, accessible plant identification for the Portuguese flora. The work also discusses limitations and future directions, such as adopting Vision Transformers and expanding multi-image and regional data to further improve accuracy and generalization.
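To illustrate why several photos of the same plant can help, the sketch below averages per-image class scores into a single ranking. This is one common fusion rule, shown here as an assumption; it is not necessarily the aggregation used in the paper. The function name `aggregate_predictions` and the example scores are hypothetical, and a species absent from one image's output is treated as having score 0 for that image.

```python
from typing import Dict, List, Tuple


def aggregate_predictions(per_image: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Fuse predictions for several images of one specimen by averaging scores.

    Hypothetical helper: each dict maps species name -> model score for one image.
    Species missing from an image's dict contribute 0 for that image.
    Returns (species, mean score) pairs sorted from most to least likely.
    """
    if not per_image:
        raise ValueError("need at least one image")
    totals: Dict[str, float] = {}
    for scores in per_image:
        for species, p in scores.items():
            totals[species] = totals.get(species, 0.0) + p
    n = len(per_image)
    averaged = {species: total / n for species, total in totals.items()}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative (made-up) scores for two photos of the same shrub.
photo_a = {"Cistus ladanifer": 0.55, "Cistus albidus": 0.40}
photo_b = {"Cistus ladanifer": 0.80, "Cistus albidus": 0.10}
ranking = aggregate_predictions([photo_a, photo_b])
print(ranking[0])
```

Averaging is robust to a single poor photo; a product of probabilities (or a sum of log-probabilities) is a sharper alternative but is more sensitive to near-zero scores.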
Abstract
Machine-learning techniques, especially deep convolutional neural networks, are pivotal for image-based identification of biological species in many citizen science platforms. In this paper, we describe the construction of a dataset for the Portuguese native flora based on publicly available research-grade datasets, and the derivation of a high-accuracy model from it using off-the-shelf deep convolutional neural networks. We anchored the dataset in high-quality data provided by Sociedade Portuguesa de Botânica and added further data sampled from research-grade datasets available from GBIF. We find that, with careful dataset design, off-the-shelf machine-learning cloud services such as Google's AutoML Vision produce accurate models, with results comparable to those of Pl@ntNet, a state-of-the-art citizen science platform. The best model we derived, dubbed Floralens, has been integrated into the public website of Project Biolens, where we also gather models for other taxa. The dataset used to train the model is publicly available on Zenodo.
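The idea of "anchoring" a dataset in one curated source and topping it up with sampled data from another can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the per-species target (`target_per_species`), the rule of keeping every anchored image, and the uniform random sampling of GBIF images are all assumptions made for the example.

```python
import random
from typing import Dict, List


def build_species_images(
    anchor: Dict[str, List[str]],
    extra_pool: Dict[str, List[str]],
    target_per_species: int,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Hypothetical dataset assembly: anchored images first, then sampled top-up.

    anchor: species -> image ids from the curated source (e.g. FloraOn); its keys
            define the species list.
    extra_pool: species -> candidate image ids from a secondary source (e.g. GBIF).
    All anchored images are kept; secondary images are sampled without replacement
    only until the species reaches target_per_species (assumed rule).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    dataset: Dict[str, List[str]] = {}
    for species in sorted(anchor):
        anchored = list(anchor[species])
        missing = max(0, target_per_species - len(anchored))
        pool = extra_pool.get(species, [])
        sampled = rng.sample(pool, min(missing, len(pool)))
        dataset[species] = anchored + sampled
    return dataset


ds = build_species_images(
    anchor={"Quercus suber": ["f1", "f2"], "Cistus ladanifer": ["f3", "f4", "f5"]},
    extra_pool={"Quercus suber": ["g1", "g2", "g3"], "Cistus ladanifer": ["g4"]},
    target_per_species=4,
)
print({species: len(images) for species, images in ds.items()})
```

Keeping the curated images unconditionally and sampling only the shortfall preserves the quality of the anchor source while reducing class imbalance for poorly documented species.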
