
FungiTastic: A multi-modal dataset and benchmark for image categorization

Lukas Picek, Klara Janouskova, Vojtech Cermak, Jiri Matas

TL;DR

FungiTastic introduces a large-scale, multi-modal benchmark for fine-grained fungal species, aggregating about 350k observations across 6k species with photos, satellite imagery, meteorological time-series, segmentation masks, and textual captions. The dataset enables realistic evaluation under domain shift, open-set, few-shot, and multi-modal settings, including a DNA-sequenced ground-truth test subset and time-based splits. A comprehensive suite of baselines across closed-set, open-set, few-shot, segmentation, and vision-language fusion demonstrates the dataset’s challenging nature and highlights the value of integrating metadata and language signals with visual data. The work provides ready-to-use baselines, pre-trained models, and training frameworks, underlining FungiTastic’s potential to advance robust, multimodal fungal identification and ecological modeling.
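Integrating metadata with visual data can be as simple as late fusion of class probabilities. The sketch below shows one common approach. It is offered as an illustration, not as the FungiTastic baseline. It assumes an image model gives p(c | x), a separate metadata model (e.g., habitat or month) gives p(c | m), and that the two are conditionally independent given the class, so p(c | x, m) ∝ p(c | x) · p(c | m) / p(c). The function name `fuse_with_metadata` and the toy numbers are hypothetical.

```python
from typing import Sequence


def fuse_with_metadata(
    p_image: Sequence[float],
    p_meta: Sequence[float],
    prior: Sequence[float],
) -> list[float]:
    """Hypothetical late fusion of image and metadata class probabilities.

    Assumes image and metadata are conditionally independent given the class,
    so p(c | x, m) is proportional to p(c | x) * p(c | m) / p(c).
    Priors must be strictly positive.
    """
    if not (len(p_image) == len(p_meta) == len(prior)):
        raise ValueError("all inputs must have one entry per class")
    scores = [pi * pm / pc for pi, pm, pc in zip(p_image, p_meta, prior)]
    total = sum(scores)
    if total <= 0:
        raise ValueError("fused scores must have positive mass")
    # Renormalise so the fused scores form a probability distribution.
    return [s / total for s in scores]


# Toy 3-species example: the image model is torn between species 0 and 1,
# while the metadata model favours species 1.
fused = fuse_with_metadata([0.45, 0.40, 0.15], [0.2, 0.7, 0.1], [1 / 3, 1 / 3, 1 / 3])
print(max(range(3), key=fused.__getitem__))  # → 1
```

Dividing by the prior avoids counting the class frequency twice when both models were trained on the same label distribution. With a uniform prior, the division only rescales the scores.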

Abstract

We introduce a new, challenging benchmark and a dataset, FungiTastic, based on fungal records continuously collected over a twenty-year span. The dataset is labelled and curated by experts and consists of about 350k multimodal observations of 6k fine-grained categories (species). The fungi observations include photographs and additional data, e.g., meteorological and climatic data, satellite images, and body part segmentation masks. FungiTastic is one of the few benchmarks that include a test set with DNA-sequenced ground truth of unprecedented label reliability. The benchmark is designed to support (i) standard closed-set classification, (ii) open-set classification, (iii) multi-modal classification, (iv) few-shot learning, (v) domain shift, and many more. We provide tailored baselines for many use cases, a multitude of ready-to-use pre-trained models on https://huggingface.co/collections/BVRA/fungitastic-66a227ce0520be533dc6403b, and a framework for model training. The documentation and the baselines are available at https://github.com/BohemianVRA/FungiTastic/ and https://www.kaggle.com/datasets/picekl/fungitastic.
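In open-set classification, the model must flag observations of species absent from the training set. A widely used generic baseline is maximum softmax probability (MSP) thresholding: predict the top class, but return "unknown" when its probability falls below a threshold. The sketch below implements that baseline. It is not necessarily the method evaluated in FungiTastic, and the `UNKNOWN` label and the default threshold of 0.5 are hypothetical.

```python
import math
from typing import Sequence

UNKNOWN = -1  # hypothetical label reserved for "species not seen in training"


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits: Sequence[float], threshold: float = 0.5) -> int:
    """Return the arg-max class, or UNKNOWN if its softmax probability < threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else UNKNOWN


print(predict_open_set([4.0, 1.0, 0.5]))  # confident prediction → 0
print(predict_open_set([1.0, 1.1, 0.9]))  # near-uniform, rejected → -1
```

In practice, the threshold would be tuned on a validation split that contains both known and unknown species.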

Paper Structure

This paper contains 23 sections, 10 equations, 10 figures, 9 tables.

Figures (10)

  • Figure 1: A fungi observation includes one or more photos of an observed specimen with expert-verified taxon labels (some DNA-sequenced) and occasionally also a microscopic image of its spores. Textual captions, observation metadata, geospatial data, and climatic time-series data are available for virtually all observations. For a subset (~70k photos), we provide body-part segmentation masks.
  • Figure 2: FungiTastic body-part segmentation. We consider five categories: the cap, gills, stem, pores, and ring.
  • Figure 3: Satellite RGB images at 64×64-pixel resolution, extracted from Sentinel-2A rasters available at https://stac.ecodatacube.eu/.
  • Figure 4: Image caption sample. For each photograph, we use the Molmo-7B VLM (Deitke et al., 2024) to produce a realistic caption with an exhaustive text description.
  • Figure 5: Class distribution shift on the FungiTastic-Mini dataset. The long-term data acquisition captures natural changes in species presence, i.e., class prior shift. Species are sorted in descending order of their occurrence in the training set. The training set includes data from 2021 and earlier (215 species), the validation set from 2022 (196 species), and the test set from 2023 (193 species).
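The time-based split described in Figure 5 uses training data from 2021 and earlier, validation data from 2022, and test data from 2023. The sketch below shows how such a split can be built and how class prior shift can be inspected. It assumes each observation is a record with hypothetical `year` and `species` fields. The toy records are invented, not taken from the dataset. The sketch also lists test species that never occur in training, because a closed-set model cannot predict those species.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Observation(NamedTuple):
    # Hypothetical minimal record; real observations carry many more fields.
    year: int
    species: str


def time_split(observations: Iterable[Observation]) -> dict[str, list[Observation]]:
    """Split by year: train <= 2021, validation == 2022, test == 2023."""
    splits: dict[str, list[Observation]] = {"train": [], "val": [], "test": []}
    for obs in observations:
        if obs.year <= 2021:
            splits["train"].append(obs)
        elif obs.year == 2022:
            splits["val"].append(obs)
        elif obs.year == 2023:
            splits["test"].append(obs)
        # Observations after 2023 are ignored in this sketch.
    return splits


def class_counts(split: list[Observation]) -> Counter:
    """Per-species counts, i.e. the empirical class prior of one split."""
    return Counter(obs.species for obs in split)


# Toy data, not real FungiTastic records.
data = [
    Observation(2019, "Amanita muscaria"),
    Observation(2020, "Boletus edulis"),
    Observation(2021, "Amanita muscaria"),
    Observation(2022, "Boletus edulis"),
    Observation(2023, "Cantharellus cibarius"),
    Observation(2023, "Amanita muscaria"),
]
splits = time_split(data)
train_species = set(class_counts(splits["train"]))
unseen_in_test = set(class_counts(splits["test"])) - train_species
print(class_counts(splits["train"]).most_common())
# → [('Amanita muscaria', 2), ('Boletus edulis', 1)]
print(sorted(unseen_in_test))  # → ['Cantharellus cibarius']
```

Comparing `class_counts` across splits gives a direct view of the class prior shift shown in Figure 5.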