Assessing Foundation Models for Mold Colony Detection with Limited Training Data

Henrik Pichler, Janis Keuper, Matthew Copping

TL;DR

This work tackles automated mold colony counting on Petri dishes, a key component of indoor air quality monitoring. It benchmarks three vision foundation models (MaskDINO, SAM-2, RF-DETR) against traditional baselines in full-data, few-shot, and low-shot regimes on a 5,000-image dataset, using the task-specific metrics $AP_{mask}$, $AP_{box}$, $CA$, $CA@10$, and $MAPE$. MaskDINO-Swin achieves strong counting performance with as few as 150 labeled images (e.g., $CA@10$ ≈ 72.6%), approaching the performance of models trained on the full dataset, while YoloV9 remains competitive when extensive data is available. The study demonstrates the practical value of data-efficient foundation models for rapid deployment and iterative improvement in niche microbiology tasks, and it outlines limitations and directions for future work, including colony differentiation and domain-specific promptable models such as BiomedParse.
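The summary names the counting metrics without defining them. The sketch below shows one plausible reading, under assumptions that are not confirmed by the text above: $CA$ is taken to be the fraction of dishes whose predicted colony count matches the ground truth exactly, $CA@10$ the fraction whose predicted count lies within 10% of the ground truth, and $MAPE$ the standard mean absolute percentage error. The function names `count_accuracy` and `mape` are hypothetical, and the paper's exact definitions may differ.

```python
from typing import Sequence


def count_accuracy(true_counts: Sequence[int],
                   pred_counts: Sequence[int],
                   tolerance: float = 0.0) -> float:
    """Fraction of samples whose predicted count is within
    `tolerance` (relative to the true count) of the ground truth.

    tolerance=0.0 is the assumed reading of CA (exact match);
    tolerance=0.10 is the assumed reading of CA@10.
    """
    if len(true_counts) != len(pred_counts) or len(true_counts) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        abs(p - t) <= tolerance * t
        for t, p in zip(true_counts, pred_counts)
    )
    return hits / len(true_counts)


def mape(true_counts: Sequence[int], pred_counts: Sequence[int]) -> float:
    """Mean absolute percentage error, in percent.

    Samples with a true count of zero are skipped, because the
    relative error is undefined for them (an assumption of this sketch).
    """
    pairs = [(t, p) for t, p in zip(true_counts, pred_counts) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every true count is zero")
    return 100.0 * sum(abs(p - t) / t for t, p in pairs) / len(pairs)


# Illustrative, made-up counts for four dishes (not data from the paper).
true_counts = [10, 20, 5, 40]
pred_counts = [10, 21, 4, 30]

ca = count_accuracy(true_counts, pred_counts)           # exact matches only
ca_at_10 = count_accuracy(true_counts, pred_counts, 0.10)
error = mape(true_counts, pred_counts)
```

A relative tolerance makes the metric forgiving on dishes with many colonies, where an off-by-one count matters little, while staying strict on sparse dishes.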

Abstract

Quantifying mold colonies on Petri dish samples is critical for assessing indoor air quality, as high colony counts can indicate potential health risks and deficiencies in ventilation systems. Conventionally, automating this labor-intensive process, like other tasks in microbiology, relies on the manual annotation of large datasets and the subsequent extensive training of models like YoloV9. To demonstrate that exhaustive annotation is no longer a prerequisite when tackling a new vision task, we compile a representative dataset of 5000 Petri dish images annotated with bounding boxes, simulating both a traditional data collection approach and few-shot and low-shot scenarios using well-curated subsets with instance-level masks. We benchmark three vision foundation models against traditional baselines on task-specific metrics that reflect realistic real-world requirements. Notably, MaskDINO attains near-parity with an extensively trained YoloV9 model while fine-tuned on only 150 images, and it retains competitive performance with as few as 25 images, where it is still reliable on $\approx$ 70% of the samples. Our results show that data-efficient foundation models can match traditional approaches with only a fraction of the required data, enabling earlier development and faster iterative improvement of automated microbiological systems, with a higher upper bound on performance than traditional models would achieve.

Paper Structure

This paper contains 24 sections, 3 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Comparison between different mold colony growth patterns. (a) A typical sample with well-distinguishable colonies, (b) well-distinguishable but overlapping colonies on a darker background (caused by a changed backdrop throughout the capturing process), and (c) many small colonies, presenting a more challenging scenario.
  • Figure 2: Comparison of predictions made by YoloV9 (left), RF-DETR (middle), and MaskDINO-Swin (right) trained on different amounts of data. The images show a typical sample with overlapping and small colonies. Additional comparison images can be found in the supplementary material. Images best viewed in color.