Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution

Marga Don; Stijn Pinson; Blanca Guillen Cebrian; Yuki M. Asano

TL;DR

Finetuned models consistently outperform the foundation models (FMs) tested, even in cases where data is scarce.

Abstract

Foundation models (FMs) are a popular topic of research in AI. Their ability to generalize to new tasks and datasets without retraining or needing an abundance of data makes them an appealing candidate for applications on specialist datasets. In this work, we compare the performance of FMs to finetuned pre-trained supervised models in the task of semantic segmentation on an entirely new dataset. We see that finetuned models consistently outperform the FMs tested, even in cases where data is scarce. We release the code and dataset for this work on GitHub.

Paper Structure (33 sections, 2 equations, 13 figures, 8 tables)

Introduction
Related Work
Segmentation
Trash Detection in Water
Comparing Foundation and Finetuning models
Dataset
The Ocean Cleanup
Methods
RandomForest
PerSAM
Multiple prompts
PerSAM-F
SegGPT
YOLOv8
Metrics
...and 18 more sections

Figures (13)

Figure 1: Example images from the 6 locations in the dataset (upper) with ground truth annotations (lower). Yellow denotes in-system trash, pink denotes out-system trash, light blue denotes water and dark blue denotes the barrier.
Figure 2: Example performance of all 4 models using an image from Location 2
Figure 3: BMC vs mIoU-In% for SegGPT on all locations, with constant prompt image and mask.
Figure 4: MFS vs mIoU-In% for PerSAM-F on all locations, with constant prompt image and mask.
Figure 5: Performance of YOLOv8 model finetuned on different sizes of training data. Models were all evaluated on the same 20% test split.
...and 8 more figures
