No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking

Emil Mededovic, Valdy Laurentius, Yuli Wu, Marcin Kopaczka, Zhu Chen, Mareike Schulz, René Tolba, Johannes Stegmaier

TL;DR

The paper investigates the reliability of foundation-model–aided annotation for long-horizon animal tracking and presents SAM-QA, a lightweight, semi-automatic workflow that fine-tunes a Segment Anything Model and enforces a quality-control loop with spatio-temporal consistency checks and SAM2-based recovery. On rat and mouse datasets, SAM-QA delivers the strongest automated-label performance among the tested methods, narrowing the gap to manual annotations but not yet matching them. The findings underscore the need to combine automated annotations with targeted human oversight to maintain tracking accuracy, and they identify SAM-2V as a promising avenue for future improvement through further fine-tuning and tighter integration of quality control. Overall, the work highlights practical ways to accelerate annotation while preserving the data quality that robust animal-tracking models require.

Abstract

We analyze the capabilities of foundation models in addressing the tedious task of generating annotations for animal tracking. Annotating a large amount of data is vital and can be a make-or-break factor for the robustness of a tracking model. Robustness is particularly crucial in animal tracking, as accurate tracking over long time horizons is essential for capturing the behavior of animals. However, generating additional annotations using foundation models can be counterproductive, as the quality of the annotations is just as important as their quantity. Poorly annotated data can introduce noise and inaccuracies, ultimately compromising the performance of the trained model. Over-reliance on automated annotations without ensuring precision can lead to diminished results, making careful oversight and quality control essential in the annotation process. Ultimately, we demonstrate that a thoughtful combination of automated annotations and manually annotated data is a valuable strategy, yielding an IDF1 score of 80.8, compared with an IDF1 score of 65.6 for blind usage of SAM2 video.
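
For readers unfamiliar with the metric, IDF1 is the harmonic mean of identification precision and identification recall, computed from identity-level true positives, false positives, and false negatives. The snippet below is a minimal sketch of that standard definition. The function name and the example counts are illustrative assumptions and are not taken from the paper. The function returns a fraction, whereas the scores quoted above are on a 0 to 100 scale.

```python
def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN), returned as a fraction in [0, 1].

    idtp: detections matched to the correct identity
    idfp: predicted detections with no correct identity match
    idfn: ground-truth detections with no correct identity match
    """
    denominator = 2 * idtp + idfp + idfn
    # With no detections and no ground truth, define the score as 0.0.
    return 2 * idtp / denominator if denominator else 0.0


# Illustrative counts only (not from the paper).
score = idf1(idtp=8, idfp=2, idfn=2)
print(f"IDF1 = {100 * score:.1f}")
```

Because identity switches convert true positives into both a false positive and a false negative, IDF1 penalizes tracks that drift between animals over long sequences more strongly than per-frame detection metrics do.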

Paper Structure

This paper contains 6 sections, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: The SAM-QA approach begins with the initialization step, where manual bounding box prompts are provided by the user. These prompts then enter an iterative loop where a fine-tuned and distilled SAM model generates segmentation masks. In the validation step, if the prompts are neither manual nor initial, an association check is performed to verify spatio-temporal consistency between the current and previous time steps. If these criteria are not met, a recovery attempt is made using SAM2 (Ravi et al., 2024), which leverages equidistant grid-sampled point prompts. Should this recovery step also fail, the user is prompted to manually re-initialize. Finally, bounding boxes are generated from the validated masks, adjusted to account for rodent movement, and prepared for use in the subsequent time step.
  • Figure 2: Illustration of the segmentation process using the watershed method: First, the image is passed through the segmentation model, and logits are used to identify peaks, which serve as seed points for watershed-based instance segmentation. Further refinement through clustering is optional.
  • Figure 3: SAM2 video analysis for different prompt intervals. The prompt interval is the interval at which frames are manually annotated.
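
The validation and box-propagation steps described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the caption does not state the actual consistency criteria, so the box-IoU test, the `min_iou` threshold, the fixed `margin` used to account for movement, and all function names are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Tight bounding box around all non-zero pixels, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


def expand_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels per side (assumed movement allowance), clipped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width - 1, x1 + margin), min(height - 1, y1 + margin))


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive-coordinate boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = inter_w * inter_h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def is_consistent(prev_box: Box, mask: List[List[int]], min_iou: float = 0.5) -> bool:
    """Assumed association check: the new mask must overlap the previous box enough."""
    box = mask_to_box(mask)
    return box is not None and box_iou(prev_box, box) >= min_iou


# Toy 6x4 frame with a 2x2 object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
current_box = mask_to_box(mask)
if is_consistent(prev_box=(2, 1, 3, 2), mask=mask):
    # Validated: derive the prompt for the next time step.
    next_prompt = expand_box(current_box, margin=1, width=6, height=4)
else:
    # In the described workflow, a failed check would trigger SAM2 recovery,
    # and a failed recovery would trigger manual re-initialization.
    next_prompt = None
print(next_prompt)
```

In this sketch a failed check simply yields no prompt. In the workflow from the caption, that branch is where the SAM2 recovery attempt and, failing that, the request for manual re-initialization would take place.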