
FineBio: A Fine-Grained Video Dataset of Biological Experiments with Hierarchical Annotation

Takuma Yagi, Misaki Ohashi, Yifei Huang, Ryosuke Furuta, Shungo Adachi, Toutai Mitsuyama, Yoichi Sato

TL;DR

FineBio addresses the need for accurate, reproducible documentation of biological experiments by providing a fine-grained, multi-view video dataset with hierarchical annotations across steps, atomic operations, object locations, and manipulation states. The authors collect 226 trials over 14.5 hours from 32 participants across seven protocols, yielding 3.5K steps, 50K atomic operations, and 72K bounding boxes, with frames sampled to capture challenging hand-object interactions. Baseline experiments on step segmentation, atomic operation detection, object detection, and manipulated/affected object detection reveal strong performance at higher levels but notable difficulties in boundary precision and fine-grained state reasoning, underscoring the need for multi-granularity modeling. The dataset and code, available at the project repository, aim to catalyze progress in structured activity understanding and laboratory automation while acknowledging limitations from using mock experiments and proposing future directions toward real-material datasets.

Abstract

In the development of science, accurate and reproducible documentation of the experimental process is crucial. Automatic recognition of the actions in experiments from videos would help experimenters by complementing the recording of experiments. Towards this goal, we propose FineBio, a new fine-grained video dataset of people performing biological experiments. The dataset consists of multi-view videos of 32 participants performing mock biological experiments with a total duration of 14.5 hours. Each experiment forms a hierarchical structure, where a protocol consists of several steps, each further decomposed into a set of atomic operations. What makes biological experiments unique is that, while they require strict adherence to the steps described in each protocol, there is freedom in the order of atomic operations. We provide hierarchical annotation on protocols, steps, atomic operations, object locations, and their manipulation states, which poses new challenges for structured activity understanding and hand-object interaction recognition. To identify the challenges of activity understanding in biological experiments, we introduce baseline models and results on four different tasks: (i) step segmentation, (ii) atomic operation detection, (iii) object detection, and (iv) manipulated/affected object detection. Dataset and code are available from https://github.com/aistairc/FineBio.
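
The hierarchy described in the abstract (protocol, steps, atomic operations) can be sketched as a small data structure. This is only an illustrative sketch and not the annotation format shipped in the FineBio repository: the class names, field names, step name, and verbs below are all hypothetical. It shows two trials that follow the same step order while differing in the order of atomic operations inside a step.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AtomicOperation:
    # Hypothetical fields: a time span in seconds plus verb/object labels.
    start: float
    end: float
    verb: str
    manipulated: str
    affected: Optional[str] = None


@dataclass
class Step:
    name: str
    start: float
    end: float
    operations: List[AtomicOperation] = field(default_factory=list)


@dataclass
class Trial:
    protocol: str
    steps: List[Step] = field(default_factory=list)


def step_order(trial: Trial) -> List[str]:
    """Step names sorted by start time."""
    return [s.name for s in sorted(trial.steps, key=lambda s: s.start)]


def verb_order(step: Step) -> List[str]:
    """Verbs of a step's atomic operations sorted by start time."""
    return [op.verb for op in sorted(step.operations, key=lambda op: op.start)]


def make_trial(verbs: List[str]) -> Trial:
    """Build a one-step trial whose operations occur in the given order."""
    ops = [
        AtomicOperation(start=float(i), end=i + 0.9, verb=v, manipulated="pipette")
        for i, v in enumerate(verbs)
    ]
    step = Step("add_buffer", 0.0, float(len(verbs)), ops)  # hypothetical step name
    return Trial("example_protocol", [step])


# Two participants perform the same step with a different operation order.
trial_a = make_trial(["take", "insert", "press"])
trial_b = make_trial(["insert", "take", "press"])

same_steps = step_order(trial_a) == step_order(trial_b)
same_ops = verb_order(trial_a.steps[0]) == verb_order(trial_b.steps[0])
print(same_steps, same_ops)  # True False
```

Keeping operations nested under their step means the step sequence can be compared strictly while the operation order within each step is compared loosely or not at all.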

Paper Structure

This paper contains 134 sections, 2 equations, 19 figures, and 10 tables.

Figures (19)

  • Figure 1: We propose FineBio, a fine-grained and multi-view video dataset of biological experiments with hierarchical annotation. The FineBio dataset consists of multi-view videos (left), hierarchical action annotation with different temporal granularity (middle), and frame-level object annotations (right).
  • Figure 2: Distribution of duration. The average duration is 3.9 minutes for protocols, 14.3 seconds for steps, and 0.9 seconds for atomic operations.
  • Figure 3: Distribution of verbs (left), manipulated objects (center), and affected objects (right).
  • Figure 4: Examples of object bounding box annotation and their manipulation states. From left to right, the panels show example object annotations for protocols 1 (lyses and recovery), 3 (DNA extraction with magnetic beads), 5 (PCR), and 6 (DNA extraction with spin columns). Box color denotes object category. Hand contact states and object manipulation states (contact, manipulated_left/right, effect_left/right) are shown next to the object name.
  • Figure 5: Example annotation of a single step. LH and RH denote left hand and right hand, respectively. Note that ground-truth contact annotation is provided only through object location annotation on sparsely sampled frames.
  • ...and 14 more figures