A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images

László Kopácsi, Áron Fóthi, András Lőrincz

TL;DR

The paper tackles body-part segmentation and keypoint detection for rats under heavy occlusion without relying on manual annotations. It introduces a self-supervised pipeline that first generates annotations automatically from a stationary-camera video using foreground-background segmentation, medial-axis-based features, and watershed-based segmentation, and then applies extensive augmentation to simulate occlusions. Two Mask R-CNN-based models are trained on the generated labels to perform instance segmentation, keypoint detection, and body-part segmentation; they improve substantially on a CV-based baseline (e.g., APs rise from 53.22/48.91/9.38 to 61.92/77.53/28.87) and demonstrate robustness to occlusions. The work is of practical value for automated animal-behavior analysis and points toward extensions to video-based tracking and more recent architectures such as DETR.
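
The foreground-background step can be illustrated with a minimal sketch. The summary does not say which segmentation method the authors use, so the per-pixel temporal median, the function names, and the threshold value below are all assumptions chosen for illustration; they rely only on the camera being stationary.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel temporal median over grayscale frames (each a list of rows).

    Assumption: with a stationary camera, each pixel shows the background
    in most frames, so the median suppresses the moving animals.
    """
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]


def foreground_mask(frame, background, threshold=30):
    """Mark pixels differing from the background by more than `threshold`.

    `threshold` is a hypothetical value, not one taken from the paper.
    """
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(row, brow)]
            for row, brow in zip(frame, background)]


# Toy 1x4 "video": a bright object moves left to right over a dark floor.
frames = [
    [[200, 10, 10, 10]],
    [[10, 200, 10, 10]],
    [[10, 10, 200, 10]],
]
bg = estimate_background(frames)
mask = foreground_mask(frames[1], bg)
```

On real footage the resulting mask would still need the pre-processing mentioned in the pipeline (for example, noise removal) before any midline can be extracted.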

Abstract

Recognition of individual components and keypoint detection supported by instance segmentation are crucial for analyzing the behavior of agents in a scene. Such systems could be used for surveillance, self-driving cars, and also for medical research, where behavior analysis of laboratory animals is used to confirm the aftereffects of a given medicine. A method capable of solving these tasks usually requires a large amount of high-quality hand-annotated data, which takes time and money to produce. In this paper, we propose a method that alleviates the need for manual labeling of laboratory rats. To do so, we first generate initial annotations with a computer vision-based approach and then, through extensive augmentation, train a deep neural network on the generated data. The final system is capable of instance segmentation, keypoint detection, and body part segmentation even when the objects are heavily occluded.
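
One way to picture how augmentation can produce occluded training samples from unoccluded annotations is to paste one annotated instance over another and update the masks. The sketch below assumes this copy-paste style; the summary only states that augmentation is extensive and simulates occlusion, so the function names and the procedure itself are hypothetical.

```python
def shift_mask(mask, dy, dx):
    """Translate a binary mask by (dy, dx); pixels leaving the frame are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if mask[y][x] and 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = 1
    return out


def occlude(back, front, dy, dx):
    """Place `front` (shifted by dy, dx) on top of `back`.

    Returns the still-visible part of `back`, the shifted `front` mask,
    and the fraction of `back` that became hidden.
    """
    front_shifted = shift_mask(front, dy, dx)
    visible_back = [[1 if b and not f else 0 for b, f in zip(brow, frow)]
                    for brow, frow in zip(back, front_shifted)]
    total = sum(map(sum, back))
    hidden = total - sum(map(sum, visible_back))
    return visible_back, front_shifted, (hidden / total if total else 0.0)


# Toy example: the same 2x4 blob is used as both instances.
blob = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
visible, front, hidden_fraction = occlude(blob, blob, dy=0, dx=2)
```

The hidden fraction gives a simple handle for controlling how severe the simulated occlusion is; keypoints falling under the front mask would be marked as not visible in the same pass.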

Paper Structure (21 sections, 4 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: An example in which one rat is mounting the other. The rats look highly similar and may heavily occlude each other from time to time, giving rise to ambiguity in the instance segmentation and keypoint detection tasks due to hidden or heavily occluded regions.
  • Figure 2: Pipeline of the automatic annotation method. The process starts with foreground-background segmentation; then, after pre-processing, the medial axis transform is applied. From the endpoints of the midline, the head and tail-end keypoints can be determined based on the area of their watershed segmentation. The tail-base keypoint can then be found by taking the median value of the distance transform. Finally, the body part segmentation is given by the watershed algorithm initiated from the acquired keypoints. Best viewed in color.
  • Figure 3: Result of the CV-based automatic annotation method. The first row shows the input image and its foreground-background segmentation; the second row shows the annotated instances.
  • Figure 4: Mask R-CNN architecture. The network takes an RGB image and feeds it through the backbone, which extracts a feature pyramid in which each feature map has half the spatial resolution of the one before. The RPN then makes bounding box proposals, which are processed by the ROI heads.
  • Figure 5: Results of the final method. The left side of each image shows the output of the keypoint detection and instance segmentation model, and the right side shows the output of the body part segmentation model. The columns correspond to different amounts of occlusion in the scene. With no occlusion $(a)$ or only partial occlusion $(b)$, both models perform well. When the objects are heavily occluded $(c)$, the quality of the keypoint model starts to degrade. Beyond a certain point $(d)$, it can no longer separate the instances due to limitations of the Mask R-CNN architecture, but the body part model can still segment the objects.
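
The last step in the Figure 2 caption, growing body part regions outward from the detected keypoints, can be approximated with a short sketch. Note that this is not a watershed: it is a simpler seeded region-growing stand-in (multi-source breadth-first search inside the foreground mask), whereas a real watershed floods a distance or intensity landscape in priority order. The function name and the seed labels are hypothetical.

```python
from collections import deque


def grow_from_seeds(mask, seeds):
    """Label each foreground pixel with the seed that reaches it first.

    `mask` is a binary grid; `seeds` maps a positive integer label to a
    (row, col) position that is assumed to lie on the foreground.
    Growth is 4-connected and never leaves the mask.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    queue = deque()
    for label, (y, x) in seeds.items():
        labels[y][x] = label
        queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and mask[ny][nx] and labels[ny][nx] == 0):
                labels[ny][nx] = labels[y][x]
                queue.append((ny, nx))
    return labels


# Toy 1-pixel-wide foreground with three seeds.
# Hypothetical labels: 1 = head, 2 = tail base, 3 = tail end.
mask = [[1, 1, 1, 1, 1, 1, 1]]
seeds = {1: (0, 0), 2: (0, 3), 3: (0, 6)}
labels = grow_from_seeds(mask, seeds)
```

Each pixel ends up with the label of its geodesically nearest seed, which conveys the idea of seed-initiated partitioning without reproducing the boundaries a true watershed would find.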