RatBodyFormer: Rat Body Surface from Keypoints

Ayaka Higami, Karin Oshima, Tomoyo Isoguchi Shiramatsu, Hirokazu Takahashi, Shohei Nobuhara, Ko Nishino

TL;DR

This work tackles the challenge of capturing the full, non-rigid surface of a rat, which encodes rich behavioral cues beyond sparse keypoints. It introduces RatDome, a multi-view capture system with bead markers to pair 3D keypoints and dense surface points, and RatBodyFormer, a transformer that regresses dense 3D body-surface coordinates from detectable keypoints onto a canonical body surface. The approach achieves about 6.5 mm surface accuracy, generalizes across rats and ages, and enables animatable avatars (GaussianRat) for analysis-by-synthesis, potentially accelerating neuroscience research by providing a robust foundation for surface-aware automated behavior analysis. Together, RatDome and RatBodyFormer offer a principled, non-invasive pathway to quantify subtle rat body surface deformations and facilitate downstream applications in VR/AR and behavioral neuroscience.

Abstract

Analyzing rat behavior lies at the heart of many scientific studies. Past methods for automated rodent modeling have focused on 3D pose estimation from keypoints, e.g., face and appendages. The pose, however, does not capture the rich body surface movement that encodes subtle rat behaviors such as curling and stretching. The body surface lacks features that can be visually defined, so it evades these established keypoint-based methods. In this paper, we introduce the first method for reconstructing the rat body surface as a dense set of points by learning to predict it from the sparse keypoints that can be detected with past methods. Our method consists of two key contributions. The first is RatDome, a novel multi-camera system for rat behavior capture, and a large-scale dataset captured with it that consists of pairs of 3D keypoints and 3D body surface points. The second is RatBodyFormer, a novel network to transform detected keypoints to 3D body surface points. RatBodyFormer is agnostic to the exact locations of the 3D body surface points in the training data and is trained with masked learning. We experimentally validate our framework with a number of real-world experiments. Our results collectively serve as a novel foundation for automated rat behavior analysis.

Paper Structure

This paper contains 35 sections, 1 equation, 22 figures, 3 tables.

Figures (22)

  • Figure 1: We introduce a novel multiview camera system (RatDome) to capture multiview videos of rats (top), and a novel Transformer-based network (RatBodyFormer) that recovers the deforming body surface as a dense set of surface points (bottom: rainbow points) predicted from detectable keypoints (middle: orange points). The body surface reconstructed with RatBodyFormer offers a much richer window into complex rat behavior.
  • Figure 2: A color-beaded rat (left) and RatDome (right). We attach colored (red, black, orange, blue) beads and apply paint to the rat body surface. RatDome is a novel multiview camera studio for freely moving rats. It is shaped as a 15-faced gyroelongated pentagonal pyramid. With 15 cameras and their multiview geometry, we collect, annotate, and reconstruct paired sets of 3D keypoints and 3D body surface points of rats of different ages in weeks.
  • Figure 3: RatBodyFormer is an encoder-decoder Transformer model that takes the normalized displacements of detected 3D keypoints and outputs the normalized displacements of densely sampled 3D body surface points. The displacements are w.r.t. the reference pose.
  • Figure 4: Qualitative results of D1. We show the results of the model trained only on manually annotated data (MA) on the left, and the results of the model trained on both manually annotated and semi-automatically annotated data (SAA) on the right. Semi-automatically annotated data improves the body surface estimation.
  • Figure 5: L2 error histograms of D1. Each vertical bar indicates the mean L2 error of the histogram in the same color. Our semi-automatically annotated labels improve the average error by about 0.9 mm. "MA" and "SAA" mean "manually annotated data" and "semi-automatically annotated data", respectively.
  • ...and 17 more figures
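
The captions for Figures 3 and 5 mention two quantities that are easy to illustrate: normalized displacements with respect to a reference pose, and the mean L2 error over surface points. The following is a minimal sketch of both, not the authors' implementation. The function names, the toy coordinates, and the scalar `scale_mm` used for normalization are all assumptions made for illustration; this summary does not state how the paper normalizes displacements.

```python
import numpy as np


def normalized_displacements(points, reference, scale):
    """Displacement of each 3D point from its reference-pose position,
    divided by a scalar scale (the choice of scale is an assumption)."""
    points = np.asarray(points, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (points - reference) / scale


def denormalize(displacements, reference, scale):
    """Invert normalized_displacements to recover absolute 3D positions."""
    displacements = np.asarray(displacements, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return reference + displacements * scale


def per_point_l2_error(predicted, ground_truth):
    """Euclidean distance between corresponding 3D points, one value per point."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(ground_truth, dtype=float)
    return np.linalg.norm(diff, axis=-1)


# Toy data in millimetres (hypothetical values, two surface points).
reference_surface = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
observed_surface = np.array([[3.0, 4.0, 0.0], [10.0, 0.0, 12.0]])
scale_mm = 100.0  # hypothetical normalization constant

# Network-style targets: displacements relative to the reference pose.
disp = normalized_displacements(observed_surface, reference_surface, scale_mm)

# Mapping a (here, perfect) prediction back to absolute coordinates.
recovered = denormalize(disp, reference_surface, scale_mm)

# Error of a trivial baseline that always outputs the reference pose.
errors = per_point_l2_error(reference_surface, observed_surface)
mean_error = errors.mean()
```

A histogram of `errors` over a test set, with a vertical bar at `mean_error`, would correspond to the kind of plot described in the Figure 5 caption.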