Sim2real Cattle Joint Estimation in 3D point clouds

Mohammad Okour, Raphael Falque, Alen Alempijevic

TL;DR

Inspired by the established allometric relationship between bone length and the overall height of mammals, the estimated joints are utilised to predict hip height within a real cattle dataset, extending the utility of the approach to offer insights into improving cattle monitoring practices.

Abstract

Understanding the well-being of cattle is crucial in various agricultural contexts. Cattle's body shape and joint articulation carry significant information about their welfare, yet acquiring comprehensive datasets for 3D body pose estimation presents a formidable challenge. This study delves into the construction of such a dataset specifically tailored for cattle. Leveraging the expertise of digital artists, we use a single animated 3D model to represent diverse cattle postures. To address the disparity between virtual and real-world data, we augment the 3D model's shape to encompass a range of potential body appearances, thereby narrowing the "sim2real" gap. We use these annotated models to train a deep-learning framework capable of estimating internal joints solely based on external surface curvature. Our contribution is specifically the use of geodesic distance over the surface manifold, coupled with multilateration to extract joints in a semantic keypoint detection encoder-decoder architecture. We demonstrate the robustness of joint extraction by comparing the link lengths extracted on real cattle mobbing and walking within a race. Furthermore, inspired by the established allometric relationship between bone length and the overall height of mammals, we utilise the estimated joints to predict hip height within a real cattle dataset, extending the utility of our approach to offer insights into improving cattle monitoring practices.
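The multilateration step mentioned in the abstract can be illustrated with a generic least-squares sketch. This is not the authors' implementation: it assumes the ranges are plain Euclidean distances from a handful of known 3D anchor points (for example, surface points selected from the network output), whereas the paper works with geodesic distances over the surface manifold. The function name `multilaterate` and the toy anchor coordinates are hypothetical.

```python
import numpy as np


def multilaterate(anchors, dists):
    """Estimate a 3D position from distances to known anchor points.

    Assumption: `dists` are Euclidean ranges. Subtracting the first range
    equation from the others removes the quadratic term |x|^2 and leaves
    a linear system:
        2 (a_i - a_0) . x = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    a0, d0 = anchors[0], dists[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(a0 ** 2)
         - dists[1:] ** 2 + d0 ** 2)
    # Least squares copes with more anchors than unknowns and with noisy ranges.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


# Toy example: six non-coplanar anchors and one hypothetical joint position.
anchors = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
])
joint = np.array([0.3, 0.4, 0.2])
dists = np.linalg.norm(anchors - joint, axis=1)  # noise-free ranges
est = multilaterate(anchors, dists)
```

With noise-free ranges and at least four non-coplanar anchors, `est` recovers `joint` up to floating-point error. With noisy predicted distances, the same call returns the least-squares compromise.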

Paper Structure

This paper contains 6 sections, 6 equations, 9 figures.

Figures (9)

  • Figure 1: Each model is annotated by twelve joints: two joints at each of the front legs, two at each of the back legs, two at each side of the hip bones, and two at either end of the spine.
  • Figure 2: Method overview: starting from the simulated model, the armature undergoes rigid scaling and the mesh undergoes a non-rigid deformation. Through raycasting from a number of cameras, several point clouds are generated and merged. At inference time, the merged point cloud is passed into an encoder-decoder architecture (PointNet++ [qi2017pointnet++]) to extract the keypoints. During training, the dataset uses keypoints from the armature, and the distances on the manifold are pre-computed. The encoder-decoder input is $n\times3$ points, and the output is the $n\times13$ distances to the $13$ joint keypoints.
  • Figure 3: Top: cattle skeleton from [ukyBeefCattle], containing information on all the joints and bones. Bottom: the annotated model used in this work, with rigging and joints indicated by blue squares.
  • Figure 4: Barycentric diagram of $\alpha$, $\beta$, and $\gamma$, where $D_{g1}$, $D_{g2}$, and $D_{g3}$ are the heat kernel distances to a point in space.
  • Figure 5: Predicted distance on the manifold. Points coloured in blue represent the nearest points to a joint. Left: the rear leg is evaluated. Right: the front leg is evaluated.
  • ...and 4 more figures