Interaction-via-Actions: Cattle Interaction Detection with Joint Learning of Action-Interaction Latent Space

Ren Nakagawa, Yang Yang, Risa Shinoda, Hiroaki Santo, Kenji Oyama, Fumio Okura, Takenao Ohkawa

TL;DR

Addresses automatic detection of cattle interactions from single images despite scarce interaction data. Introduces CattleAct, which decomposes each interaction into the individual actions of the two cattle involved and learns a joint action–interaction latent space through pre-training, contrastive fine-tuning, and alignment losses, supplemented by skeleton-aware augmentation. Proposes a practical multimodal system that integrates video and GPS for production pastures and demonstrates improved interaction recognition over baselines on a real-world dataset. Shows that aligning action and interaction representations improves robustness to occlusion and supports scalable, cost-effective livestock monitoring, with potential benefits for estrus detection.

Abstract

This paper introduces a method and application for automatically detecting behavioral interactions between grazing cattle from a single image, a capability essential for smart livestock management in the cattle industry, for example in detecting estrus. Although interaction detection for humans has been actively studied, cattle interaction detection poses a non-trivial challenge: no comprehensive behavioral dataset includes interactions, because interactions between grazing cattle are rare events. We therefore propose CattleAct, a data-efficient method that detects interactions by decomposing them into combinations of actions by individual cattle. Specifically, we first learn an action latent space from a large-scale cattle action dataset. We then embed rare interactions by fine-tuning the pre-trained latent space with contrastive learning, thereby constructing a unified latent space of actions and interactions. On top of the proposed method, we develop a practical working system that integrates video and GPS inputs. Experiments on a commercial-scale pasture demonstrate that our method detects interactions more accurately than the baselines. Our implementation is available at https://github.com/rakawanegan/CattleAct.
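
The contrastive fine-tuning step described in the abstract can be illustrated with a small sketch. This is not the authors' loss: it is a generic InfoNCE-style objective, written under the assumption that each cow in a candidate pair is already encoded as a feature vector. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's exact loss).

    The loss is small when `anchor` is closer to `positive` (its interaction
    partner) than to every vector in `negatives` (non-interacting cows).
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over the positive and negative logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - pos_logit


# Toy features: the anchor's partner points in nearly the same direction.
anchor = [1.0, 0.0]
aligned = pair_contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = pair_contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In this toy setup `aligned` is much smaller than `misaligned`, which is the behavior a pull-together, push-apart objective is meant to reward. A real training loop would compute such a loss on encoder outputs and backpropagate through the pre-trained action encoder.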

Paper Structure

This paper contains 33 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Joint estimation of actions and interactions of cows. Given a single-image observation, we estimate both frequent actions (e.g., standing, shown in blue) by individual cows and rare interactions (e.g., mount, shown in purple) by two cows, leveraging the joint latent space of actions and interactions.
  • Figure 2: Overview of CattleAct. (a) System overview: detected and tracked cows are re-identified, split into individuals and interaction candidates, and encoded by the action and interaction encoders, whose features are aligned by a multi-head attention module for final action–interaction classification. (b) GPS–image matching-based re-identification aligns GPS trajectories with detection trajectories to maintain consistent cattle IDs. (c) Contrastive action–interaction feature alignment pulls features of interacting pairs together while pushing apart negative and no-interaction pairs in the joint latent space.
  • Figure 3: Skeleton-aware data augmentation. Skeleton-aware cutout enhances robustness to occlusion while selectively preserving the joints vital for recognition, i.e., the head and front legs for individual action recognition, and the head and torso for interaction recognition.
  • Figure 4: Visual comparisons. Our method accurately detects both actions and interactions from real-world pasture images, compared to baseline methods.
  • Figure 5: t-SNE visualization of the feature space. Each point represents a sample; markers indicate the sample type (circles: individual actions, triangles: interactions) and colors indicate class labels.
  • ...and 2 more figures
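
The skeleton-aware cutout in the Figure 3 caption can be sketched as follows. The caption only says which joints are preserved for each task, so the rest is assumed for illustration: the joint names, the square patch, the zero fill value, and the rejection-sampling strategy are all hypothetical and may differ from the released implementation.

```python
import random

# Joints kept visible per task, following the Figure 3 caption.
# The joint names are hypothetical labels chosen for this sketch.
PROTECTED = {
    "action": {"head", "front_left_leg", "front_right_leg"},
    "interaction": {"head", "torso"},
}


def skeleton_aware_cutout(image, keypoints, task, size, rng, max_tries=50):
    """Zero out one size x size patch that covers no protected joint.

    image: list of rows (lists of pixel values); keypoints: name -> (x, y).
    Returns a new image. If no valid patch is found within max_tries,
    an unmodified copy is returned.
    """
    height, width = len(image), len(image[0])
    keep = [keypoints[name] for name in PROTECTED[task] if name in keypoints]
    out = [row[:] for row in image]  # never modify the caller's image
    for _ in range(max_tries):
        x0 = rng.randrange(0, width - size + 1)
        y0 = rng.randrange(0, height - size + 1)
        covers_joint = any(
            x0 <= x < x0 + size and y0 <= y < y0 + size for x, y in keep
        )
        if covers_joint:
            continue  # reject patches that would hide a vital joint
        for y in range(y0, y0 + size):
            for x in range(x0, x0 + size):
                out[y][x] = 0
        return out
    return out


# Toy 8x8 "image" of ones with hypothetical keypoint coordinates.
toy_image = [[1] * 8 for _ in range(8)]
toy_keypoints = {"head": (1, 1), "torso": (4, 4), "front_left_leg": (2, 6)}
augmented = skeleton_aware_cutout(
    toy_image, toy_keypoints, "interaction", size=3, rng=random.Random(0)
)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible. Rejection sampling is the simplest way to honor the protected joints; a production pipeline might instead sample patches around non-vital joints directly.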