Learning Keypoints for Multi-Agent Behavior Analysis using Self-Supervision

Daniel Khalil, Christina Liu, Pietro Perona, Jennifer J. Sun, Markus Marks

TL;DR

B-KinD-multi is a novel approach that leverages pre-trained video segmentation models to guide keypoint discovery in multi-agent scenarios, eliminating the need for time-consuming manual annotations on new experimental settings and organisms.

Abstract

The study of social interactions and collective behaviors through multi-agent video analysis is crucial in biology. While self-supervised keypoint discovery has emerged as a promising solution to reduce the need for manual keypoint annotations, existing methods often struggle with videos containing multiple interacting agents, especially those of the same species and color. To address this, we introduce B-KinD-multi, a novel approach that leverages pre-trained video segmentation models to guide keypoint discovery in multi-agent scenarios. This eliminates the need for time-consuming manual annotations on new experimental settings and organisms. Extensive evaluations demonstrate improved keypoint regression and downstream behavioral classification in videos of flies, mice, and rats. Furthermore, our method generalizes well to other species, including ants, bees, and humans, highlighting its potential for broad applications in automated keypoint annotation for multi-agent behavior analysis. Code is available at: https://danielpkhalil.github.io/B-KinD-Multi
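The core idea in the abstract and in the Figure 1 and 2 captions is that segmentation output splits the scene into individual agents, and each agent's mask then gates the feature maps used for keypoint discovery. A minimal sketch of that masking step follows. It is an illustration, not the authors' implementation, and it rests on assumptions not stated in the source: the segmentation model is taken to return one integer label map per frame (0 for background, k for agent k), the feature-map stride is taken to divide the frame size evenly, and masks are down-sampled by average pooling. The function names `split_agent_masks`, `downsample_mask`, and `mask_features` are hypothetical.

```python
import numpy as np

# Hypothetical helpers; not taken from the B-KinD-multi code base.
# Assumption: segmentation yields an integer label map per frame,
# with 0 = background and k > 0 = agent k.


def split_agent_masks(label_map: np.ndarray) -> dict:
    """Split an instance label map into one binary float mask per agent."""
    agent_ids = [int(k) for k in np.unique(label_map) if k != 0]
    return {k: (label_map == k).astype(np.float32) for k in agent_ids}


def downsample_mask(mask: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a mask by an integer factor (assumed down-sampling scheme)."""
    h, w = mask.shape
    if h % factor or w % factor:
        raise ValueError("mask size must be divisible by the down-sampling factor")
    return mask.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def mask_features(features: np.ndarray, label_map: np.ndarray) -> dict:
    """Return one masked copy of a (C, h, w) feature map per agent."""
    _, fh, fw = features.shape
    factor = label_map.shape[0] // fh
    if label_map.shape != (fh * factor, fw * factor):
        raise ValueError("frame size must be an integer multiple of feature size")
    masked = {}
    for agent_id, mask in split_agent_masks(label_map).items():
        small = downsample_mask(mask, factor)              # (fh, fw), values in [0, 1]
        masked[agent_id] = features * small[None, :, :]    # broadcast over channels
    return masked


# Usage: an 8x8 frame with two agents and a 4-channel, 4x4 feature map (stride 2).
labels = np.zeros((8, 8), dtype=np.int64)
labels[0:4, 0:4] = 1   # agent 1 occupies the top-left quadrant
labels[4:8, 4:8] = 2   # agent 2 occupies the bottom-right quadrant
feats = np.ones((4, 4, 4), dtype=np.float32)
per_agent = mask_features(feats, labels)
print({k: float(v.sum()) for k, v in per_agent.items()})  # {1: 16.0, 2: 16.0}
```

In this sketch, average pooling leaves fractional values along mask boundaries, so features at an agent's edge are attenuated rather than cut off; a nearest-neighbor down-sample would produce a hard mask instead. Which choice the paper makes is not stated here.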

Paper Structure (23 sections, 5 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Overview of B-KinD-multi: We present B-KinD-multi, a self-supervised keypoint discovery method for multiple agents. Previous methods struggle to discover keypoints on multiple agents that look very similar. Our method addresses this issue by using video segmentation to mask feature maps during keypoint discovery, which enables accurate keypoint discovery and downstream behavioral classification.
  • Figure 2: Full overview of B-KinD-multi: Video frames $I_t$ and $I_{t+T}$ at times $t$ and $t+T$ are processed by a video segmentation module to obtain agent masks. The raw frames are fed into an appearance encoder $\Phi$, which extracts appearance features from $I_t$, and into a pose decoder $\Psi$. The agent masks are separated per agent, down-sampled, and used to produce keypoints for each agent from both $I_t$ and $I_{t+T}$. These keypoints are then used to reconstruct, for each agent independently, the spatiotemporal difference computed between the two frames. This design allows keypoints to be discovered simultaneously on multiple interacting subjects.
  • Figure 3: Qualitative comparison between B-KinD and B-KinD-multi. We qualitatively compare keypoint discovery by B-KinD and B-KinD-multi on Fly v. Fly, CalMS21, and PAIR-R24. On every dataset, B-KinD-multi performs much better than B-KinD. The gap is larger on datasets where the agents have the same color (Fly v. Fly and PAIR-R24) than on CalMS21, where they differ.
  • Figure 4: Qualitative results of B-KinD-multi on other datasets. We qualitatively demonstrate the performance of B-KinD-multi on additional multi-agent datasets. From left to right: 4 mice, 10 ants, 2 bees, 7 humans.
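The per-agent steps in the Figure 2 caption (keypoints produced separately for each agent, and a spatiotemporal difference reconstructed for each agent independently) can be sketched as below. This is an illustrative sketch under stated assumptions, not the paper's method. It assumes keypoints are read out from heatmaps with a soft-argmax (a common choice in self-supervised keypoint work) and that an agent's heatmaps are restricted by suppressing logits outside its down-sampled mask. It also uses a plain absolute frame difference, limited to the union of the agent's masks at $t$ and $t+T$, as a stand-in for the paper's spatiotemporal difference. All function names (`soft_argmax_2d`, `agent_keypoints`, `agent_difference_target`) are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating per-agent keypoints and reconstruction
# targets; not taken from the B-KinD-multi code base.


def soft_argmax_2d(heatmaps: np.ndarray) -> np.ndarray:
    """Map (K, h, w) heatmap logits to (K, 2) expected (x, y) coordinates."""
    k, h, w = heatmaps.shape
    flat = heatmaps.reshape(k, -1)
    flat = flat - flat.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)       # spatial softmax per keypoint
    probs = probs.reshape(k, h, w)
    ys, xs = np.mgrid[0:h, 0:w]                     # row and column coordinate grids
    x = (probs * xs).sum(axis=(1, 2))
    y = (probs * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)


def agent_keypoints(heatmaps: np.ndarray, agent_mask: np.ndarray,
                    suppress: float = -1e4) -> np.ndarray:
    """Soft-argmax keypoints restricted to one agent's (down-sampled) mask.

    Logits outside the mask are replaced by a large negative value, so they
    receive ~0 probability; multiplying them by 0 would not, since exp(0) = 1.
    """
    masked = np.where(agent_mask[None, :, :] > 0.5, heatmaps, suppress)
    return soft_argmax_2d(masked)


def agent_difference_target(frame_t, frame_tT, mask_t, mask_tT):
    """Stand-in reconstruction target: |I_{t+T} - I_t| inside the agent's masks."""
    union = np.maximum(mask_t, mask_tT)              # agent region in either frame
    return np.abs(frame_tT - frame_t) * union


# Usage: two keypoint channels on a 4x4 grid, with one peak inside each agent.
hm = np.zeros((2, 4, 4))
hm[:, 0, 0] = 20.0                                   # peak in agent 1's region
hm[:, 3, 3] = 20.0                                   # peak in agent 2's region
mask1 = np.zeros((4, 4)); mask1[:2, :2] = 1.0
mask2 = np.zeros((4, 4)); mask2[2:, 2:] = 1.0
kp1 = agent_keypoints(hm, mask1)                     # both rows close to (0, 0)
kp2 = agent_keypoints(hm, mask2)                     # both rows close to (3, 3)

f0 = np.zeros((4, 4)); f1 = np.zeros((4, 4))
f1[0, 0] = 1.0                                       # change inside agent 1
f1[3, 3] = 1.0                                       # change inside agent 2
target1 = agent_difference_target(f0, f1, mask1, mask1)  # keeps only the (0, 0) change
```

In this sketch, suppressing logits rather than zeroing them is what actually confines each agent's keypoints to its own mask under a softmax. Without it, both calls above would be pulled toward whichever peak dominates, which mirrors the look-alike-agent failure that the Figure 1 caption attributes to previous methods.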