SILVI: Simple Interface for Labeling Video Interactions

Ozan Kanbertay, Richard Vogg, Elif Karakoc, Peter M. Kappeler, Claudia Fichtel, Alexander S. Ecker

TL;DR

SILVI addresses the challenge of annotating spatio-temporal interactions in wildlife video data, where existing tools label either behaviors or localization but not both. It introduces an open-source, Electron-based interface that supports simultaneous action and interaction labeling, track correction, and CSV export for training computer vision models. The tool was demonstrated on red-fronted lemurs recorded in multi-view videos, showing that it scales to long video datasets and handles comprehensive metadata. By linking ethogram-based labeling with precise spatial localization, SILVI enables more robust social-behavior analyses and has potential applications beyond animal studies. The open-source design invites extension to automated tracking and ID integration.

Abstract

Computer vision methods are increasingly used for the automated analysis of large volumes of video data collected through camera traps, drones, or direct observations of animals in the wild. While recent advances have focused primarily on detecting individual actions, much less work has addressed the detection and annotation of interactions -- a crucial aspect for understanding social and individualized animal behavior. Existing open-source annotation tools support either behavioral labeling without localization of individuals, or localization without the capacity to capture interactions. To bridge this gap, we present SILVI, an open-source labeling software that integrates both functionalities. SILVI enables researchers to annotate behaviors and interactions directly within video data, generating structured outputs suitable for training and validating computer vision models. By linking behavioral ecology with computer vision, SILVI facilitates the development of automated approaches for fine-grained behavioral analyses. Although developed primarily in the context of animal behavior, SILVI could be useful more broadly to annotate human interactions in other videos that require extracting dynamic scene graphs. The software, along with documentation and download instructions, is available at: https://gitlab.gwdg.de/kanbertay/interaction-labelling-app.

Paper Structure

This paper contains 11 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Scoring the behavior of redfronted lemurs using SILVI: Users can upload multiple video views, an ethogram, tracking and individual identification files. The app allows for annotation of actions and interactions, as well as fast correction of tracking and identification errors.
  • Figure 2: Examples of different types of interaction. Gaze can be detected on single images, while the interactions with the feeding box often require temporal context.