Tracking Virtual Meetings in the Wild: Re-identification in Multi-Participant Virtual Meetings

Oriel Perl, Ido Leshem, Uria Franko, Yuval Goldman

TL;DR

This paper introduces a novel approach to tracking and re-identifying participants in remote video meetings. It exploits the spatio-temporal priors that arise from data in this domain, which improves tracking compared to general-purpose object tracking.

Abstract

In recent years, workplaces and educational institutions have widely adopted virtual meeting platforms. This has led to a growing interest in analyzing and extracting insights from these meetings, which requires effective detection and tracking of unique individuals. In practice, there is no standard for the layout of video meeting recordings or for how they are captured across platforms and services. This makes it difficult to acquire the data stream and analyze it in a uniform fashion. Our approach addresses the most general form of video recording, which usually consists of a grid of participants from a single video source with no metadata on participant locations, while making as few constraints and assumptions as possible about how the data was acquired. Conventional approaches often couple YOLO models with tracking algorithms and assume linear motion trajectories akin to those observed in CCTV footage. Such assumptions fall short in virtual meetings, where a participant's video feed window can abruptly change location across the grid. In an organic video meeting setting, participants frequently join and leave, leading to sudden, non-linear movements on the video grid. This disrupts optical flow-based tracking methods that depend on linear motion. Consequently, standard object detection and tracking methods might mistakenly assign multiple participants to the same tracker. In this paper, we introduce a novel approach to tracking and re-identifying participants in remote video meetings that exploits the spatio-temporal priors arising from the data in our domain. This improves tracking compared to general-purpose object tracking. Our approach reduces the error rate by 95% on average relative to a YOLO-based tracking baseline.

Paper Structure (29 sections, 7 equations, 3 figures, 4 tables, 1 algorithm)

Figures (3)

  • Figure 2: Our remote meeting video analysis pipeline: 1) Split video into frames. 2) Run parallel processes for object detection/tracking and face detection/embedding on all frames. 3) Match track IDs with detected faces. 4) Segment video into scenes based on track ID changes. 5) Create latent representations for each track within scenes using embedding vectors. 6) Merge representations across scenes to form full track IDs for meeting participants.
  • Figure 3: Mean absolute error (MAE) between the predicted number of participants and the actual number of participants (GT).
  • Figure 4: Mean absolute error (MAE) between the predicted number of participants and the actual number of participants (GT), by meeting duration in minutes.
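
Steps 5 and 6 of the pipeline in Figure 2 (one latent representation per track within a scene, then merging across scenes) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the mean-pooled representation, the greedy cosine-similarity matching, and the `threshold` value are all assumptions made for illustration. So is the rule that two tracks in the same scene never map to the same participant, which is one plausible reading of a spatio-temporal prior.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_embedding(vectors: List[Vector]) -> List[float]:
    """Average the face embeddings of one track in one scene, then L2-normalize."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def merge_scenes(
    scenes: List[Dict[int, List[Vector]]],
    threshold: float = 0.7,  # hypothetical similarity cut-off, not from the paper
) -> List[Dict[int, int]]:
    """Map each scene-local track ID to a global participant ID.

    `scenes[k]` maps a track ID to the face embeddings observed for that
    track in scene k. Returns one {track_id: global_id} dict per scene.
    """
    centroids: List[List[float]] = []  # one representative per global ID
    assignments: List[Dict[int, int]] = []
    for scene in scenes:
        mapping: Dict[int, int] = {}
        used = set()  # assumed prior: one participant appears once per scene
        for track_id, vectors in scene.items():
            rep = mean_embedding(vectors)
            best_id, best_sim = None, threshold
            for gid, centroid in enumerate(centroids):
                if gid in used:
                    continue
                sim = cosine(rep, centroid)
                if sim >= best_sim:
                    best_id, best_sim = gid, sim
            if best_id is None:
                # No sufficiently similar participant seen so far: open a new ID.
                centroids.append(rep)
                best_id = len(centroids) - 1
            used.add(best_id)
            mapping[track_id] = best_id
        assignments.append(mapping)
    return assignments


# Toy 2-D embeddings: tracks 1 and 8 show one face, tracks 2 and 7 another.
scenes = [
    {1: [[1.0, 0.0], [0.9, 0.1]], 2: [[0.0, 1.0]]},
    {7: [[0.05, 1.0]], 8: [[1.0, 0.05]]},
    {3: [[-1.0, 0.0]]},
]
print(merge_scenes(scenes))
```

In this toy input the track IDs change between scenes, as they would after a grid reshuffle, yet tracks 1 and 8 resolve to the same global ID, as do tracks 2 and 7. The sketch keeps the first representative of each participant fixed. A fuller version might update it as more scenes are merged.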