Activity-Biometrics: Person Identification from Daily Activities

Shehreen Azad, Yogesh Singh Rawat

TL;DR

This work tackles person identification from RGB videos of daily activities, where traditional appearance cues can be unreliable due to biases and privacy-preserving constraints. It introduces ABNet, which disentangles biometric cues from non-biometric appearance features using a bias-less teacher for biometric distillation and a distortion network to model appearance bias, while jointly leveraging activity information to improve biometrics through an activity prior. The method is evaluated on five datasets derived from activity recognition benchmarks, showing consistent improvements over state-of-the-art image- and video-based methods and robustness to hue shifts and face blurring. Key contributions include the bias-disentangled architecture, cross-modal knowledge transfer via silhouette-based distillation, and the integration of activity priors to enhance identity discrimination, validated by extensive ablations and analysis. The approach promises practical impact for privacy-aware, activity-driven biometrics in security and surveillance contexts, with public availability of code and datasets.

Abstract

In this work, we study a novel problem: identifying people while they perform daily activities. Learning biometric features from RGB videos is challenging due to spatio-temporal complexity and the presence of appearance biases such as clothing color and background. We propose ABNet, a novel framework that disentangles biometric and non-biometric features to perform effective person identification from daily activities. ABNet relies on a bias-less teacher to learn biometric features from RGB videos and explicitly disentangles non-biometric features with the help of biometric distortion. In addition, ABNet exploits an activity prior for biometrics, enabled by joint biometric and activity learning. We perform a comprehensive evaluation of the proposed approach across five datasets derived from existing activity recognition benchmarks. Furthermore, we extensively compare ABNet with existing works in person identification and demonstrate its effectiveness for activity-based biometrics across all five datasets. The code and dataset can be accessed at: https://github.com/sacrcv/Activity-Biometrics/
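
The joint biometric and activity learning with teacher distillation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the use of mean squared error as the distillation term, and the weight `w_distill` are hypothetical and are not taken from the paper, and the distortion branch is omitted entirely. The sketch only shows how an activity loss, an identity loss, and a teacher-matching term could be summed into one objective.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class.
    return -math.log(softmax(logits)[label])

def mse(a, b):
    # Mean squared error between two equal-length feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(activity_logits, activity_label,
               identity_logits, identity_label,
               student_feat, teacher_feat, w_distill=1.0):
    # Hypothetical combined objective (not the paper's exact losses):
    # activity classification + identity classification + a term pulling
    # the RGB (student) feature toward the silhouette (teacher) feature.
    l_act = cross_entropy(activity_logits, activity_label)
    l_id = cross_entropy(identity_logits, identity_label)
    l_kd = mse(student_feat, teacher_feat)
    return l_act + l_id + w_distill * l_kd
```

In this sketch the teacher feature is treated as a fixed target. A real implementation would compute these terms on batched tensors in a deep learning framework and would add whatever disentanglement terms the method actually uses.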

Paper Structure

This paper contains 16 sections, 6 equations, 12 figures, 15 tables.

Figures (12)

  • Figure 1: Different approaches for person identification: (left) samples for existing person identification problems such as face recognition (top: Celeb-A [liu2015faceattributes]), whole-body recognition (middle: Market-1501 [zheng2015scalable]), and gait recognition (bottom: CASIA-B [yu2006framework]). (right) We focus on person identification from daily activities, which presents challenges beyond learning walking or facial patterns. We show some samples from the datasets we used to study this problem (top: NTU RGB-AB, middle: Charades-AB, bottom: ACC-MM1-Activities).
  • Figure 1: Performance analysis with and without the activity prior; bars represent biometrics rank-1 and dots represent activity accuracy.
  • Figure 2: Overview of our proposed method ABNet. RGB video is passed to a video encoder $S_\varphi (\cdot)$ to extract the spatio-temporal feature $F_{AB}$, which is passed to the activity head $C^A$ and the actor head $C^B$. $C^B$ captures both biometrics (in red) and appearance (in green) features in $F_{BT}$. To disentangle features, the bias-less teacher encoder $T_\theta(\cdot)$ distills biometrics knowledge from the corresponding silhouettes. The appearance feature bias is learned via a distortion network using encoder $A_\varphi (\cdot)$ on the distorted video input. Similar to $C^B$, $C^{DB}$ also captures both distorted biometrics (in red) and distorted appearance (in green) features in $F^D_{BT}$. Here, green and red denote positive and negative features. Joint training is performed using both $C^A$ and $C^B$. During inference, only the branch highlighted by the dashed box is utilized.
  • Figure 2: Effect of distortion amount: original sample zoomed in to show the effect of $\alpha=50,100,150$ (top) and $\alpha=200,250,300$ (bottom). As $\alpha$ increases, the distortion increases.
  • Figure 3: Biometrics distortion: original samples are shown in the top row and their corresponding distorted samples in the bottom row. From left to right, every two columns contain samples from the NTU RGB-AB, PKU MMD-AB, Charades-AB, ACC-MM1-Activities, and BRIAR-BGC3 datasets, respectively. The subjects from BRIAR-BGC3 and ACC-MM1-Activities consented to publication.
  • ...and 7 more figures