PrivHAR-Bench: A Graduated Privacy Benchmark Dataset for Video-Based Action Recognition

Samar Ansari

Abstract

Existing research on privacy-preserving Human Activity Recognition (HAR) typically evaluates methods against a binary paradigm: clear video versus a single privacy transformation. This limits cross-method comparability and obscures the nuanced relationship between privacy strength and recognition utility. We introduce \textit{PrivHAR-Bench}, a multi-tier benchmark dataset designed to standardize the evaluation of the \textit{Privacy-Utility Trade-off} in video-based action recognition. PrivHAR-Bench applies a graduated spectrum of visual privacy transformations, ranging from lightweight spatial obfuscation to cryptographic block permutation, to a curated subset of 15 activity classes selected for human articulation diversity. Each of the 1,932 source videos is distributed across 9 parallel tiers of increasing privacy strength, with additional background-removed variants to isolate the contribution of human motion features from contextual scene bias. We provide lossless frame sequences, per-frame bounding boxes, estimated pose keypoints with joint-level confidence scores, standardized group-based train/test splits, and an evaluation toolkit computing recognition accuracy and privacy metrics. Empirical validation using R3D-18 demonstrates a measurable and interpretable degradation curve across tiers, with within-tier accuracy declining from 88.8\% (clear) to 53.5\% (encrypted, background-removed) and cross-domain accuracy collapsing to 4.8\%, establishing PrivHAR-Bench as a controlled benchmark for comparing privacy-preserving HAR methods under standardized conditions. The dataset, generation pipeline, and evaluation code are publicly available.

Paper Structure

This paper contains 64 sections, 6 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: A single frame from the PrivHAR-Bench dataset shown across selected privacy tiers. From left to right: Original (no privacy), Tier 1 Blur ($\sigma=15$), Tier 2 Edge (Canny), Tier 3 AES scramble at block sizes $B=4$, $B=8$, and $B=16$, and the background-removed (NoBG) variant of $B=8$. As tier level increases, spatial identity features are progressively destroyed while gross motion structure is preserved at varying degrees.
  • Figure 2: The PrivHAR-Bench privacy spectrum. Each tier progressively destroys a different category of visual information, from fine-grained appearance (Tier 1) through structural contour (Tier 2) to spatial pixel arrangement (Tier 3). Tier 3 is further parameterized by block size $B$.
  • Figure 3: Illustration of context bias control. Left: Tier 3 ($B=8$) with background preserved, where a model could exploit environmental cues. Right: the NoBG variant of the same frame, isolating the scrambled human region.
  • Figure 4: Overview of the PrivHAR-Bench generation pipeline. Each source video passes through a shared ROI detection stage, after which all privacy tiers are generated in parallel using the same bounding box and mask.
  • Figure 5: Top-1 accuracy as a function of privacy tier for the R3D-18 baseline. Solid line: Config A (within-tier training). Dashed line: Config B (cross-domain, trained on Original only). The divergence between curves quantifies the domain gap introduced by each transformation. Note: The B8-NoBG condition combines two independent manipulations: cryptographic block scrambling (privacy) and background removal (context bias control). It is plotted at the rightmost position for visual continuity, but it is not strictly a point on the same linear privacy scale as the preceding tiers; it additionally removes an orthogonal confound (see Section 3.2 and Figure 3).
  • ...and 2 more figures
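To make the Tier 3 transformation in Figures 1 and 2 concrete, the sketch below illustrates a keyed $B \times B$ block permutation of a frame. This is an illustrative stand-in, not the dataset's actual pipeline: the paper derives the permutation from AES, whereas here a PRNG is seeded from a hash of the key (the function name `block_scramble`, the key handling, and the seeding scheme are all assumptions for demonstration and are not cryptographically secure).

```python
import hashlib

import numpy as np


def block_scramble(frame, key, B=8):
    """Permute the B x B pixel blocks of a frame in a key-derived order.

    Illustrative stand-in for the Tier 3 AES block permutation described
    in the paper: seeding a PRNG from a hash of the key approximates the
    keyed-permutation idea but is NOT a secure cipher.
    """
    H, W = frame.shape[:2]
    assert H % B == 0 and W % B == 0, "frame dims must be multiples of B"
    gh, gw = H // B, W // B
    # Split the frame into a row-major list of B x B blocks.
    blocks = [frame[i * B:(i + 1) * B, j * B:(j + 1) * B]
              for i in range(gh) for j in range(gw)]
    # Derive a deterministic permutation of block indices from the key.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    perm = np.random.default_rng(seed).permutation(len(blocks))
    # Reassemble the frame with blocks placed in permuted order.
    out = np.empty_like(frame)
    for dst, src in enumerate(perm):
        i, j = divmod(dst, gw)
        out[i * B:(i + 1) * B, j * B:(j + 1) * B] = blocks[src]
    return out
```

Because the permutation only rearranges blocks, every pixel value is preserved (gross structure at block granularity survives) while within-frame spatial identity cues are destroyed; larger $B$ preserves more local structure, matching the $B=4/8/16$ spectrum shown in Figure 1.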