Two Deep Learning Solutions for Automatic Blurring of Faces in Videos

Roman Plaud, Jose-Luis Lisani

TL;DR

Two deep-learning-based options for blurring faces in videos are presented: a direct approach, in which a classical object detector (based on the YOLO architecture) is trained to detect faces, which are subsequently blurred, and an indirect approach, in which a U-Net-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.

Abstract

The widespread use of cameras in everyday life generates a vast amount of data that may contain sensitive information about the people and vehicles moving in front of them (location, license plates, physical characteristics, etc.). In particular, people's faces are recorded by surveillance cameras in public spaces. To ensure the privacy of individuals, face blurring techniques can be applied to the collected videos. In this paper we present two deep-learning-based options to tackle the problem. The first is a direct approach, consisting of a classical object detector (based on the YOLO architecture) trained to detect faces, which are subsequently blurred. The second is an indirect approach, in which a U-Net-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.
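In the direct approach, once faces have been detected, the remaining step is to blur the pixels inside each detected box. The following is a minimal sketch of that step only, under stated assumptions: the detector is not shown and its output is taken to be a list of `(x0, y0, x1, y1)` boxes; the function name `blur_faces`, the grayscale list-of-lists image format, and the use of a simple box blur are illustrative choices, not the method used in the paper.

```python
from typing import List, Tuple

Image = List[List[float]]          # grayscale image: rows of pixel intensities
Box = Tuple[int, int, int, int]    # (x0, y0, x1, y1); x1 and y1 are exclusive


def blur_faces(image: Image, boxes: List[Box], radius: int = 1) -> Image:
    """Return a copy of `image` with a box blur applied inside each box.

    `boxes` stands in for the output of a face detector (hypothetical
    format); pixels outside every box are left untouched.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the image so partially visible faces are handled.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Average a (2*radius+1)^2 window, truncated at the image
                # border. Values are read from the original image so that
                # already-blurred pixels do not feed into later ones.
                ys = range(max(0, y - radius), min(height, y + radius + 1))
                xs = range(max(0, x - radius), min(width, x + radius + 1))
                window = [image[j][i] for j in ys for i in xs]
                out[y][x] = sum(window) / len(window)
    return out
```

For a video, such a function would be applied frame by frame to the boxes returned for each frame; a larger `radius` gives a stronger blur.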

Paper Structure

This paper contains 23 sections, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: YOLOv5Face
  • Figure 2: YOLOv5 architecture
  • Figure 3: YOLO inference methodology
  • Figure 4: DeOldify architecture (from deoldifyIPOL). In red: pretrained ResNet, in blue: convolutional blocks, in green: upsample layers, in orange: self-attention layer, and in pink: sigmoid layer. The black lines stand for the skip connections
  • Figure 5: Input-target correspondence for an image from the FDDB dataset (top) and the WIDER dataset (bottom).
  • ...and 7 more figures