Two Deep Learning Solutions for Automatic Blurring of Faces in Videos

Roman Plaud; Jose-Luis Lisani

TL;DR

Two deep-learning-based options are presented: a direct approach, in which a classical object detector (based on the YOLO architecture) is trained to detect faces, which are subsequently blurred; and an indirect approach, in which a Unet-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.

Abstract

The widespread use of cameras in everyday life generates a vast amount of data that may contain sensitive information about the people and vehicles moving in front of them (location, license plates, physical characteristics, etc.). In particular, surveillance cameras in public spaces record people's faces. To ensure the privacy of individuals, face blurring techniques can be applied to the collected videos. In this paper we present two deep-learning-based options to tackle the problem. The first is a direct approach, consisting of a classical object detector (based on the YOLO architecture) trained to detect faces, which are subsequently blurred. The second is an indirect approach, in which a Unet-like segmentation network is trained to output a version of the input image in which all the faces have been blurred.
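Once faces have been detected, the direct approach reduces to blurring every detected face region. A minimal sketch of that post-processing step follows. It assumes the YOLO-based detector has already returned pixel boxes `(x1, y1, x2, y2)` for a uint8 `H x W x 3` frame. The names `blur_faces` and `box_blur` and the kernel size `k` are hypothetical. The mean (box) filter is a stand-in chosen for simplicity, and the blurring operator used in the paper may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_blur(region: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: k x k mean filter (edge-padded) over the two spatial axes.

    The box filter is an assumption standing in for the paper's blur.
    """
    pad_before, pad_after = k // 2, k - 1 - k // 2
    pad_width = [(pad_before, pad_after)] * 2 + [(0, 0)] * (region.ndim - 2)
    padded = np.pad(region.astype(np.float64), pad_width, mode="edge")
    # The (k, k) window dimensions are appended as the last two axes.
    windows = sliding_window_view(padded, (k, k), axis=(0, 1))
    return windows.mean(axis=(-2, -1))


def blur_faces(frame: np.ndarray, boxes, k: int = 15) -> np.ndarray:
    """Return a copy of a uint8 frame with every (x1, y1, x2, y2) box blurred."""
    out = frame.astype(np.float64)  # astype returns a copy
    h, w = frame.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clip the detector's box to the image bounds.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 <= x1 or y2 <= y1:
            continue  # box lies entirely outside the frame
        out[y1:y2, x1:x2] = box_blur(out[y1:y2, x1:x2], k)
    return np.rint(out).astype(np.uint8)
```

For a video, the function would be applied frame by frame, for example `[blur_faces(f, detect(f)) for f in frames]`, where `detect` is a hypothetical wrapper around the trained face detector. Because only the clipped box contents are modified, pixels outside the detected faces are returned unchanged.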

Paper Structure

This paper contains 23 sections, 12 figures, and 7 tables.

Introduction
Face Blurring using YOLO
    Architecture
    Dataset and Training
    Inference methodology
Face Blurring using DeOldify
    Architecture
    Dataset for training
        Datasets overview
        Construction methodology
    Training
        Loss function
        Training Procedure
    Inference methodology
Experiments
...and 8 more sections

Figures (12)

Figure 1: YOLOv5Face
Figure 2: YOLOv5 architecture
Figure 3: YOLO inference methodology
Figure 4: DeOldify architecture (from deoldifyIPOL). In red: pretrained ResNet, in blue: convolutional blocks, in green: upsample layers, in orange: self-attention layer, and in pink: sigmoid layer. The black lines stand for the skip connections
Figure 5: Correspondence between inputs and targets for an image of the FDDB dataset (top) and the WIDER dataset (bottom).
...and 7 more figures
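For the indirect approach, each training input is paired with a target in which the faces are blurred (Figure 5). One possible way to build such a pair is sketched below. It assumes that face annotations are given as pixel boxes `(x1, y1, x2, y2)` and that a Gaussian blur is the anonymization operator. Both are assumptions, as are the names `make_training_pair`, `gaussian_blur` and `sigma`. The image is blurred once as a whole and the blurred pixels are copied into the face boxes, so the blur near a box border draws on the surrounding context. Both images are scaled to [0, 1], which is consistent with the sigmoid output layer shown in Figure 4.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur over the two spatial axes of an H x W x C array."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()  # normalize so the weights sum to 1
    out = img.astype(np.float64)
    for axis in (0, 1):
        pad_width = [(0, 0)] * out.ndim
        pad_width[axis] = (radius, radius)
        padded = np.pad(out, pad_width, mode="edge")
        # Each output pixel sees a window of length 2*radius+1 along `axis`,
        # appended as the last dimension; contract it with the kernel.
        windows = sliding_window_view(padded, g.size, axis=axis)
        out = windows @ g
    return out


def make_training_pair(image: np.ndarray, boxes, sigma: float = 4.0):
    """Hypothetical helper: (input, target) float32 arrays in [0, 1] for a uint8 H x W x C image."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        # Clip each annotated box to the image bounds before marking it.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        mask[y1:y2, x1:x2] = True
    blurred = gaussian_blur(image, sigma)
    # Target: blurred pixels inside face boxes, original pixels elsewhere.
    target = np.where(mask[..., None], blurred, image)
    scale = 1.0 / 255.0
    inp = (image * scale).astype(np.float32)
    tgt = np.clip(target * scale, 0.0, 1.0).astype(np.float32)
    return inp, tgt
```

Compositing a whole-image blur through a mask, rather than blurring each crop in isolation, avoids artificial edges at the box borders. Whether the authors' construction does the same is not stated here.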
