
Learning Continuous Face Representation with Explicit Functions

Liping Zhang, Weijun Li, Linjun Sun, Lina Yu, Xin Ning, Xiaoli Dong, Jian Xu, Hong Qin

TL;DR

Visualisation results show that EmFace achieves higher representation performance on faces with various expressions, poses, and other factors, and performs reasonably well on several face image processing tasks, including face image restoration, denoising, and transformation.

Abstract

How should a face pattern be represented? While our visual system perceives a face continuously, computers typically store and process face images discretely, as 2D arrays of pixels. In this study, we attempt to learn a continuous representation of face images with explicit functions. First, we propose an explicit model (EmFace) for human face representation in the form of a finite sum of mathematical terms, where each term is an analytic function element. Second, to estimate the unknown parameters of EmFace, we design a novel neural network, EmNet, with an encoder-decoder structure trained by backpropagation: the encoder is a deep convolutional neural network, and the decoder is the explicit mathematical expression of EmFace. Experimental results show that EmFace achieves higher representation performance than other methods on faces with various expressions, poses, and other factors. Furthermore, EmFace performs reasonably well on several face image processing tasks, including face image restoration, denoising, and transformation.
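The idea of a face as a finite sum of analytic function elements, evaluated on continuous coordinates rather than stored as a fixed pixel grid, can be illustrated with a minimal sketch. The Gaussian element form, the four-parameters-per-element layout, and the helper names `unpack`, `emface`, and `render` are hypothetical choices made for illustration; the abstract does not specify which analytic functions EmFace actually uses.

```python
import math
from typing import List, Sequence, Tuple

# HYPOTHETICAL element form: an isotropic 2D Gaussian bump.
# The analytic function elements used by EmFace itself are not specified here.
PARAMS_PER_ELEMENT = 4  # (amplitude, center_x, center_y, width)


def unpack(theta: Sequence[float]) -> List[Tuple[float, ...]]:
    """Split a flat, fixed-length parameter vector into per-element tuples."""
    if len(theta) % PARAMS_PER_ELEMENT != 0:
        raise ValueError("len(theta) must be a multiple of PARAMS_PER_ELEMENT")
    return [tuple(theta[i:i + PARAMS_PER_ELEMENT])
            for i in range(0, len(theta), PARAMS_PER_ELEMENT)]


def emface(theta: Sequence[float], x: float, y: float) -> float:
    """Evaluate the continuous surface z = f(x, y) as a finite sum of elements."""
    z = 0.0
    for a, cx, cy, s in unpack(theta):
        z += a * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * s ** 2))
    return z


def render(theta: Sequence[float], height: int, width: int) -> List[List[float]]:
    """Sample the continuous surface at pixel centres of a height x width grid on [0, 1]^2."""
    return [[emface(theta, (j + 0.5) / width, (i + 0.5) / height)
             for j in range(width)]
            for i in range(height)]


if __name__ == "__main__":
    theta = [1.0, 0.5, 0.5, 0.2,     # a broad bright bump in the centre
             -0.3, 0.3, 0.4, 0.05]   # a small dark bump
    coarse = render(theta, 4, 4)
    fine = render(theta, 64, 64)
    print(len(coarse), len(fine))  # 4 64
```

Because `emface` is defined for any real (x, y), the same parameter vector can be rendered at 4×4 or 64×64 pixels; only the sampling grid changes. That resolution independence is what separates an explicit, continuous representation from a stored 2D pixel array.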

Paper Structure

This paper contains 24 sections, 17 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: Illustration of our method, which treats a grayscale face image as a 2D surface and then introduces an explicit model (EmFace) with a fixed number of function elements to model that surface. EmNet is the parameter-solving network of EmFace.
  • Figure 2: 2D surfaces for some face image samples.
  • Figure 3: Overview of our proposed EmNet for face image modeling. The input to our system is the surface of a cropped face. It starts with a CNN-based encoder whose output is a vector of fixed dimensionality. This vector is then interpreted as a set of function elements whose parameters define an explicit function. During training, an MSE loss drives EmFace to reconstruct the input image as closely as possible.
  • Figure 4: Three-layer bottleneck block for ResNet50.
  • Figure 5: Training an EmNet for face image restoration and denoising. The input face image ${I}$ is stochastically corrupted (the circle represents the pixel value) to ${\tilde{I}}$. EmNet then maps it to the parameters ${\Theta}$ and attempts to reconstruct ${I}$ using EmFace. The reconstruction error between EmFace and ${I}$ is used to train EmNet.
  • ...and 7 more figures
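The training scheme described in the Figure 3 and Figure 5 captions (corrupt ${I}$ into ${\tilde{I}}$, predict ${\Theta}$, render with EmFace, and penalise the MSE against the clean ${I}$) can be sketched as follows. This is a minimal sketch under stated assumptions: the corruption model (random pixel dropout plus additive Gaussian noise) and the names `corrupt`, `mse`, and `denoising_loss` are hypothetical, and the CNN encoder and EmFace decoder are passed in as plain callables rather than implemented.

```python
import random
from typing import Callable, List, Sequence

Image = List[List[float]]


def corrupt(img: Image, drop_prob: float, noise_std: float,
            rng: random.Random) -> Image:
    """Return a stochastically corrupted copy of the clean image I.

    ASSUMED corruption model: each pixel is set to 0.0 with probability
    drop_prob; surviving pixels receive additive Gaussian noise."""
    noisy: Image = []
    for row in img:
        noisy_row = []
        for v in row:
            if rng.random() < drop_prob:
                noisy_row.append(0.0)  # simulated missing pixel
            else:
                noisy_row.append(v + rng.gauss(0.0, noise_std))
        noisy.append(noisy_row)
    return noisy


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    n = sum(len(row) for row in a)
    return sum((p - q) ** 2
               for ra, rb in zip(a, b)
               for p, q in zip(ra, rb)) / n


def denoising_loss(clean: Image,
                   encoder: Callable[[Image], Sequence[float]],
                   decoder: Callable[[Sequence[float], int, int], Image],
                   rng: random.Random,
                   drop_prob: float = 0.2,
                   noise_std: float = 0.05) -> float:
    """One loss evaluation in the style of Figure 5 (hypothetical helper)."""
    noisy = corrupt(clean, drop_prob, noise_std, rng)  # I -> I~
    theta = encoder(noisy)                             # I~ -> Theta (a CNN in EmNet)
    recon = decoder(theta, len(clean), len(clean[0]))  # EmFace sampled on the pixel grid
    return mse(recon, clean)                           # error measured against the clean I


if __name__ == "__main__":
    # Stand-in callables (hypothetical): the "decoder" renders a constant surface.
    clean = [[0.5] * 8 for _ in range(8)]
    enc = lambda img: [0.5]
    dec = lambda theta, h, w: [[theta[0]] * w for _ in range(h)]
    print(denoising_loss(clean, enc, dec, random.Random(0)))  # 0.0
```

The key design point is that the encoder only ever sees the corrupted ${\tilde{I}}$, while the loss is computed against the clean ${I}$, so minimising it pushes EmNet toward parameters that restore the face rather than copy the noise. In actual training the loss would be backpropagated through the explicit decoder into the CNN encoder; plain Python lists are used here only to keep the sketch dependency-free.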