Fast Deep Predictive Coding Networks for Videos Feature Extraction without Labels

Wenqian Xue, Chi Ding, Jose Principe

TL;DR

This paper proposes a DPCN with a fast inference of internal model variables (states and causes) that achieves high sparsity and accuracy of feature clustering, and outperforms previous versions of DPCNs on learning rate, sparsity ratio, and feature clustering accuracy.

Abstract

Brain-inspired deep predictive coding networks (DPCNs) effectively model and capture video features through a bi-directional information flow, even without labels. They are based on an overcomplete description of video scenes, and one of their bottlenecks has been the lack of effective sparsification techniques for finding discriminative and robust dictionaries; FISTA has been the best available alternative. This paper proposes a DPCN with fast inference of the internal model variables (states and causes) that achieves high sparsity and accurate feature clustering. The proposed unsupervised learning procedure is inspired by adaptive dynamic programming with a majorization-minimization framework, and both the procedure and its convergence are rigorously analyzed. Experiments on the CIFAR-10, Super Mario Bros video game, and Coil-100 data sets validate the approach, which outperforms previous versions of DPCNs on learning rate, sparsity ratio, and feature clustering accuracy. Because of the DPCN's solid foundation and explainability, this advance opens the door to general applications in object recognition in video without labels.
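For readers unfamiliar with the FISTA baseline named in the abstract, the following is a minimal sketch of generic FISTA sparse coding over an overcomplete dictionary. It is not the paper's state and cause inference procedure; the names `D`, `y`, `lam`, `fista`, and `objective`, the random dictionary, and the problem sizes are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(D, y, x, lam):
    """Lasso objective: 0.5 * ||y - D x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - D @ x) ** 2) + lam * np.sum(np.abs(x))

def fista(D, y, lam, n_iter=200):
    """Approximately minimize the lasso objective with FISTA."""
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    z = x.copy()          # extrapolated point
    t = 1.0               # momentum parameter
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)
        x_new = soft_threshold(z - grad / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Hypothetical toy problem: 32-dimensional "patches", 128 dictionary atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 128))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x_true = np.zeros(128)
x_true[rng.choice(128, size=5, replace=False)] = rng.standard_normal(5)
y = D @ x_true

x_hat = fista(D, y, lam=0.05)
sparsity_ratio = np.mean(x_hat == 0.0)   # fraction of exactly-zero coefficients
```

The soft-thresholding step is what produces exact zeros in the code vector, which is why a sparsity ratio can be read directly off `x_hat`. The abstract's claim is that the proposed majorization-minimization inference improves on this kind of baseline in both convergence and sparsity.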

Paper Structure (23 sections, 4 theorems, 56 equations, 7 figures, 4 tables, 4 algorithms)


Key Result

Theorem 1

Consider the sequence $\{x_t^{i}\}\in \mathbb{R}^K$ for a patch generated by Algorithm al2. Then, $F(x_t^i)$ converges, and for any $s\geq 1$ we have […] where $R=\operatorname{diag}\{1/(\tilde{1}|(x_t^{0})_k|+(1-\tilde{1}-\bar{1})|(x_t^{*})_k|+\bar{1}|(x_t^{i})_k|)\}$, $k\in \{1,2,\ldots,K\}$, with $\tilde{1}=1$ if $|(x^*)_k| \geq |(x_t^{0})_k|>0$, $\tilde{1}=0$ if $0 \leq |(x^*)_k| < |(x_t^{0})_

Figures (7)

  • Figure 1: Two-layer DPCN structure. The video frame is decomposed into patches (green blocks). Every patch is mapped onto a state $x_t^1$ at layer 1, and the cause $u_t^1$ pools all the states within a group. The cause $u_t^1$ is the input to layer 2 and corresponds to state $x_t^2$ and cause $u_t^2$.
  • Figure 2: (a) Bi-directional inference flow, where feedforward (yellow), feedback (green), and recurrent (pink) connections convey the bottom-up and top-down predictions. (b) Connections for variable inference (solid lines) and for model inference (dashed lines).
  • Figure 3: (a) Convergence of MM Algorithm al2, ISTA, FISTA, and ADAM; (b) sparsity level using MM Algorithm al2; and (c) sparsity level using FISTA.
  • Figure 4: Clustering result for a Super Mario Bros video data set.
  • Figure 5: Qualitative video sequence reconstruction for Super Mario Bros and Coil-100 data sets.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 1