Central limit theorems for the outputs of fully convolutional neural networks with time series input

Annika Betken, Giorgio Micali, Johannes Schmidt-Hieber

Abstract

Deep learning is widely deployed for time series learning tasks such as classification and forecasting. Despite these empirical successes, little theory has been developed so far in the time series context. In this work, we prove that if the network inputs are generated from short-range dependent linear processes, the outputs of fully convolutional neural networks (FCNs) with global average pooling (GAP) are asymptotically Gaussian as the length of the observed time series tends to infinity. The proof leverages existing tools from the theoretical time series literature. Based on our theory, we propose a generalization of the GAP layer by considering a global weighted pooling step with slowly varying, learnable coefficients.
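The following PyTorch sketch contrasts standard GAP with a global weighted pooling step over a one-layer convolutional feature map. The softmax parametrization of the learnable pooling weights, and all sizes, are illustrative assumptions standing in for the paper's slowly varying coefficients, not the authors' exact construction.

```python
import torch
import torch.nn as nn

class WeightedGlobalPooling(nn.Module):
    """Global weighted pooling over the time axis.

    GAP corresponds to fixed weights 1/n; here one logit per time step is
    learnable and a softmax keeps the weights on the simplex (an
    illustrative parametrization, not the paper's exact construction).
    """
    def __init__(self, n: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, n) -> (batch, channels)
        w = torch.softmax(self.logits, dim=0)
        return (x * w).sum(dim=-1)

n, k, m = 128, 5, 10                      # series length, filter width, channels
features = nn.Sequential(                 # one-layer convolutional feature map
    nn.Conv1d(1, m, kernel_size=k, padding="same"),
    nn.ReLU(),
)
gap = nn.AdaptiveAvgPool1d(1)             # standard GAP
wgp = WeightedGlobalPooling(n)            # learnable weighted variant

x = torch.randn(4, 1, n)                  # batch of 4 univariate time series
h = features(x)                           # (4, m, n)
print(gap(h).squeeze(-1).shape)           # torch.Size([4, 10])
print(wgp(h).shape)                       # torch.Size([4, 10])
```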

Paper Structure

This paper contains 17 sections, 11 theorems, 202 equations, 5 figures, and 1 table.

Key Result

Theorem 1

Let $(\mathbf X_t)_{t\ge 1}$ be a linear process satisfying Assumption (M1), and let $f_1,\ldots,f_M:\mathbb{R}^k\to\mathbb{R}$ be Lipschitz continuous functions. Let $(a_{t,m}^{(n)})_{t=1,\dots,n;\,m=1,\dots,M}$ be deterministic weights such that $a_{t,\cdot}^{(n)}=0$ whenever $t\notin\{1,\ldots,n\}$. Under the conditions on the weights stated in the paper, the vector of weighted functionals of the process converges in distribution to an $M$-dimensional centered Gaussian limit with covariance matrix $\Sigma=(\Sigma_{m\ell})_{m,\ell=1}^M$, whose entries are given in the paper.
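A quick Monte Carlo check of this kind of statement can be done in a few lines of numpy. In the sketch below, the Gaussian AR(1) input, the ReLU-based Lipschitz function applied to each length-$k$ window, and the GAP weights $a_{t,m}^{(n)}=1/n$ (with $\sqrt n$ scaling) are all illustrative choices, not the paper's exact setup; if the limit is Gaussian, the standardized outputs should have skewness and excess kurtosis near zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, theta, rng):
    """Gaussian AR(1), a short-range dependent linear process."""
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - theta**2)  # stationary start
    for t in range(1, n):
        x[t] = theta * x[t - 1] + rng.standard_normal()
    return x

n, k, theta, reps = 1000, 5, 0.5, 2000
outs = np.empty(reps)
for r in range(reps):
    x = ar1(n + k - 1, theta, rng)
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (n, k)
    y = np.maximum(windows, 0.0).sum(axis=1)  # Lipschitz f of each window
    outs[r] = np.sqrt(n) * y.mean()           # GAP weights 1/n, CLT scaling

z = (outs - outs.mean()) / outs.std()         # standardize across replications
print("skewness        :", float((z**3).mean()))      # ~0 under Gaussianity
print("excess kurtosis :", float((z**4).mean() - 3))  # ~0 under Gaussianity
```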

Figures (5)

  • Figure 1: FCN architecture with residual connection.
  • Figure 2: Empirical autocorrelation matrices $\widehat{\mathcal{R}}^{\mathrm{GAP}}(\theta)$ for the one-layer GAP network with $m_1=10$ under fixed Gaussian initialization and input generated from an AR$(1)$ process with parameters $\theta=0$ (left panel) and $\theta=0.99$ (right panel). The numbering of the output neurons has been chosen to maximize the correlation of successive neurons, thereby revealing the strong clustering of the outputs.
  • Figure 3: Empirical correlation matrices with dependence parameter $\theta=0.65$ in the Gaussian AR(1) input time series. The numbering of the output neurons has been chosen to maximize the correlation of successive neurons. (A minimal simulation of this setting is sketched after this list.)
  • Figure 4: $n=1000$, $B=4$, MA(1) input.
  • Figure 5: $n=1000$, $B=4$, AR(1) input.
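The setting of Figures 2 and 3 can be approximated in a few lines. The sketch below simulates a one-layer GAP network with $m_1=10$ filters held fixed at Gaussian initialization on AR(1) inputs with $\theta=0.65$, then reports the empirical correlation matrix of the outputs; the neuron-reordering step from the captions is omitted, and the filter width and replication count are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, theta, rng):
    """Gaussian AR(1) with stationary initialization."""
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - theta**2)
    for t in range(1, n):
        x[t] = theta * x[t - 1] + rng.standard_normal()
    return x

n, k, m, theta, reps = 1000, 5, 10, 0.65, 2000
W = rng.standard_normal((m, k)) / np.sqrt(k)   # fixed Gaussian filters
b = rng.standard_normal(m)                     # fixed Gaussian biases

outs = np.empty((reps, m))
for r in range(reps):
    x = ar1(n + k - 1, theta, rng)
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (n, k)
    h = np.maximum(windows @ W.T + b, 0.0)     # ReLU conv features, (n, m)
    outs[r] = h.mean(axis=0)                   # GAP over the time axis

R = np.corrcoef(outs, rowvar=False)            # (m, m) empirical correlations
print(np.round(R, 2))
```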

Theorems & Definitions (20)

  • Theorem 1
  • Definition 1: FCNs with residual connections, $L$ layers, filter widths $k_1,\ldots, k_L,$ and layer widths $m_1,\ldots, m_L$ (a code sketch of such a block follows this list)
  • Lemma 2
  • Theorem 3
  • Theorem 4: Theorem 4.2 in Billingsley (1968), p. 25
  • Theorem 5: Theorem 1 in Pollard (1984), p. 171
  • Lemma 6: Adapted from Lemma 3.4 of Furmanczyk
  • Proof
  • Lemma 7
  • Proof
  • ...and 10 more
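As a companion to Definition 1, here is a minimal PyTorch sketch of a residual convolutional block and an FCN built from $L=3$ such blocks followed by GAP. The channel and filter widths, the ReLU activation, and the $1\times1$ convolution used to match channel counts on the skip path are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """One convolutional block with a residual (skip) connection.

    Hypothetical instantiation of Definition 1: filter width k, layer
    width m; a 1x1 convolution aligns channel counts on the skip path.
    """
    def __init__(self, c_in: int, m: int, k: int):
        super().__init__()
        self.conv = nn.Conv1d(c_in, m, kernel_size=k, padding="same")
        self.act = nn.ReLU()
        self.skip = nn.Conv1d(c_in, m, kernel_size=1) if c_in != m else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x)) + self.skip(x)

# Stack L = 3 blocks with layer widths m_l and filter widths k_l, then GAP.
widths, filters = [16, 32, 10], [7, 5, 3]
blocks, c = [], 1
for m, k in zip(widths, filters):
    blocks.append(ResidualConvBlock(c, m, k))
    c = m
fcn = nn.Sequential(*blocks, nn.AdaptiveAvgPool1d(1), nn.Flatten())

x = torch.randn(2, 1, 500)        # two univariate series of length 500
print(fcn(x).shape)               # torch.Size([2, 10])
```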