Don't Think It Twice: Exploit Shift Invariance for Efficient Online Streaming Inference of CNNs

Christodoulos Kechris, Jonathan Dan, Jose Miranda, David Atienza

TL;DR

This work tackles the inefficiency of online streaming CNN inference when using overlapping time-series windows. It introduces StreamiNNC, a strategy to exploit convolutional shift invariance while addressing non-invariant padding and pooling through signal padding, pooling alignment, and targeted training, enabling near-equivalence to full-window inference with linear speedups. The authors derive theoretical pooling error bounds, demonstrate significant practical gains on three biomedical datasets, and show that with signal padding (or careful architectural alignment) streaming results remain within a small margin of error (NRMSE on the order of a few percent). The approach provides actionable guidelines for deploying streaming CNNs with minimal modifications to pretrained models, offering substantial real-time performance benefits in resource-constrained settings.

Abstract

Deep learning time-series processing often relies on convolutional neural networks with overlapping windows. This overlap allows the network to produce an output faster than the window length. However, it introduces additional computations. This work explores the potential to optimize computational efficiency during inference by exploiting convolution's shift-invariance properties to skip the calculation of layer activations between successive overlapping windows. Although convolutions are shift-invariant, zero-padding and pooling operations, widely used in such networks, are not shift-invariant and complicate efficient streaming inference. We introduce StreamiNNC, a strategy to deploy Convolutional Neural Networks for online streaming inference. We explore the adverse effects of zero padding and pooling on the accuracy of streaming inference, deriving theoretical error upper bounds for pooling during streaming. We address these limitations by proposing signal padding and pooling alignment and provide guidelines for designing and deploying models for StreamiNNC. We validate our method on simulated data and on three real-world biomedical signal processing applications. StreamiNNC achieves a low deviation between streaming output and normal inference for all three networks (2.03% to 3.55% NRMSE). This work demonstrates that it is possible to linearly speed up the inference of streaming CNNs processing overlapping windows, negating the additional computation typically incurred by overlapping windows.
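The core idea in the abstract, reusing activations across overlapping windows, can be illustrated with a single unpadded 1D convolution. The following is a minimal NumPy sketch and not the authors' implementation: the signal, the kernel, and the sizes `L`, `S`, and `K` are hypothetical values chosen for illustration. It checks that the activations of the next window equal the shifted activations of the previous window plus a few newly computed outputs, and that the same reuse fails once zero padding is applied.

```python
import numpy as np

def conv_valid(x, w):
    # Cross-correlation without padding: output length is len(x) - len(w) + 1.
    return np.correlate(x, w, mode="valid")

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)  # hypothetical input stream
w = rng.standard_normal(5)        # hypothetical convolution kernel
L, S, K = 32, 8, len(w)           # window length, window step, kernel length

prev_win = signal[0:L]            # window i - 1
next_win = signal[S:S + L]        # window i, shifted by S samples

full_prev = conv_valid(prev_win, w)
full_next = conv_valid(next_win, w)  # full inference on the new window

# Streaming: keep the overlapping activations of the previous window and
# compute only the S newest outputs, which need the last S + K - 1 samples.
new_part = conv_valid(next_win[-(S + K - 1):], w)
stream_next = np.concatenate([full_prev[S:], new_part])
streaming_matches = bool(np.allclose(stream_next, full_next))

# With zero padding ("same" mode) the border outputs depend on the padded
# zeros, so shifted activations no longer agree with the full inference.
zp_prev = np.correlate(prev_win, w, mode="same")
zp_next = np.correlate(next_win, w, mode="same")
zero_pad_matches = bool(np.allclose(zp_prev[S:], zp_next[:-S]))

print("unpadded streaming equals full inference:", streaming_matches)
print("zero-padded activations are shiftable:", zero_pad_matches)
```

In this sketch only `S` of the `L - K + 1` outputs are recomputed per window, which is where the saving from overlap comes from. The zero-padded case fails only near the window borders, consistent with the abstract's point that padding, rather than the convolution itself, breaks shift invariance.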

Paper Structure (17 sections, 7 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Left: Full Inference. CNN, $f$, processing a window $\boldsymbol{x}_i$. Middle: Streaming Inference. Only the new information is processed by $f$, and part of the inputs and activations are stored and retrieved to be used as padding for the next window. All intermediate embeddings are stored in the aggregated embedding. If the network has been trained with Signal Padding, then the aggregated embedding is equivalent to the full inference embedding. Right: Approximate Streaming Inference. Just like in streaming inference, we only process the newest samples. Here, previous inputs/activations are not stored, and zero-padding is used instead as an approximation. The resulting intermediate embeddings are aggregated into an approximate embedding.
  • Figure 2: Illustration of shift-invariance of the pooling operation. Left: During the previous window, $i - 1$, the sequence $[1 \cdots 6]$ is processed. Then the window moves by a step of $S = 2$ samples, window $i$, processing samples $[3 \cdots 8]$ and similarly for the $i + 1$ window. The input is passed through Max Pooling with a pooling window size $L_p = 2$. $S$ and $L_p$ are aligned, hence $pool(\boldsymbol{x}_{i + 1})$ can be partially estimated from the elements of $pool(\boldsymbol{x}_{i})$ (blue arrows). Right: $S$ and $L_p$ are misaligned, and the pooling operation is not shiftable. Shifting the elements of $pool(\boldsymbol{x}_{i})$ to partially estimate $pool(\boldsymbol{x}_{i + 1})$ can only be an approximation.
  • Figure 3: Strategy for training Signal Padding in batch mode. The input window, $\boldsymbol{x}_i$, is extended by $L_a$ samples. $L_a$ is chosen such that at depth $d$ the receptive field of $h$, $r_0$, is smaller than $L_a$. The feature extractor only processes the samples that correspond to the initial window $\boldsymbol{x}_i$.
  • Figure 4: Error introduced due to shifting on non-aligned pooling operations: empirical maximum expected error (blue), empirical maximum error (orange) and derived upper error bounds (green). For the mono-frequency input (top), our bound aligns with the empirical maximum errors. For the multi-frequency input (bottom), the actual empirical error is less than our derived bounds. Nonetheless, our model predicts the behavior of the shift approximation as a function of the pooling window and the sampling frequency.
  • Figure 5: Effect of zero-padding on the convolution activations. Left: Activations of intermediate convolution layers from $h_{PPG}$ with constant inputs at 1 and moving average convolutional weights. The first layers, e.g. the first three, show little zero-padding effect, with the majority of the output at 1, in contrast to deeper layers, where all points are affected. Right: Percentage of activation points that are less than 1, indicating an effect of the zero-padding, for $h_{PPG}$ (blue), $h_{EEG}$ (orange), and $h_{ACC}$ (green).
  • ...and 2 more figures
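
The pooling alignment described in the Figure 2 caption can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the paper's code: it uses the caption's integer sequence with $S = 2$ and $L_p = 2$ for the aligned case, and a hypothetical $L_p = 3$ for the misaligned case, since the caption does not give the values used in the right panel. The helper `max_pool` is likewise a hypothetical name.

```python
import numpy as np

def max_pool(x, lp):
    # Non-overlapping max pooling: stride equal to the pooling size lp.
    n = len(x) // lp
    return x[:n * lp].reshape(n, lp).max(axis=1)

signal = np.arange(1, 13)   # samples 1..12, as in the caption's integer example
S = 2                       # window step

win_i = signal[2:8]         # window i:     samples [3 .. 8]
win_next = signal[4:10]     # window i + 1: samples [5 .. 10]

# Aligned case: S is a multiple of L_p, so pooled outputs shift by S // L_p.
prev_a = max_pool(win_i, 2)      # [4, 6, 8]
next_a = max_pool(win_next, 2)   # [6, 8, 10]
k = S // 2
aligned_ok = bool(np.array_equal(prev_a[k:], next_a[:-k]))

# Misaligned case (assumed L_p = 3): the pooling groups of the two windows
# cover different samples, so a shifted copy is only an approximation.
prev_m = max_pool(win_i, 3)      # [5, 8]
next_m = max_pool(win_next, 3)   # [7, 10]
misaligned_ok = bool(np.array_equal(prev_m[1:], next_m[:-1]))

print("aligned reuse exact:", aligned_ok)
print("misaligned reuse exact:", misaligned_ok)
```

In the aligned case the last two pooled values of window $i$ are exactly the first two of window $i + 1$, so only one new pooled value has to be computed. In the misaligned case the reused value (8) differs from the true one (7), which is the kind of approximation error the derived pooling bounds describe.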