High-Level Parallelism and Nested Features for Dynamic Inference Cost and Top-Down Attention

André Peter Kelm; Niels Hannemann; Bruno Heberle; Lucas Schmidt; Tim Rolff; Christian Wilms; Ehsan Yaghoubi; Simone Frintrop

TL;DR

This paper introduces a novel network topology that integrates dynamic inference cost with a top-down attention mechanism, addressing two significant gaps in traditional deep learning models. It paves the way for future network designs that are lightweight and adaptable, making them suitable for a wide range of applications.

Abstract

This paper introduces a novel network topology that seamlessly integrates dynamic inference cost with a top-down attention mechanism, addressing two significant gaps in traditional deep learning models. Drawing inspiration from human perception, we combine sequential processing of generic low-level features with parallelism and nesting of high-level features. This design not only reflects a finding from recent neuroscience research regarding spatially and contextually distinct neural activations in the human cortex, but also introduces a novel "cutout" technique: the ability to selectively activate only the network segments of task-relevant categories to optimize inference cost and eliminate the need for re-training. We believe this paves the way for future network designs that are lightweight and adaptable, making them suitable for a wide range of applications, from compact edge devices to large-scale clouds. Our proposed topology also comes with a built-in top-down attention mechanism, which allows processing to be directly influenced by either enhancing or inhibiting category-specific high-level features, drawing parallels to the selective attention mechanism observed in human cognition. Using targeted external signals, we experimentally enhanced predictions across all tested models. In terms of dynamic inference cost, our methodology can achieve an exclusion of up to $73.48\,\%$ of parameters and $84.41\,\%$ fewer giga-multiply-accumulate (GMAC) operations. Analysis against comparative baselines shows an average reduction of $40\,\%$ in parameters and $8\,\%$ in GMACs across the cases we evaluated.
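The "cutout" idea in the abstract can be illustrated with simple parameter bookkeeping. The sketch below is not the authors' implementation: it assumes a flat topology with one shared sequential trunk and one branch per category, and the function name `cutout_param_fraction` and all parameter counts are hypothetical values chosen only for illustration.

```python
def cutout_param_fraction(trunk_params, branch_params, keep):
    """Fraction of parameters excluded when only the `keep` branches stay active.

    trunk_params:  parameters in the shared sequential part (always active)
    branch_params: dict mapping category name -> parameters in its branch
    keep:          set of task-relevant categories whose branches remain
    """
    total = trunk_params + sum(branch_params.values())
    active = trunk_params + sum(branch_params[c] for c in keep)
    return 1.0 - active / total


# Hypothetical sizes; none of these numbers come from the paper.
branch_params = {
    "dog": 200_000,
    "fish": 200_000,
    "cat": 200_000,
    "bird": 200_000,
    "car": 200_000,
}

# Keep only two of five category branches; the other three are cut out.
excluded = cutout_param_fraction(1_000_000, branch_params, keep={"dog", "cat"})
print(f"{excluded:.2%}")  # prints "30.00%"
```

Under these assumptions the saving grows with the share of parameters that sits in the branches rather than in the shared trunk, which is why the achievable exclusion depends on where the network is split.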

Paper Structure (19 sections, 3 equations, 5 figures, 4 tables)

This paper contains 19 sections, 3 equations, 5 figures, 4 tables.

Introduction
Related Work
Image Classification and Dynamic Inference
Bottom-up vs. Top-Down Attention
Parallelism and Tree Networks
Proposed Method
High-Level Feature Parallelization
Split-Point:
Cross-Entropy Loss:
Nested Topology
NHL$_\mathrm{gpt}$:
Cutouts
Experiments
Training and Datasets
Dynamic Inference Cost
...and 4 more sections

Figures (5)

Figure 1: Neural network topologies (blue: tensors, black: connection path). (a) Conventional deep network from bottom to top (the dots represent five categories); (b) our proposed SeqPar structure: each category has its own branch; (c) our proposed compromise between (a) and (b) with a nested structure; (d) our innovative cutout technique.
Figure 2: Our PHL architecture starts sequentially but becomes parallel after a split-point to generate category-specific features that are spatially separated from each other. The different colors represent the categories and their branches. The resolution and feature-channel dimension, multiplied by the number of layers, are example values given for better understanding.
Figure 3: Three-branched PHL network for dog, fish and cat categories with some example values (black: input dog, blue: input fish). The red boxes indicate the selection for 1 vs. all. Images from Imagenette \cite{imagenette}.
Figure 4: Cutout from an N$3$HL network with $100$ categories. The red pathways highlight the activated paths for the selected $20$ categories from ImageNet100 \cite{imagenette}, demonstrating the "cutout" technique (processing from bottom to top).
Figure 5: Category-specific features and original images from Imagenette \cite{imagenette}. The top row displays the images at a reduced resolution. The second and third rows show $14 \times 14 \times 3$ RGB-colored, category-specific high-level features extracted using PHL$_\mathrm{big}$ - conv 4. Specifically, the features in the second row are from the "dog" branch, while the features in the third row are from the "fish" branch. The selected feature maps from the dog branch are salient for the dog object and less salient for the fish object. The opposite is observed for the fish branch.
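
The top-down attention described in the abstract, where an external signal enhances or inhibits category-specific high-level features, can be sketched as a per-branch gain applied before classification. This is a minimal illustration under stated assumptions, not the paper's method: the multiplicative gain, the unit-weight linear head, the function `predict`, and all feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(branch_features, gains=None):
    """Turn per-branch features into category probabilities.

    branch_features: dict category -> list of high-level activations
    gains: dict category -> external signal (assumed multiplicative;
           > 1 enhances a branch, < 1 inhibits it, default 1.0)
    """
    gains = gains or {}
    categories = list(branch_features)
    scores = []
    for c in categories:
        g = gains.get(c, 1.0)
        modulated = [g * f for f in branch_features[c]]
        # Hypothetical head: sum of activations with unit weights.
        scores.append(sum(modulated))
    return dict(zip(categories, softmax(scores)))


# Hypothetical activations for an ambiguous input.
features = {"dog": [0.9, 0.8], "fish": [1.0, 0.9], "cat": [0.2, 0.1]}

baseline = predict(features)
attended = predict(features, gains={"dog": 1.5, "fish": 0.5})

print(max(baseline, key=baseline.get))  # prints "fish"
print(max(attended, key=attended.get))  # prints "dog"
```

Because each category owns a spatially separate branch, the signal can target one category's features without touching the others, which a fully shared network does not allow as directly.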
