A Survey on Dynamic Neural Networks: from Computer Vision to Multi-modal Sensor Fusion

Fabio Montello, Ronja Güldenring, Simone Scardapane, Lazaros Nalpantidis

TL;DR

This survey addresses the problem of static model optimization by focusing on Dynamic Neural Networks in Computer Vision, where computation adapts to input complexity. It presents a threefold taxonomy—Early Exits, Dynamic Routing, and Token Skimming—grouping methods by where the network exhibits dynamicity and highlighting Sensor Fusion as a promising application area. The authors synthesize 161 CV papers (spanning 2016–2025) and provide a curated repository to facilitate comparison and replication, while detailing learning strategies, exit policies, and optimizer concerns. They also discuss Sensor Fusion implications, arguing that adaptive computation can improve efficiency, robustness to noise, and information prioritization in multi-sensor settings. The work concludes with challenges, future directions, and a roadmap for expanding dynamic techniques across architectures and modalities, aiming to broaden practical adoption.

Abstract

Model compression is essential for deploying large Computer Vision models on embedded devices. However, static optimization techniques (e.g., pruning and quantization) neglect the fact that different inputs have different complexities and therefore require different amounts of computation. Dynamic Neural Networks make it possible to condition the amount of computation on the specific input. The current literature on the topic is extensive and fragmented. We present a comprehensive survey that synthesizes and unifies existing Dynamic Neural Networks research in the context of Computer Vision. Additionally, we provide a logical taxonomy based on which component of the network is adaptive: the output, the computation graph, or the input. Furthermore, we argue that Dynamic Neural Networks are particularly beneficial in the context of Sensor Fusion for better adaptivity, noise reduction, and information prioritization. We present preliminary works in this direction. We complement this survey with a curated repository listing all the surveyed papers, each with a brief summary of the solution and the code base when available: https://github.com/DTU-PAS/awesome-dynn-for-cv

Paper Structure (38 sections, 5 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: The three types of Dynamic Neural Networks we consider in the survey. From left to right: (a) Early Exit networks decide at which point to produce the output, (b) Dynamic Routing networks use a Mixture-of-Experts and decide which computational path is best suited to the input, (c) Token Skimming networks decide which subset of tokens is passed on to the following blocks.
  • Figure 2: Overview of the publications considered in this survey, grouped by year and topic. In total, 148 publications have been reviewed: 62 for the Early Exits Section, 44 for the Dynamic Routing Section, 27 for the Token Skimming Section, and 15 in the Dynamic Sensor Fusion Section.
  • Figure 3: Taxonomy of Dynamic Neural Network techniques presented in this survey, categorized by application domain (Computer Vision and Sensor Fusion) and specific method. The diagram highlights key methods such as Early Exits (in green), Computational Routing (in blue), Token Skimming (in red), and their applications in various Sensor Fusion tasks (in yellow).
  • Figure 4: Illustrative example of MSDNet (Multi-Scale Dense Network). MSDNet processes the input image through a multi-scale architecture, enabling feature extraction at various resolutions. The feature maps shown in blue and classifiers highlighted in orange are active during this example, while the grey blocks indicate components that are not in use. For the complete scheme, please refer to Huang et al. (2018).
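
The three decision points described in Figure 1 can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not code from the survey or from any surveyed paper. The function names (`early_exit`, `route_top1`, `skim_tokens`) are hypothetical, and the policies are assumptions chosen for simplicity: a softmax-confidence threshold as the exit policy, a top-1 gate for routing, and top-k retention by score for token skimming.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit(exit_logits: Iterable[Sequence[float]],
               threshold: float) -> Tuple[int, int]:
    """(a) Early exit, assuming a confidence-threshold policy.

    Consumes the per-exit logits lazily (a generator can be passed, so
    deeper exits are never computed once a confident exit is found).
    Returns (exit_index, predicted_class); if no exit reaches the
    threshold, the prediction of the last exit is returned.
    """
    result = None
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        result = (i, probs.index(conf))
        if conf >= threshold:
            break
    if result is None:
        raise ValueError("at least one exit is required")
    return result


def route_top1(gate_scores: Sequence[float],
               experts: Sequence[Callable[[float], float]],
               x: float) -> Tuple[int, float]:
    """(b) Dynamic routing, assuming top-1 gating: only the expert with
    the highest gate score is executed."""
    k = max(range(len(gate_scores)), key=lambda i: gate_scores[i])
    return k, experts[k](x)


def skim_tokens(tokens: Sequence[str],
                scores: Sequence[float],
                keep: int) -> List[str]:
    """(c) Token skimming, assuming top-k retention: keep the `keep`
    highest-scoring tokens and preserve their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    print(early_exit([[0.1, 0.2], [3.0, 0.0], [0.0, 5.0]], threshold=0.9))
    print(route_top1([0.2, 0.7, 0.1],
                     [lambda v: v + 1, lambda v: v * 2, lambda v: -v], 5.0))
    print(skim_tokens(["a", "b", "c", "d"], [0.1, 0.9, 0.3, 0.8], keep=2))
```

In all three cases the saving comes from work that is not performed: exits after the chosen one, experts that the gate does not select, and tokens that are dropped before the following blocks. In practice the gate scores and token scores would be produced by learned modules rather than supplied by hand.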