iNNspector: Visual, Interactive Deep Model Debugging

Thilo Spinner, Daniel Fürst, Mennatallah El-Assady

TL;DR

The paper tackles the gap in practical, systematic debugging of deep learning (DL) experiments by proposing a conceptual data-space framework and the iNNspector visual analytics system. It defines a comprehensive data space (structural data, scalars, high-dimensional data, and functions) and six design dimensions for instantiating debugging components, and couples this with global mechanisms for navigating model architectures across multiple abstraction levels. iNNspector provides end-to-end tooling (custom Keras logs, a FastAPI backend, and a React frontend) that supports model tracking, multi-model comparisons, and detailed inspection of weights, activations, and outputs through widgets and a toolbox. The authors validate the approach with three real-world use cases and an expert user study, reporting improved data access, actionable debugging insights, and strong usability, while acknowledging limitations in data volume, complexity, and tool completeness. Overall, the work offers a reusable, extensible framework and a working system for integrating systematic debugging into everyday DL workflows, enabling more reliable diagnosis, verification, and refinement of neural networks.
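The logging side of such tooling can be illustrated with a short sketch. The class below is hypothetical and is not the authors' Keras logger; it only assumes that each epoch should yield one record holding the scalar metrics Keras passes to `on_epoch_end` plus every layer's weights. It borrows the hook signature of Keras's `Callback.on_epoch_end(epoch, logs=None)` and the layer attributes `.name` and `.get_weights()`, but receives the model as a constructor argument so it runs without TensorFlow. In a real training script one would instead subclass `keras.callbacks.Callback`, which makes `self.model` available once the callback is passed to `fit`. `EpochLogger`, `_FakeLayer`, and `_FakeModel` are illustrative names.

```python
import json
from typing import Any, Dict, List, Optional


class EpochLogger:
    """Hypothetical logger: one record of metrics and weights per epoch.

    Uses the same hook signature as Keras' Callback.on_epoch_end(epoch, logs=None),
    but receives the model explicitly so the sketch has no TensorFlow dependency.
    """

    def __init__(self, model: Any = None) -> None:
        self.model = model
        self.records: List[Dict[str, Any]] = []

    def on_epoch_end(self, epoch: int, logs: Optional[Dict[str, float]] = None) -> None:
        record: Dict[str, Any] = {"epoch": epoch, "scalars": dict(logs or {})}
        if self.model is not None:
            # Keras layers expose .name and .get_weights() (a list of NumPy arrays);
            # tolist() turns arrays into JSON-serializable nested lists.
            record["weights"] = {
                layer.name: [w.tolist() if hasattr(w, "tolist") else w
                             for w in layer.get_weights()]
                for layer in self.model.layers
            }
        self.records.append(record)

    def to_json(self) -> str:
        """Serialize all records, e.g. for a backend to ingest."""
        return json.dumps(self.records)


# Stand-ins for a Keras model (hypothetical, for illustration only).
class _FakeLayer:
    def __init__(self, name: str, weights: List[Any]) -> None:
        self.name = name
        self._weights = weights

    def get_weights(self) -> List[Any]:
        return self._weights


class _FakeModel:
    layers = [_FakeLayer("dense", [[[0.5, -0.5]], [0.0, 0.1]])]


logger = EpochLogger(_FakeModel())
logger.on_epoch_end(0, {"loss": 0.7, "accuracy": 0.5})
print(logger.to_json())
```

Storing raw weights for every epoch grows quickly with model size, which is one face of the data-volume limitation mentioned above; a practical logger would likely subsample epochs or layers.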

Abstract

Deep learning model design, development, and debugging is a process driven by best practices, guidelines, trial-and-error, and the personal experiences of model developers. At multiple stages of this process, performance and internal model data can be logged and made available. However, due to the sheer complexity and scale of this data and process, model developers often resort to evaluating their model performance based on abstract metrics like accuracy and loss. We argue that a structured analysis of data along the model's architecture and at multiple abstraction levels can considerably streamline the debugging process. Such a systematic analysis can further connect the developer's design choices to their impacts on the model behavior, facilitating the understanding, diagnosis, and refinement of deep learning models. Hence, in this paper, we (1) contribute a conceptual framework structuring the data space of deep learning experiments. Our framework, grounded in literature analysis and requirements interviews, captures design dimensions and proposes mechanisms to make this data explorable and tractable. To operationalize our framework in a ready-to-use application, we (2) present the iNNspector system. iNNspector enables tracking of deep learning experiments and provides interactive visualizations of the data on all levels of abstraction from multiple models to individual neurons. Finally, we (3) evaluate our approach with three real-world use-cases and a user study with deep learning developers and data analysts, proving its effectiveness and usability.
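Viewing data "on all levels of abstraction from multiple models to individual neurons" amounts to rolling values up and drilling them down along a hierarchy. The sketch below assumes a nested mapping from model to layer to per-neuron values (for example, mean activations). The figure captions place multi-model views at L3 and single-model views at L2; the finer layer and neuron levels, the data layout, and all function names here are assumptions for illustration, not iNNspector's data model.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical nesting: model name -> layer name -> per-neuron values
# (e.g. mean activations recorded for each neuron of that layer).
Experiment = Dict[str, Dict[str, List[float]]]


def model_summary(experiment: Experiment) -> Dict[str, float]:
    """Multi-model view: one scalar per model, pooled over all of its neurons."""
    return {
        model: mean(v for values in layers.values() for v in values)
        for model, layers in experiment.items()
    }


def layer_summary(experiment: Experiment, model: str) -> Dict[str, float]:
    """Single-model view: mean neuron value for each layer of one model."""
    return {layer: mean(values) for layer, values in experiment[model].items()}


def neuron_values(experiment: Experiment, model: str, layer: str) -> List[float]:
    """Layer view: the individual neuron values behind one layer."""
    return experiment[model][layer]


exp: Experiment = {
    "baseline": {"dense_1": [0.0, 1.0, 2.0], "dense_2": [4.0]},
    "wider": {"dense_1": [1.0, 1.0], "dense_2": [3.0, 5.0]},
}
print(model_summary(exp))
print(layer_summary(exp, "wider"))
print(neuron_values(exp, "baseline", "dense_1"))
```

Note that `model_summary` pools all neurons of a model rather than averaging the layer means, so larger layers weigh more; which aggregation suits a given view is a design choice.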

Paper Structure

This paper contains 41 sections, 18 figures, and 1 table.

Figures (18)

  • Figure 1: The workflow we follow to substantiate, design, implement, and evaluate the iNNspector system for systematic model debugging; this workflow is also reflected in the structure of this paper.
  • Figure 2: An exemplary instantiation of a debugging component. The user selects characteristics for preferred dimensions, possibly constraining other dimensions. Eventually, all dimensions are determined by iterating this process, and the component can be created.
  • Figure 4: The iNNspector frontend. It is built around the inspection panel (a), which shows the nodes (a1) and links (a2) of the structural backbone at the current level of abstraction. The minimap (b) helps navigate the viewport. Tools (c1) from the Toolbox (c) can be applied to units of analysis in the structural backbone to create widgets (d1) that show the underlying data. Widgets are arranged in the widget panel (d), where they are organized by level of abstraction. Semantically related widgets can be combined into groups (d2). Widgets showing class-dependent data can be constrained to certain classes using the global class selector (e). The localization and interestingness panel (f) provides tools to identify units of interest.
  • Figure 5: Inspection panel on L3, "Multi-model". Experiment (a) has three models and experiment (b) has two, each model represented by a unique color. Edges denote parent-child relationships, with tooltips and edge width indicating changes between model pairs.
  • Figure 6: Inspection panel on L2, "Single Model". The view shows the architecture of a model, with each box representing a layer and the edges denoting the data flow between layers. Elements inside the layer boxes represent operations applied to the layer input.
  • ...and 13 more figures
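Figure 2 describes instantiating a debugging component by fixing characteristics one dimension at a time, where each choice may constrain other dimensions, until every dimension is determined. One way to model that loop is constraint propagation over finite domains, sketched below. The dimension names, characteristics, and constraint rules are hypothetical placeholders rather than the paper's six design dimensions; only the four data kinds are taken from the TL;DR.

```python
from typing import Dict, Set, Tuple

Domains = Dict[str, Set[str]]
# (dimension, chosen characteristic) -> {other dimension: characteristics still allowed}
Constraints = Dict[Tuple[str, str], Dict[str, Set[str]]]


def propagate(domains: Domains, constraints: Constraints) -> Domains:
    """Apply the constraints of all determined dimensions until nothing changes."""
    changed = True
    while changed:
        changed = False
        for dim, values in list(domains.items()):
            if len(values) != 1:
                continue  # only determined dimensions constrain others
            (value,) = values
            for other, allowed in constraints.get((dim, value), {}).items():
                narrowed = domains[other] & allowed
                if not narrowed:
                    raise ValueError(f"{dim}={value} leaves no option for {other}")
                if narrowed != domains[other]:
                    domains[other] = narrowed
                    changed = True
    return domains


def choose(domains: Domains, constraints: Constraints, dim: str, value: str) -> Domains:
    """Fix one dimension to a characteristic and return the narrowed domains."""
    if value not in domains[dim]:
        raise ValueError(f"{value!r} is not available for {dim}")
    new = {d: set(v) for d, v in domains.items()}  # leave the input untouched
    new[dim] = {value}
    return propagate(new, constraints)


def is_complete(domains: Domains) -> bool:
    """A component can be created once every dimension has exactly one value."""
    return all(len(v) == 1 for v in domains.values())


# Hypothetical dimensions and rules, for illustration only.
domains: Domains = {
    "data_kind": {"structural", "scalar", "high_dimensional", "function"},
    "encoding": {"graph", "line_chart", "histogram", "heatmap"},
    "scope": {"single_unit", "comparison"},
}
constraints: Constraints = {
    ("data_kind", "structural"): {"encoding": {"graph"}},
    ("data_kind", "scalar"): {"encoding": {"line_chart"}},
    ("data_kind", "high_dimensional"): {"encoding": {"histogram", "heatmap"}},
    ("encoding", "heatmap"): {"scope": {"single_unit"}},
}

step1 = choose(domains, constraints, "data_kind", "high_dimensional")
print(sorted(step1["encoding"]), is_complete(step1))
step2 = choose(step1, constraints, "encoding", "heatmap")
print(step2, is_complete(step2))
```

Because every choice returns a fresh set of domains, stepping back to an earlier state only requires keeping the previous dictionary, which suits an interactive, trial-and-error workflow.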