iNNspector: Visual, Interactive Deep Model Debugging
Thilo Spinner, Daniel Fürst, Mennatallah El-Assady
TL;DR
The paper addresses the lack of practical, systematic debugging for deep learning experiments by proposing a conceptual data-space framework and the iNNspector visual analytics system. The framework defines a data space (structural data, scalar data, high-dimensional data, and functions) and six design dimensions for instantiating debugging components, and couples these with global mechanisms for navigating model architectures across multiple abstraction levels. iNNspector provides end-to-end tooling (custom Keras logs, a FastAPI backend, and a React frontend) that supports model tracking, multi-model comparison, and detailed inspection of weights, activations, and outputs through widgets and a toolbox. The authors evaluate the approach with three real-world use cases and an expert user study, reporting improved data access, actionable debugging insights, and strong usability, while acknowledging limitations in data volume, complexity, and tool completeness. Overall, the work offers a reusable, extensible framework and a working system for integrating systematic debugging into everyday DL workflows, supporting more reliable diagnosis, verification, and refinement of neural networks.
Abstract
Deep learning model design, development, and debugging is a process driven by best practices, guidelines, trial-and-error, and the personal experiences of model developers. At multiple stages of this process, performance and internal model data can be logged and made available. However, due to the sheer complexity and scale of this data and process, model developers often resort to evaluating their model performance based on abstract metrics like accuracy and loss. We argue that a structured analysis of data along the model's architecture and at multiple abstraction levels can considerably streamline the debugging process. Such a systematic analysis can further connect the developer's design choices to their impacts on the model behavior, facilitating the understanding, diagnosis, and refinement of deep learning models. Hence, in this paper, we (1) contribute a conceptual framework structuring the data space of deep learning experiments. Our framework, grounded in literature analysis and requirements interviews, captures design dimensions and proposes mechanisms to make this data explorable and tractable. To operationalize our framework in a ready-to-use application, we (2) present the iNNspector system. iNNspector enables tracking of deep learning experiments and provides interactive visualizations of the data on all levels of abstraction from multiple models to individual neurons. Finally, we (3) evaluate our approach with three real-world use-cases and a user study with deep learning developers and data analysts, proving its effectiveness and usability.
