Keeping it Local, Tiny and Real: Automated Report Generation on Edge Computing Devices for Mechatronic-Based Cognitive Systems

Nicolas Schuler, Lea Dewald, Jürgen Graf

TL;DR

The paper tackles the challenge of evaluating mobile mechatronic cognitive systems while preserving privacy by proposing an edge-based automated reporting pipeline that operates entirely on local models. It generates natural-language reports from multi-modal sensor data, clusters scene descriptions by semantic similarity, anonymizes outputs, and localizes findings within world coordinates to produce reports in multiple formats (Markdown, HTML, LaTeX). Evaluated on a Trier dataset across Campus Indoor, Campus Outdoor, and City domains, the approach yields semantically meaningful clusters with quantitative metrics (e.g., ARI, NMI, FMI) and provides qualitative example reports, demonstrating on-device capabilities without cloud reliance. This work contributes a privacy-preserving, domain-agnostic framework for automated reporting in mobile robotics and makes its code, data, and reports publicly available, enabling broader adoption in edge-computing environments and real-world deployments.
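
To make the cluster-quality metrics named above more concrete, the following is a minimal sketch of how the Adjusted Rand Index (ARI) and the Fowlkes-Mallows Index (FMI) can be computed from pair counts. It is not the authors' evaluation code: the function names and the toy labels are hypothetical and serve illustration only, and the numbers are not results from the paper.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(truth, pred):
    """Count sample pairs that share a cluster in truth, in pred, and in both."""
    if len(truth) != len(pred):
        raise ValueError("label sequences must have the same length")
    joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    total = comb(len(truth), 2)
    return joint, same_truth, same_pred, total


def adjusted_rand_index(truth, pred):
    """ARI: pair agreement corrected for the agreement expected by chance."""
    joint, same_truth, same_pred, total = pair_counts(truth, pred)
    if total == 0:
        return 1.0  # fewer than two samples: treated as trivially identical
    expected = same_truth * same_pred / total
    max_index = (same_truth + same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (joint - expected) / (max_index - expected)


def fowlkes_mallows_index(truth, pred):
    """FMI: geometric mean of pairwise precision and recall."""
    joint, same_truth, same_pred, _ = pair_counts(truth, pred)
    if same_truth == 0 or same_pred == 0:
        return 0.0
    return joint / sqrt(same_truth * same_pred)


# Hypothetical toy labels: ground-truth domains vs. predicted cluster ids.
truth = ["indoor", "indoor", "outdoor", "outdoor", "city", "city"]
perfect = [2, 2, 0, 0, 1, 1]   # same partition, different cluster names
crossed = [0, 1, 0, 1, 0, 1]   # splits every ground-truth group

print(adjusted_rand_index(truth, perfect), fowlkes_mallows_index(truth, perfect))
print(adjusted_rand_index(truth, crossed), fowlkes_mallows_index(truth, crossed))
```

Both metrics depend only on which samples are grouped together, so renaming the cluster ids does not change the score. ARI can become negative when the agreement is worse than chance, whereas FMI stays within [0, 1].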

Abstract

Recent advancements in Deep Learning enable hardware-based cognitive systems, that is, mechatronic systems in general and robotics in particular with integrated Artificial Intelligence, to interact with dynamic and unstructured environments. While the results are impressive, the application of such systems to critical tasks like autonomous driving as well as service and care robotics necessitates the evaluation of large amounts of heterogeneous data. Automated report generation for Mobile Robotics can play a crucial role in facilitating the evaluation and acceptance of such systems in various domains. In this paper, we propose a pipeline that generates automated reports in natural language from the data of various multi-modal sensors and relies solely on local models capable of being deployed on edge computing devices, thus preserving the privacy of all actors involved and eliminating the need for external services. In particular, we evaluate our implementation on a diverse dataset spanning multiple domains including indoor, outdoor and urban environments, providing quantitative as well as qualitative evaluation results. Various generated example reports and other supplementary materials are available via a public repository.

Paper Structure

This paper contains 13 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: The different mobile cognitive systems used within our laboratory, utilizing the same sensory system (top left) used for data acquisition in the present paper. Description and images modified from Schuler.
  • Figure 2: The proposed pipeline for automated report generation for Mobile Robotics. We utilize a combination of VLM, text and image embedders like CLIP (Radford et al., 2021), NLP and zero-shot detection and segmentation models to generate reports of multi-modal data recorded by a variety of mobile cognitive systems, agnostic to the application domain.
  • Figure 3: Examples of an automatically generated report from a car drive through the German city of Trier. a) generated description, b) original image and segmentation results, c) location of individual descriptions colored by cluster, d) relative distribution of clusters and e) timeline of individual descriptions colored by cluster. Map modified from OpenStreetMap (OSMF).
  • Figure 4: Examples from the dataset used in the evaluation of this paper. The dataset contains various domains, challenging lighting conditions, crowded scenes and complex scenarios. The dataset was captured by different mobile cognitive systems within our laboratory. Images taken from Schuler.