Towards smart and adaptive agents for active sensing on edge devices

Devendra Vyas, Nikola Pižurica, Nikola Milović, Igor Jovančević, Miguel de Prado, Tim Verbelen

TL;DR

This work tackles the challenge of real-time, adaptive sensing on resource-constrained edge devices by integrating a deep learning perception module with an active inference planning module. The approach employs a compact two-module architecture, using YOLOv10n for edge perception and a discrete, probabilistic planner that minimizes the free energy $F$ and expected free energy $G(\pi)$ to drive camera actions. Its key contributions include a memory-efficient saccade agent (~2.3 million parameters) with a sub-gigabyte footprint that operates entirely on-device on an NVIDIA Jetson platform, and a flexible edge deployment workflow (ONNX/ORT/TFLite/TensorRT) enabling real-time performance across hardware configurations. The results demonstrate real-time perception and planning with low latency and variable memory usage, validating the viability of adaptive, on-device active sensing for surveillance and robotics without reliance on cloud resources. This work advances edge AI by combining probabilistic planning based on the free energy principle with lightweight deep perception to handle dynamic environments and uncertainty in a resource-constrained setting.

Abstract

TinyML has made deploying deep learning models on low-power edge devices feasible, creating new opportunities for real-time perception in constrained environments. However, the adaptability of such deep learning methods remains limited to data drift adaptation, lacking broader capabilities that account for the environment's underlying dynamics and inherent uncertainty. Deep learning's scaling laws, which counterbalance this limitation by massively up-scaling data and model size, cannot be applied when deploying on the edge, where deep learning limitations are further amplified as models are scaled down for deployment on resource-constrained devices. This paper presents an agentic system capable of performing on-device perception and planning, enabling active sensing on the edge. By incorporating active inference into our solution, our approach extends beyond deep learning capabilities, allowing the system to plan in dynamic environments while operating in real time with a compact memory footprint of as little as 300 MB. We showcase our proposed system by creating and deploying a saccade agent connected to an IoT camera with pan and tilt capabilities on an NVIDIA Jetson embedded device. The saccade agent controls the camera's field of view following optimal policies derived from active inference principles, simulating human-like saccadic motion for surveillance and robotics applications.
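
To make the planning step more concrete, here is a minimal sketch of how a discrete active inference planner can score one-step policies by expected free energy $G(\pi)$. It uses the textbook decomposition of $G$ into risk (divergence of predicted observations from preferred ones) plus ambiguity (expected observation entropy). The paper's actual generative model is not given in this summary, so the matrices `A`, `B`, the preference vector `C`, and all function names below are illustrative assumptions, not the authors' implementation.

```python
import math

def predict_states(B_a, q_s):
    """Predicted state belief after an action: q(s'|a) = sum_s B_a[s'][s] * q(s)."""
    return [sum(B_a[i][j] * q_s[j] for j in range(len(q_s)))
            for i in range(len(B_a))]

def predict_obs(A, q_s):
    """Predicted observation distribution: q(o) = sum_s A[o][s] * q(s)."""
    return [sum(A[o][s] * q_s[s] for s in range(len(q_s)))
            for o in range(len(A))]

def expected_free_energy(A, B_a, q_s, C, eps=1e-16):
    """One-step expected free energy, computed as risk + ambiguity."""
    q_next = predict_states(B_a, q_s)
    q_o = predict_obs(A, q_next)
    # Risk: KL divergence between predicted and preferred observations.
    risk = sum(p * (math.log(p + eps) - math.log(c + eps))
               for p, c in zip(q_o, C))
    # Ambiguity: entropy of p(o|s), averaged over the predicted states.
    ambiguity = 0.0
    for s, q in enumerate(q_next):
        entropy = -sum(A[o][s] * math.log(A[o][s] + eps)
                       for o in range(len(A)))
        ambiguity += q * entropy
    return risk + ambiguity

def select_action(A, B, q_s, C):
    """Return the index of the action with the lowest expected free energy."""
    G = [expected_free_energy(A, B_a, q_s, C) for B_a in B]
    best = min(range(len(G)), key=G.__getitem__)
    return best, G

# Hypothetical toy model: two fixation points, object usually visible at point 1.
A = [[0.9, 0.1],   # p(o = "no object" | s)
     [0.1, 0.9]]   # p(o = "object"    | s)
B = [[[1, 1], [0, 0]],   # action 0: look at point 0
     [[0, 0], [1, 1]]]   # action 1: look at point 1
C = [0.1, 0.9]           # preference for observing the object
action, G = select_action(A, B, [0.5, 0.5], C)
```

In this toy setting both actions carry the same ambiguity, so the planner chooses the fixation whose predicted observations best match the preferences. A full agent would also update its beliefs from each new detection and could evaluate multi-step policies.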

Paper Structure (19 sections, 3 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: Conceptual Framework for Smart Edge Agents, composed of a deep-learning perception module and an active inference planning module for active (visual) sensing. The camera frames are processed by the object detector, which forwards the detected results to the active inference module. Our agent plans its next action, minimizing free energy, and dynamically adapting to the environment.
  • Figure 2: Action space: We discretize the action space into $K \times L$ fixation points. Given a fixation point, the field of view of the camera spans $W \times H$ blocks (in blue). Object detections are translated into discrete bins (in red).
  • Figure 3: Applications: Our agentic system enables active sensing solutions for edge robotics and surveillance IoT cameras. On the left, the Tapo IoT camera used for surveillance applications. On the right, the Locobot WX250 robot for information gathering and scene discovery.
  • Figure 4: Perception module performance: Optimization achieved by exporting the YOLOv10n PyTorch model from Ultralytics to ONNX and compiling it with different deployment inference engines on the NVIDIA Jetson Orin NX's CPU and GPU.
  • Figure 5: Perception module memory profile of YOLOv10n when executed with different deployment inference engines on the NVIDIA Jetson Orin NX's CPU and GPU.
  • ...and 2 more figures
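
The discretization described in the Figure 2 caption can be illustrated with a short sketch that maps a detector bounding box to a bin of the global grid. The geometry here is an assumption made for illustration: the current field of view covers $W \times H$ blocks, its top-left block sits at a known grid position, and a detection is assigned to the block containing its box center. The function name `detection_to_bin` and its parameters are hypothetical and do not come from the paper.

```python
def detection_to_bin(box, image_size, fov_blocks, fov_origin):
    """Map a pixel-space bounding box to a (column, row) bin of the global grid.

    box:        (x_min, y_min, x_max, y_max) in pixels within the current frame
    image_size: (width, height) of the frame in pixels
    fov_blocks: (W, H) number of grid blocks covered by the field of view
    fov_origin: (column, row) of the field of view's top-left block (assumed known)
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    W, H = fov_blocks

    # Use the box center as the object's location.
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2

    # Scale to block units and clamp so border detections stay inside the view.
    col = min(W - 1, max(0, int(cx / img_w * W)))
    row = min(H - 1, max(0, int(cy / img_h * H)))

    # Shift from view-local blocks to global grid coordinates.
    return fov_origin[0] + col, fov_origin[1] + row

# Example: a 640x480 frame covering 4x3 blocks, with the view's top-left block at (2, 1).
bin_center = detection_to_bin((320, 240, 400, 300), (640, 480), (4, 3), (2, 1))
bin_border = detection_to_bin((600, 440, 680, 520), (640, 480), (4, 3), (2, 1))
```

Binning by box center keeps the observation space small, which suits a discrete planner. An alternative would be to mark every block the box overlaps, at the cost of a larger observation model.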