DTAA: A Detect, Track and Avoid Architecture for navigation in spaces with Multiple Velocity Objects

Samuel Nordström, Björn Lindquist, George Nikolakopoulos

TL;DR

The paper tackles safe, autonomous navigation in spaces with multiple moving objects by proposing the Detect-Track-Avoid Architecture (DTAA). DTAA combines real-time object detection (YOLOv8), the object tracking built into Ultralytics, Kalman-filter state estimation, and ellipse-shaped unsafe spaces with a nonlinear model predictive controller (NMPC) that follows a path from the D$^{*}_{+}$ planner along an APF-adjusted reference trajectory. Its key contributions include clustering multiple obstacles into minimum spanning ellipses, prioritizing which objects the camera tracks, and real-time NMPC handling of multiple velocity obstacles over a prediction horizon $N$ (e.g., $N=40$) with a safety margin $s$. The system is validated on a Boston Dynamics Spot robot in lab, corridor, outdoor, and subterranean settings, where it consistently maintains safe distances even with moving pedestrians and poor visibility. The work advances safety in spaces shared by humans and robots through proactive, perception-driven avoidance in complex, dynamic environments, while acknowledging limitations such as occlusions and high-velocity scenarios that motivate future multi-camera and multi-instance tracking enhancements.

Abstract

Proactive collision avoidance measures are imperative in environments where humans and robots coexist. Moreover, the introduction of high-quality legged robots into workplaces has highlighted the crucial role of a robust, fully autonomous safety solution for robots to be viable in shared spaces or in co-existence with humans. This article establishes, for the first time, an innovative Detect-Track-and-Avoid Architecture (DTAA) to enhance safety and overall mission performance. The proposed novel architecture has the merit of integrating object detection using YOLOv8, Ultralytics embedded object tracking, and state estimation of tracked objects through Kalman filters. Moreover, a novel heuristic clustering is employed to facilitate active avoidance of multiple closely positioned objects with similar velocities, creating sets of unsafe spaces for the Nonlinear Model Predictive Controller (NMPC) to navigate around. The NMPC identifies the most hazardous unsafe spaces, considering not only their current positions but also their predicted future locations. In the sequel, the NMPC calculates maneuvers to guide the robot along a path planned by D$^{*}_{+}$ towards its intended destination, while maintaining a safe distance to all identified obstacles. The efficacy of the proposed DTAA framework is validated by real-life experiments featuring a Boston Dynamics Spot robot, which demonstrate the robot's capability to consistently maintain a safe distance from humans in dynamic subterranean, urban indoor, and outdoor environments.
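
To make the clustering and prediction ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each tracked object already has a position and velocity estimate (for example from a Kalman filter), groups objects that are close in both position and velocity, and rolls each object forward under a constant-velocity assumption over a horizon of 40 steps. The names `Track`, `cluster_tracks`, and `predict`, the thresholds `max_dist` and `max_dvel`, and the time step `dt` are all hypothetical; the paper's actual heuristic and its ellipse fitting are not reproduced here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Track:
    """Estimated planar state of one tracked object (hypothetical structure)."""
    x: float   # position [m]
    y: float
    vx: float  # velocity [m/s]
    vy: float


def _linked(a: Track, b: Track, max_dist: float, max_dvel: float) -> bool:
    """Two tracks are linked when both position and velocity are similar."""
    close = hypot(a.x - b.x, a.y - b.y) <= max_dist
    similar = hypot(a.vx - b.vx, a.vy - b.vy) <= max_dvel
    return close and similar


def cluster_tracks(tracks, max_dist=1.5, max_dvel=0.5):
    """Greedy single-linkage grouping; thresholds are assumed values.

    Returns a list of clusters, each a sorted list of track indices.
    """
    clusters = []
    unassigned = list(range(len(tracks)))
    while unassigned:
        seed = unassigned.pop(0)
        members, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            linked = [j for j in unassigned
                      if _linked(tracks[i], tracks[j], max_dist, max_dvel)]
            for j in linked:
                unassigned.remove(j)
                members.append(j)
                frontier.append(j)
        clusters.append(sorted(members))
    return clusters


def predict(track: Track, dt: float, steps: int):
    """Constant-velocity positions at steps 0..steps (assumed motion model)."""
    return [(track.x + track.vx * dt * k, track.y + track.vy * dt * k)
            for k in range(steps + 1)]


# Two pedestrians walking together and one standing nearby.
tracks = [
    Track(0.0, 0.0, 1.0, 0.0),
    Track(0.8, 0.0, 1.1, 0.0),
    Track(0.5, 0.5, 0.0, 0.0),
]
clusters = cluster_tracks(tracks)
future = predict(tracks[0], dt=0.1, steps=40)
```

In this example the standing pedestrian ends up in its own cluster even though it is physically close to the others, because its velocity differs. In a full pipeline, each cluster would then be enclosed in an ellipse and the predicted positions would be passed to the controller as constraints.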

Paper Structure

This paper contains 27 sections, 14 equations, 23 figures.

Figures (23)

  • Figure 1: Spot, the robot used for experimentation with the added sensors on top.
  • Figure 2: A block diagram of the system, DTAA is highlighted with a green box.
  • Figure 3: The coordinate frames used. $\mathcal{G}$ is the global frame, $\mathcal{B}$ is the body-fixed frame for Spot, and $\mathcal{I}$ is the camera image frame. $\mathcal{I}$ has a static relationship to $\mathcal{B}$, while $\mathcal{B}$ moves relative to $\mathcal{G}$. The images $I_{rgb}$ and $I_{depth}$ are captured in $\mathcal{I}$, where YOLO detects objects and produces bounding boxes ($BB$s).
  • Figure 4: Total computation delay from the time instant that an image is initially received until the NMPC has calculated a control output versus different frame rates for the utilized camera.
  • Figure 5: In the corridor environment, Spot detects two pedestrians moving forward, who form the right cluster $E_2$, while one standing pedestrian forms a separate cluster $E_1$.
  • ...and 18 more figures