Stone Soup Multi-Target Tracking Feature Extraction For Autonomous Search And Track In Deep Reinforcement Learning Environment
Jan-Hendrik Ewers, Joe Gibbs, David Anderson
TL;DR
The paper tackles autonomous sensor management for heterogeneous sensing in future aerial systems by using Stone Soup as a multi-target tracking feature extractor within a Gymnasium-based RL loop. It introduces a framework for embedding Stone Soup components in Stable Baselines3 workflows, so that multi-target trackers and their track lists form the observations given to RL agents. In a demonstrative AESA radar search-and-track problem, the study shows that RL agents using Stone Soup-based features can outperform basic sensor policies, with the BiGRU+MHSA architecture delivering the strongest performance and GOSPA reduction. The work demonstrates that track-based feature extraction can be integrated into RL training in practice and identifies avenues for future work, such as multi-agent setups and trajectory-based tracking methods. Overall, the approach offers a path to improving autonomy and efficiency in complex sensing environments.
Abstract
Managing sensing resources is a non-trivial problem for future military air assets, which will deploy heterogeneous sensors to generate information about the battlespace. Machine learning techniques, including deep reinforcement learning (DRL), have been identified as promising approaches, but they require high-fidelity training environments and feature extractors to generate information for the agent. This paper presents a DRL training approach that uses the Stone Soup tracking framework as a feature extractor to train an agent for a sensor management task. A general framework for embedding Stone Soup tracker components within a Gymnasium environment is presented, enabling fast and configurable tracker deployments for RL training using Stable Baselines3. The approach is demonstrated in a sensor management task in which an agent is trained to search and track within a region of airspace using track lists generated by Stone Soup trackers. A sample implementation using three neural network architectures in a search-and-track scenario shows that RL agents can outperform simple search-and-track sensor policies when trained within the Gymnasium and Stone Soup environment.
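To make the idea of a tracker acting as a feature extractor inside an RL environment more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and uses neither the Stone Soup nor the Gymnasium API: `ToyTracker`, `Track`, `SearchAndTrackEnv`, `tracks_to_observation`, the sector-based beam action, the static targets, and the reward are all hypothetical stand-ins chosen for illustration. The sketch only mirrors the Gymnasium convention that `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`, and shows one plausible way to turn a variable-length track list into a fixed-size observation.

```python
import random
from dataclasses import dataclass

MAX_TRACKS = 4   # assumed cap so the observation has a fixed length
N_SECTORS = 8    # assumed number of discrete beam positions


@dataclass
class Track:
    """Hypothetical track record: estimated bearing and time since last update."""
    bearing: float
    steps_since_update: int = 0


class ToyTracker:
    """Placeholder for a real multi-target tracker; not the Stone Soup API."""

    def __init__(self):
        self.tracks = {}  # target id -> Track

    def update(self, detections):
        # Age every existing track, then refresh the ones that were detected.
        for track in self.tracks.values():
            track.steps_since_update += 1
        for target_id, bearing in detections:
            self.tracks[target_id] = Track(bearing=bearing, steps_since_update=0)


def tracks_to_observation(tracks):
    """Flatten a variable-length track list into a zero-padded vector."""
    rows = [[1.0, t.bearing, float(t.steps_since_update)] for t in tracks[:MAX_TRACKS]]
    rows += [[0.0, 0.0, 0.0]] * (MAX_TRACKS - len(rows))
    return [value for row in rows for value in row]


class SearchAndTrackEnv:
    """Gymnasium-style reset/step interface, written without the dependency."""

    def __init__(self, n_targets=3, horizon=20):
        self.n_targets = n_targets
        self.horizon = horizon

    def reset(self, seed=None):
        rng = random.Random(seed)
        # Static targets with normalised bearings in [0, 1), for simplicity.
        self.targets = {i: rng.random() for i in range(self.n_targets)}
        self.tracker = ToyTracker()
        self.t = 0
        return tracks_to_observation([]), {}

    def step(self, action):
        # The action selects which sector the sensor illuminates this step.
        lo, hi = action / N_SECTORS, (action + 1) / N_SECTORS
        detections = [(i, b) for i, b in self.targets.items() if lo <= b < hi]
        self.tracker.update(detections)
        tracks = list(self.tracker.tracks.values())
        self.t += 1
        # Illustrative reward: zero once every target has a track, negative otherwise.
        reward = float(len(tracks) - self.n_targets)
        truncated = self.t >= self.horizon
        info = {"n_tracks": len(tracks)}
        return tracks_to_observation(tracks), reward, False, truncated, info


# Baseline policy: a raster scan that visits each sector once.
env = SearchAndTrackEnv()
obs, info = env.reset(seed=0)
for step in range(N_SECTORS):
    obs, reward, terminated, truncated, info = env.step(step % N_SECTORS)
```

In a real setup the environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`, which Stable Baselines3 requires, and the toy tracker would be replaced by a tracker assembled from Stone Soup components. Zero-padding to a fixed number of track slots is only one option for handling variable-length track lists.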
