
ComTraQ-MPC: Meta-Trained DQN-MPC Integration for Trajectory Tracking with Limited Active Localization Updates

Gokul Puthumanaillam, Manav Vora, Melkior Ornik

TL;DR

This work tackles trajectory tracking under partial observability with limited active localization updates by formulating a Budgeted POMDP and proposing ComTraQ-MPC, a hybrid system that couples a meta-trained DQN for adaptive localization scheduling with Model Predictive Control for precise tracking. The two modules interact bidirectionally: DQN decisions determine when to obtain true state information, influencing MPC’s belief and control, while MPC performance provides learning signals to refine the DQN policy. The approach is validated in both simulation and real-world robotic experiments, showing improved tracking accuracy and operational efficiency over baselines such as MPC with passive/naive localization and vanilla DQN. The meta-training across diverse trajectories and budgets enables generalization to unseen tasks, making ComTraQ-MPC a practical solution for resource-constrained autonomous navigation in complex environments. Overall, the framework delivers a generalizable and approximately optimal strategy for balancing localization budget and trajectory fidelity in partially observable domains, with potential extensions to multi-agent scenarios.

Abstract

Optimal decision-making for trajectory tracking in partially observable, stochastic environments where the number of active localization updates -- the process by which the agent obtains its true state information from the sensors -- is limited presents a significant challenge. Traditional methods often struggle to balance resource conservation, accurate state estimation, and precise tracking, resulting in suboptimal performance. This problem is particularly pronounced in environments with large action spaces, where the need for frequent, accurate state data is paramount, yet the capacity for active localization updates is restricted by external limitations. This paper introduces ComTraQ-MPC, a novel framework that combines Deep Q-Networks (DQN) and Model Predictive Control (MPC) to optimize trajectory tracking with constrained active localization updates. The meta-trained DQN ensures adaptive active localization scheduling, while the MPC leverages available state information to improve tracking. The central contribution of this work is their reciprocal interaction: DQN's update decisions inform MPC's control strategy, and MPC's outcomes refine DQN's learning, creating a cohesive, adaptive system. Empirical evaluations in simulated and real-world settings demonstrate that ComTraQ-MPC significantly enhances operational efficiency and accuracy, providing a generalizable and approximately optimal solution for trajectory tracking in complex partially observable environments.
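To make the budgeted setting concrete, the following is a minimal Python sketch of a tracking loop in which a scheduler decides, at each step, whether to spend one of a limited number of active localization updates. It is not the authors' implementation: the variance-threshold rule stands in for the meta-trained DQN, the proportional controller stands in for the MPC, and the 1-D dynamics, the function name `run_episode`, and all parameter values are assumptions chosen only for illustration.

```python
import random


def run_episode(reference, budget, threshold, noise_std=0.05, gain=0.8, seed=0):
    """Track a 1-D reference with a limited number of active localization updates.

    Stand-ins (assumptions, not the paper's components):
      - a belief-variance threshold replaces the meta-trained DQN scheduler
      - a proportional controller replaces the MPC
    """
    rng = random.Random(seed)
    true_state = reference[0]    # hidden from the controller
    belief_mean = reference[0]   # the agent's estimate of its state
    belief_var = 0.0             # grows while the agent runs without updates
    remaining = budget
    update_steps = []
    sq_errors = []

    for t, target in enumerate(reference):
        # Scheduler: spend one update if uncertainty is high and budget remains.
        if remaining > 0 and belief_var > threshold:
            belief_mean = true_state  # an active update reveals the true state
            belief_var = 0.0
            remaining -= 1
            update_steps.append(t)

        # Controller: acts on the belief, never on the hidden true state.
        control = gain * (target - belief_mean)

        # The true state receives process noise; the belief is propagated
        # open-loop, so its variance accumulates.
        true_state += control + rng.gauss(0.0, noise_std)
        belief_mean += control
        belief_var += noise_std ** 2

        sq_errors.append((true_state - target) ** 2)

    return {
        "updates_used": budget - remaining,
        "update_steps": update_steps,
        "mse": sum(sq_errors) / len(sq_errors),
    }


if __name__ == "__main__":
    # Hypothetical straight-line reference; the waypoint count and budget
    # mirror the first trajectory described in Figure 1.
    reference = [0.1 * i for i in range(134)]
    result = run_episode(reference, budget=10, threshold=0.01)
    print(result["updates_used"], result["update_steps"], result["mse"])
```

With these illustrative settings, the greedy threshold rule spends its whole budget in the early part of the run and then tracks on an uncorrected belief. Deciding when to hold updates back is the scheduling problem that the paper assigns to the DQN.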

Paper Structure

This paper contains 18 sections, 9 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Performance of ComTraQ-MPC: (a) depicts ComTraQ-MPC on a known trajectory, previously seen during meta-training (134 waypoints, active localization budget: 10), and (b) on a previously unseen, longer trajectory (245 waypoints, active localization budget: 20). Start and goal points are marked by blue and orange dots, respectively, with the red dotted line showing the reference trajectory and the green line the ComTraQ-MPC path.
  • Figure 2: Architectural Overview of ComTraQ-MPC Framework. This diagram illustrates the integration of DQN for dynamic localization decision-making with MPC for precise trajectory tracking. The architecture encapsulates the synergy between adaptive active localization scheduling and robust trajectory tracking, highlighting the flow of information and decision processes that enable effective decision-making in environments with active localization update constraints.
  • Figure 3: Comparison of trajectory tracking across scenarios. In each subfigure, the left image illustrates the trajectory in Scenario 1, while the right image depicts Scenario 2. Key points are color-coded: green for the start, purple for the goal, yellow for active localization updates. The reference trajectory is shown in red, and the trajectory produced by the evaluated approach is in blue.
  • Figure 4: Integrated Analysis of ComTraQ-MPC in Scenario 2. (a) t-SNE visualization of Q-values, highlighting the distinct clusters corresponding to Phase 1 (blue), Phase 2 (green), and Phase 3 (yellow) of the mission, with active localization updates denoted by red circles. (b) Variation of error between the average belief state and the true state, with active localization updates superimposed, corresponding to the color-coded mission phases in (a). (c) Path comparison illustrating the reference path and the ComTraQ-MPC path with active localization updates marked, mirroring the sequence of the mission phases as color-coded in (a) and (b).