Temporal Action Representation Learning for Tactical Resource Control and Subsequent Maneuver Generation

Hoseong Jung; Sungil Son; Daesol Cho; Jonghae Park; Changhyun Choi; H. Jin Kim

TL;DR

TART uses contrastive learning based on a mutual information objective to capture the temporal dependencies inherent in resource-maneuver interactions. This lets it use limited resources effectively and produce context-aware subsequent maneuvers.

Abstract

Autonomous robotic systems should reason about resource control and its impact on subsequent maneuvers, especially when operating with limited energy budgets or restricted sensing. Learning-based control handles complex dynamics effectively and represents the problem as a hybrid action space that unifies discrete resource usage and continuous maneuvers. However, prior work on hybrid action spaces has not sufficiently captured the causal dependencies between resource usage and maneuvers, and it has overlooked the multi-modal nature of tactical decisions; both are critical in fast-evolving scenarios. In this paper, we propose TART, a Temporal Action Representation learning framework for Tactical resource control and subsequent maneuver generation. TART leverages contrastive learning based on a mutual information objective, designed to capture the inherent temporal dependencies in resource-maneuver interactions. The learned representations are quantized into discrete codebook entries that condition the policy, capturing recurring tactical patterns and enabling multi-modal, temporally coherent behaviors. We evaluate TART in two domains where resource deployment is critical: (i) a maze navigation task in which a limited budget of discrete actions provides enhanced mobility, and (ii) a high-fidelity air combat simulator in which an F-16 agent operates weapons and defensive systems in coordination with flight maneuvers. Across both domains, TART consistently outperforms hybrid-action baselines, demonstrating its effectiveness in leveraging limited resources and producing context-aware subsequent maneuvers.
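To make the contrastive objective concrete, here is a minimal sketch of InfoNCE, one widely used contrastive lower bound on mutual information. The abstract does not state which estimator TART uses, so the choice of InfoNCE, the function name `info_nce`, the cosine-similarity scoring, and the `temperature` value are all assumptions made for illustration. Each anchor stands for an embedding of a discrete resource action, and its positive stands for an embedding of the maneuver segment that followed it.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of embedding pairs (illustrative, not the paper's code).

    Row i of `positives` is the positive sample for row i of `anchors`;
    every other row in `positives` serves as a negative for anchor i.
    """
    def normalize(v):
        # Scale to unit length so the dot product is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Hypothetical 2-D embeddings: matched pairs versus deliberately mismatched pairs.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each action embedding lines up with its own follow-up maneuver and much larger when the pairs are swapped, which is the pressure that pulls temporally linked resource and maneuver embeddings together.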

Paper Structure (30 sections, 11 equations, 5 figures, 1 table, 1 algorithm)

INTRODUCTION
Background and Related Work
Parameterized Action Markov Decision Process
Reinforcement Learning for Hybrid Action Spaces
Representation Learning for Reinforcement Learning
METHODS
Objective for Temporal Action Representation Learning
Learning a Latent Action Representation
Tactic-Guided Maneuver Generation
Training Protocol and Overall Objective
Evaluation Environments
Maze Navigation
Task & Actions
States & Rewards
Scenarios & Metrics
...and 15 more sections

Figures (5)

Figure 1: Snapshots of tactical decision-making in an air combat scenario. A discrete action (e.g., weapon release) (a) constrains the set of feasible follow-up maneuvers (causal dependency) and (b) gives rise to multiple valid maneuver modes (multi-modality). TART is designed to capture these temporal dependencies and multi-modal outcomes, conditioning the policy to select context-appropriate maneuvers.
Figure 2: Overview of TART: (1) The agent interacts with the environment and collects a set of trajectories. (2) A mutual information objective guides the clustering of given trajectories into multiple tactical modes through contrastive learning (Sec. III-B). (3) The resulting distinct modes are then mapped to discrete vectors via vector quantization (VQ). The continuous actor distinguishes between the modes and generates multi-modal maneuver distributions accordingly (Sec. III-C).
Figure 3: Overviews and difficulty settings of the evaluation environments. (a)–(c) Maze Navigation: Easy (trivially solvable), Medium (complex), Hard (dynamic obstacles). (d)–(f) Air-to-Air Combat: Easy (fixed-maneuver opponent), Medium (unarmed evasive opponent), Hard (armed pursuing opponent).
Figure 4: Experimental results across the designed environments and metrics. Values are averaged over five seeds, and black bars indicate standard deviation. For failed episodes, TTG and TTE are set to their maximum values (100 and 1800, respectively), while SPE is reported only for successful episodes. Hatched bars mark the results that are discussed in the text and exhibit superior performance.
Figure 5: Representative qualitative results in (a)–(b) Maze Navigation and (c)–(d) Air-to-Air Combat. (a) Heatmaps in Easy, Medium, and Hard scenarios, where the triangle marks the optimal path. Red indicates higher, while white indicates lower visitation frequency. (b) Deadlock examples, where the agent employs the Penetration action to navigate through cluttered corridors. (c) An offensive maneuver: the agent consecutively launches shots using the Missile action. (d) A defensive maneuver: the agent neutralizes the opponent's missile with a Defense action and responds with a counter shot.
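Step (3) in the Figure 2 caption maps each learned representation to a discrete vector via vector quantization, and the continuous actor then distinguishes between the resulting modes. The sketch below shows the standard nearest-neighbor codebook lookup that vector quantization relies on. The helper names `quantize` and `actor_input`, the toy codebook, and the choice to condition the actor by concatenating the code to the state are assumptions for illustration; the captions do not specify how the actor consumes the code.

```python
def quantize(z, codebook):
    """Return (index, entry) of the codebook vector nearest to z.

    Nearest is measured by squared Euclidean distance, as in standard
    vector quantization. This is an illustrative helper, not the paper's code.
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    index = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
    return index, codebook[index]


def actor_input(state, code):
    """Build the actor's input by appending the selected code to the state.

    Concatenation is an assumed conditioning scheme chosen for simplicity.
    """
    return list(state) + list(code)


# Hypothetical codebook with three tactical modes in a 2-D latent space.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
mode_index, code = quantize([0.9, 0.2], codebook)
features = actor_input([0.5, -0.3, 2.0], code)
```

Because every latent is snapped to one of a small set of entries, trajectories that share a tactical pattern present the same code to the actor, which is what allows a single policy to switch between distinct maneuver modes.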

Theorems & Definitions (1)

Definition 1
