Joint Inverse Learning of Cognitive Radar Perception and Perception-Action Policy

Anoop C, Anup Aprem

TL;DR

This paper considers an inverse learning Electronic Counter Measure (ECM) that infers both the perception and the perception-driven action policy of an adversarial cognitive radar (CR) from the CR's observed actions, i.e., its sensing and transmission actions.

Abstract

Cognitive Radars (CRs) employ a perception-action cycle to adapt their sensing and transmission strategies based on their perception of the target's kinematic state and their mission objectives. This paper considers an inverse learning Electronic Counter Measure (ECM) that infers both the perception and the perception-driven action policy of an adversarial CR from the CR's actions, i.e., the sensing and transmission actions it takes. Existing frameworks in the literature assume knowledge of either the perception or the perception-action policy and infer the other. However, this assumption is unrealistic in an adversarial setting. We address this gap by proposing an online, nonparametric Bayesian machine learning framework and developing the Inverse Particle Filter with Dependent Dirichlet Process (IPFDDP) algorithm, which characterizes the perception-dependent action policy using a Dependent Dirichlet Process (DDP) and embeds kernel-based DDP inference within a Bayesian inverse particle filtering framework to jointly estimate the CR's perception and perception-action policy. Extensive numerical simulations demonstrate that IPFDDP outperforms existing inverse learning methods in terms of mean squared error, Kullback-Leibler (KL) divergence between the estimated and true policy, and accuracy in identifying relative action preferences. Unlike existing techniques, the proposed Bayesian formulation naturally quantifies uncertainty in the inferred perception and perception-action policy, enabling active probing strategies for sample-efficient inverse learning. Simulation results show that active probing integrated with IPFDDP achieves, on average, a 40% faster reduction in KL divergence compared to randomized probing.
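
One of the evaluation metrics named in the abstract, the KL divergence between the estimated and true policy, can be illustrated for a single belief as follows. This is only a minimal sketch: the function name `kl_divergence`, the direction of the divergence (true policy relative to the estimate), the `eps` clipping, and the example probabilities are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def kl_divergence(p_true, p_est, eps=1e-12):
    """KL(p_true || p_est) for two discrete action distributions.

    The direction of the divergence and the eps clipping are assumptions
    made for this sketch; the abstract does not specify either.
    """
    p = np.asarray(p_true, dtype=float)
    q = np.asarray(p_est, dtype=float)
    # Clip the estimate away from zero so the logarithm stays finite,
    # then renormalize so it is still a probability vector.
    q = np.clip(q, eps, None)
    q = q / q.sum()
    # Terms with p == 0 contribute nothing to the sum.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical policies over three actions at one fixed belief.
true_policy = [0.7, 0.2, 0.1]
rough_estimate = [0.4, 0.3, 0.3]
better_estimate = [0.65, 0.22, 0.13]

print(kl_divergence(true_policy, rough_estimate))
print(kl_divergence(true_policy, better_estimate))
```

An estimate closer to the true policy gives a smaller divergence, which is the sense in which a faster reduction in KL divergence indicates more sample-efficient learning.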

Paper Structure

This paper contains 21 sections, 1 theorem, 59 equations, 7 figures, 3 tables, 6 algorithms.

Key Result

Proposition 1

Let $G_{0:k}$ denote the global belief-dependent policy posterior defined in \ref{eq:GlobalDDPUpdate}. For a given belief $\pi \in \Pi$, define the predictive mean action distribution [equation omitted in source]. Define the acquisition function [equation omitted in source], where $\mathcal{H}(\cdot)$ denotes Shannon entropy and $\alpha_{0:k}^{(i)}(\pi)$ is the particle-level concentration parameter in \ref{eq:alphaUpdate}. Selecting the next belief probe as pri...

Figures (7)

  • Figure 1: CR-target interaction and the proposed inverse learning approach. We consider the problem of jointly estimating the radar's belief state $\pi_k$ and its belief-dependent policy $G_{\pi,\mathbf{a}} := p(\mathbf{a}|\pi)$, solely from observed state trajectories $\mathbf{x}_{1:k}$ and corresponding radar actions $\mathbf{a}_{1:k}$. The main challenge arises from the fact that the radar's belief $\pi_k$ is itself unobservable, and the action policy $G_{\pi,\mathbf{a}}$ is also unknown and potentially nonparametric.
  • Figure 2: The range ($r_k$) estimate obtained using the CR's Kalman filter tracker and the IPFDDP estimate of the CR's belief. Since the generative models used by the CR are assumed to be known to the inverse learner, the learner's uncertainty about the CR's mean estimate of the target state is attributed to the IPFDDP particles $\mathbf{y}_k^{(i)}$, which represent the latent CR observations.
  • Figure 3: Trace of the predictive covariance of the CR's Kalman filter tracker and its estimate obtained by the learner using IPFDDP. The figure also shows the disjoint partitions of the CR's belief space as in \ref{eq:Num_BeliefParts}. The radar takes actions according to \ref{eq:Num_PolicyMapping}: it chooses LFM, PFM and HFM for low, medium and high uncertainty, respectively. The thresholds $\tau_1$ and $\tau_2$ in \ref{eq:Num_BeliefParts} are set to $0.49$ and $0.56$, respectively.
  • Figure 4: Mean of the approximated global policy learned via IPFDDP as a function of the radar belief (covariance trace), for different data lengths $T \in \{30, 500, 2000\}$. Solid lines denote the mean estimated action probabilities, while dashed lines represent the true policy. As $T$ increases, the mean estimate converges to the true policy, demonstrating consistency of the proposed approach.
  • Figure 5: Samples on the probability simplex obtained from the Dirichlet process in \ref{eq:GlobalDDPUpdate} for $\mathrm{trace}(\Sigma_{k\mid k-1})\in\{0.2,0.5,0.8\}$, corresponding to the three belief-space partitions defined in \ref{eq:Num_BeliefParts}. Blue points show Dirichlet samples, the red marker denotes the IPFDDP mean estimate, and the black marker indicates the true policy. For $T=2000$, the mean estimate closely matches the true policy. The spread of the samples depends on the initial concentration parameter $\alpha_0$ and the number of observations: larger $\alpha_0$ and more data lead to higher confidence, consistent with the variance in \ref{eq:GlobalVariance}.
  • ...and 2 more figures
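
The threshold-based policy in the Figure 3 caption and the concentration effect noted in the Figure 5 caption can be sketched as below. This is a simplified illustration, not the paper's IPFDDP algorithm: it assumes a deterministic mapping from covariance trace to waveform, a strict `<` comparison at the thresholds, and an ordinary per-partition Dirichlet posterior with a uniform base measure in place of the kernel-based DDP. The names `belief_partition`, `select_waveform` and `dirichlet_posterior` are hypothetical.

```python
import numpy as np

TAU_1, TAU_2 = 0.49, 0.56          # thresholds quoted in the Figure 3 caption
WAVEFORMS = ("LFM", "PFM", "HFM")  # actions for low, medium, high uncertainty

def belief_partition(cov_trace):
    """Return the belief-space partition index (0, 1 or 2) for a covariance trace.

    Using '<' rather than '<=' at the thresholds is an assumption.
    """
    if cov_trace < TAU_1:
        return 0
    if cov_trace < TAU_2:
        return 1
    return 2

def select_waveform(cov_trace):
    """Deterministic policy sketch: one waveform per uncertainty level."""
    return WAVEFORMS[belief_partition(cov_trace)]

def dirichlet_posterior(counts, alpha0=1.0):
    """Posterior mean and per-action variance of a Dirichlet over actions.

    Assumes a uniform base measure with total mass alpha0 (an illustrative
    stand-in for the paper's DDP prior) and observed action counts.
    """
    counts = np.asarray(counts, dtype=float)
    alpha = alpha0 / len(counts) + counts
    total = alpha.sum()
    mean = alpha / total
    var = mean * (1.0 - mean) / (total + 1.0)
    return mean, var

# The three covariance traces used in the Figure 5 caption.
for trace in (0.2, 0.5, 0.8):
    print(trace, select_waveform(trace))

# More observed actions in a partition shrink the posterior variance.
for n in (3, 30, 300):
    mean, var = dirichlet_posterior([n, 0, 0])
    print(n, mean.round(3), var.round(6))
```

Under these assumptions, both a larger `alpha0` and a larger number of observed actions increase the total concentration, which reduces the variance and tightens the spread of samples on the simplex.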

Theorems & Definitions (1)

  • Proposition 1: Entropy-Concentration Based Active Probing