MANTA: Physics-Informed Generalized Underwater Object Tracking

Suhas Srinath, Hemang Jamadagni, Aditya Chadrasekar, Prathosh AP

TL;DR

Underwater tracking suffers from depth- and water-condition distortions that break terrestrial trackers; MANTA addresses this by combining physics-informed self-supervised learning with a three-stage tracking pipeline. A dual-positive contrastive framework learns domain-invariant features via temporal consistency and Beer–Lambert augmentations, while a vision-guided secondary association using geometric-appearance cues maintains long-term identity. The paper also introduces Center-Scale Consistency (CSC) and Geometric Alignment Score (GAS) to quantify geometric fidelity beyond IoU. Experiments on four underwater benchmarks demonstrate state-of-the-art accuracy and stable long-term tracking with competitive runtimes, showcasing the value of embedding physical priors into learning for challenging real-world domains.

Abstract

Underwater object tracking is challenging due to wavelength-dependent attenuation and scattering, which severely distort appearance across depths and water conditions. Existing trackers trained on terrestrial data fail to generalize to these physics-driven degradations. We present MANTA, a physics-informed framework integrating representation learning with tracking design for underwater scenarios. We propose a dual-positive contrastive learning strategy coupling temporal consistency with Beer–Lambert augmentations to yield features robust to both temporal and underwater distortions. We further introduce a multi-stage pipeline augmenting motion-based tracking with a physics-informed secondary association algorithm that integrates geometric consistency and appearance similarity for re-identification under occlusion and drift. To complement standard IoU metrics, we propose Center-Scale Consistency (CSC) and Geometric Alignment Score (GAS) to assess geometric fidelity. Experiments on four underwater benchmarks (WebUOT-1M, UOT32, UTB180, UWCOT220) show that MANTA achieves state-of-the-art performance, improving Success AUC by up to 6 percent, while ensuring stable long-term generalized underwater tracking and efficient runtime.
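
To make the idea of a Beer–Lambert augmentation concrete, the following is a minimal sketch of wavelength-dependent attenuation applied to an RGB frame, using the Beer–Lambert law $I_c(d) = I_c(0)\,e^{-\beta_c d}$ per color channel $c$. The function name `beer_lambert_augment`, the coefficient values in `BETA_RGB`, and the choice to model attenuation only (no scattering term) are assumptions made for illustration; the abstract does not specify how the paper implements its augmentations.

```python
import numpy as np

# Illustrative per-channel attenuation coefficients (1/m) for R, G, B.
# These values are assumptions for this sketch, not taken from the paper;
# they only encode that red light is attenuated fastest in water.
BETA_RGB = np.array([0.60, 0.12, 0.08])


def beer_lambert_augment(image, depth_m, beta=BETA_RGB):
    """Attenuate an RGB image in [0, 1] as if seen through depth_m metres of water."""
    image = np.asarray(image, dtype=np.float64)
    # Beer-Lambert transmission per channel: exp(-beta_c * d), shape (3,).
    transmission = np.exp(-np.asarray(beta, dtype=np.float64) * depth_m)
    # Broadcasting multiplies every pixel's (R, G, B) by the channel transmission.
    return np.clip(image * transmission, 0.0, 1.0)


rng = np.random.default_rng(0)
frame = rng.random((4, 4, 3))          # stand-in for a video frame
augmented = beer_lambert_augment(frame, depth_m=5.0)
```

In a contrastive setup of the kind the abstract describes, an augmented frame like `augmented` could serve as one positive for `frame`, with a temporally adjacent frame as the other; sampling `depth_m` and `beta` at random would vary the simulated water conditions.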

Paper Structure

This paper contains 22 sections, 10 equations, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: Tracking outputs from different methods on Video_0002 from the UTB180 dataset. MANTA (dashed red) reliably tracks the correct target object (solid green) across frames, maintaining identity even under occlusion or temporary disappearance.
  • Figure 2: Overview of MANTA. The self-supervised encoder $\mathcal{E}$ is trained via contrastive learning with Beer–Lambert augmentations $x_i^b$ and temporal augmentations $x_i^t$. Following detections from $D$ and primary tracking with $\mathcal{T}_P$, embeddings produced by $\mathcal{E}$ are used for vision-guided secondary association $\mathcal{T}_S$, which refines trajectories by matching IoU, scales, and centers, yielding accurate predictions for an input video sequence.
  • Figure 3: Comparison of tracking methods on frame 261 of the WebUOT-1M_Test_000388 sequence from the WebUOT-1M dataset. Tracker outputs are shown in red, and ground-truth boxes in green. Unlike other methods that either drift, miss the target, or track incorrect objects, MANTA consistently identifies and follows the correct object.
  • Figure 4: Runtime breakdown, shown as percentages of total runtime.
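
The Figure 2 caption describes a secondary association step that uses encoder embeddings and matches on IoU, scales, and centers. A rough sketch of how such a combined score might be computed for one track/detection pair is given below. The function names, the equal averaging of the three geometric terms, the exponential center term, and the `w_geo` weight are all hypothetical choices for illustration, not the paper's algorithm.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def association_score(track_box, det_box, track_emb, det_emb, w_geo=0.5):
    """Hypothetical geometry + appearance score in [0, 1] (assumed form)."""
    t = np.asarray(track_box, dtype=np.float64)
    d = np.asarray(det_box, dtype=np.float64)
    # Center term: distance between centers, normalized by the track diagonal.
    center_dist = np.linalg.norm((t[:2] + t[2:]) / 2 - (d[:2] + d[2:]) / 2)
    diag = np.linalg.norm(t[2:] - t[:2])
    center_term = np.exp(-center_dist / diag) if diag > 0 else 0.0
    # Scale term: ratio of the smaller box area to the larger one.
    area_t = (t[2] - t[0]) * (t[3] - t[1])
    area_d = (d[2] - d[0]) * (d[3] - d[1])
    scale_term = min(area_t, area_d) / max(area_t, area_d) if max(area_t, area_d) > 0 else 0.0
    geometry = (box_iou(t, d) + center_term + scale_term) / 3.0
    # Appearance term: cosine similarity of embeddings, mapped to [0, 1].
    e1 = np.asarray(track_emb, dtype=np.float64)
    e2 = np.asarray(det_emb, dtype=np.float64)
    cos = float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    appearance = (cos + 1.0) / 2.0
    return w_geo * geometry + (1.0 - w_geo) * appearance


emb = np.array([0.2, 0.9, 0.4])
same = association_score((10, 10, 50, 50), (10, 10, 50, 50), emb, emb)
far = association_score((10, 10, 50, 50), (200, 200, 220, 260), emb, -emb)
```

Here `same` scores a detection that coincides with the track in both geometry and appearance, while `far` scores a distant box with an opposite embedding; a re-identification step would keep the highest-scoring candidate above some threshold.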