Task-Guided Multi-Annotation Triplet Learning for Remote Sensing Representations

Meilun Zhou, Alina Zare

Abstract

Prior multi-task triplet loss methods have relied on static weights to balance supervision across different annotation types. However, static weighting requires tuning and does not account for how tasks interact when shaping a shared representation. To address this, the proposed task-guided multi-annotation triplet loss removes this dependency by selecting triplets through a mutual-information criterion that identifies the triplets most informative across tasks. This strategy changes which samples influence the representation rather than adjusting loss magnitudes. Experiments on an aerial wildlife dataset compare the proposed task-guided selection against several triplet-loss baselines for shaping a shared multi-task representation. The results show improved classification and regression performance and demonstrate that task-aware triplet selection produces a more effective shared representation for downstream tasks.
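
The abstract does not spell out the exact mutual-information criterion, so the following is a minimal Python sketch of one plausible reading: anchors are ranked by the mutual information, within each sample's embedding-space neighborhood, between the discrete class labels and binned regression targets, and triplets are then built around the top-ranked anchors. The function name score_anchors_by_cross_task_mi and all parameters are hypothetical, not the paper's formulation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score
from sklearn.neighbors import NearestNeighbors

def score_anchors_by_cross_task_mi(embeddings, class_labels, reg_targets,
                                   k=20, n_bins=5, top_pct=0.2):
    """Rank candidate anchors by cross-task mutual information in their
    local embedding neighborhood; return indices of the top fraction.
    Illustrative criterion only, assuming precomputed frozen embeddings."""
    # Discretize the continuous regression targets so MI is well defined.
    edges = np.quantile(reg_targets, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    reg_binned = np.digitize(reg_targets, edges)

    # k nearest neighbors of every sample in the frozen embedding space.
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nbrs.kneighbors(embeddings)

    # High MI between the two annotation types within a neighborhood means
    # triplets anchored there carry signal for both tasks at once.
    scores = np.array([
        mutual_info_score(class_labels[nb[1:]], reg_binned[nb[1:]])
        for nb in idx  # nb[0] is the sample itself; use its k neighbors
    ])
    n_keep = max(1, int(top_pct * len(scores)))
    return np.argsort(scores)[::-1][:n_keep]
```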

Paper Structure

This paper contains 9 sections, 18 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Two-stage architecture. Stage one extracts embeddings from a frozen Vision Transformer–based model and applies different representation learning losses to shape the latent space. Stage two trains linear task heads on each latent space to measure the impact on downstream performance. (A minimal code sketch of this pipeline appears after this list.)
  • Figure 2: Two heatmaps summarizing how the percentage of top-ranked samples and the percentage of random samples affect downstream task performance for TG-MATL on CLIP embeddings. All reported values are averaged over eight runs of each experiment.
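
As a concrete illustration of the two-stage design described in Figure 1, here is a minimal PyTorch sketch: a small trainable projection head shapes frozen backbone embeddings under a triplet loss, and linear probes are then fit on the shaped latent space. Dimensions, head names, and hyperparameters are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage one: a small trainable projection shapes frozen backbone embeddings
# under a triplet loss; the ViT backbone itself is never updated.
class ProjectionHead(nn.Module):
    def __init__(self, in_dim=768, out_dim=128):  # dims are assumptions
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, out_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm latent space

head = ProjectionHead()
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
triplet_loss = nn.TripletMarginLoss(margin=0.2)

def stage_one_step(anchor, positive, negative):
    # Inputs are precomputed frozen ViT (e.g., CLIP) embeddings for one
    # selected triplet; only the projection head receives gradients.
    loss = triplet_loss(head(anchor), head(positive), head(negative))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Stage two: linear probes trained on the shaped latent space measure
# downstream impact; one head per annotation type (names are illustrative).
num_classes = 6  # placeholder for the dataset's class count
classifier = nn.Linear(128, num_classes)  # e.g., species classification
regressor = nn.Linear(128, 1)             # e.g., a continuous annotation
```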