Optimizing Dense Visual Predictions Through Multi-Task Coherence and Prioritization

Maxime Fontana, Michael Spratling, Miaojing Shi

TL;DR

The paper tackles the challenge of coherent multi-task dense prediction by integrating cross-task coherence with a Trace-Back refinement mechanism and a dynamic loss prioritization scheme. MT-CP employs a vision-transformer backbone with task-specific decoders, a Coherence Fusion Module to align cross-task representations, and Spatial Refinement Modules to propagate cross-task information back through the decoders. The authors introduce a log-scale loss projection and a history-based, spread-controlled task prioritization to balance learning across tasks, and demonstrate state-of-the-art performance on NYUD-v2 and Pascal-Context. The approach yields improved geometric and predictive coherence across tasks and offers practical benefits for efficient, unified dense prediction models in indoor and scene understanding contexts.

Abstract

Multi-Task Learning (MTL) involves the concurrent training of multiple tasks, offering notable advantages for dense prediction tasks in computer vision. MTL not only reduces training and inference time as opposed to having multiple single-task models, but also enhances task accuracy through the interaction of multiple tasks. However, existing methods face limitations. They often rely on suboptimal cross-task interactions, resulting in task-specific predictions with poor geometric and predictive coherence. In addition, many approaches use inadequate loss weighting strategies, which do not address the inherent variability in task evolution during training. To overcome these challenges, we propose an advanced MTL model specifically designed for dense vision tasks. Our model leverages state-of-the-art vision transformers with task-specific decoders. To enhance cross-task coherence, we introduce a trace-back method that improves both cross-task geometric and predictive features. Furthermore, we present a novel dynamic task balancing approach that projects task losses onto a common scale and prioritizes more challenging tasks during training. Extensive experiments demonstrate the superiority of our method, establishing new state-of-the-art performance across two benchmark datasets. The code is available at: https://github.com/Klodivio355/MT-CP
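
The abstract describes the task balancing only in words: losses are projected onto a common scale, and harder tasks receive priority. The following is a minimal illustrative sketch of that general idea, not the authors' formulation. The `log1p` projection, the sliding-window mean used as a difficulty score, the softmax weighting, and every name here (`log_project`, `TaskPrioritizer`, `window`, `spread`) are assumptions introduced for illustration; the paper's actual equations may differ.

```python
import math
from collections import deque


def log_project(loss):
    """Project a non-negative raw loss onto a log scale (assumed form: log(1 + loss))."""
    return math.log1p(max(loss, 0.0))


class TaskPrioritizer:
    """Hypothetical history-based task weighting; not the paper's exact scheme."""

    def __init__(self, tasks, window=5, spread=1.0):
        # Keep only the most recent `window` projected losses per task.
        self.history = {t: deque(maxlen=window) for t in tasks}
        # Larger `spread` flattens the weights; smaller values sharpen them.
        self.spread = spread

    def update(self, raw_losses):
        """Record the projected loss of each task for the current step."""
        for task, loss in raw_losses.items():
            self.history[task].append(log_project(loss))

    def weights(self):
        """Return per-task weights that sum to the number of tasks."""
        # Difficulty score: mean projected loss over the window (0 if no history).
        scores = {t: (sum(h) / len(h) if h else 0.0) for t, h in self.history.items()}
        top = max(scores.values())
        # Softmax over scores, shifted by the maximum for numerical stability.
        exps = {t: math.exp((s - top) / self.spread) for t, s in scores.items()}
        total = sum(exps.values())
        return {t: len(exps) * e / total for t, e in exps.items()}

    def total_loss(self, raw_losses):
        """Weighted sum of projected losses using the current weights."""
        w = self.weights()
        return sum(w[t] * log_project(loss) for t, loss in raw_losses.items())


if __name__ == "__main__":
    prioritizer = TaskPrioritizer(["depth", "semseg"])
    step_losses = {"depth": 0.5, "semseg": 2.0}
    prioritizer.update(step_losses)
    print(prioritizer.weights())
    print(prioritizer.total_loss(step_losses))
```

In this sketch the task with the larger recent projected loss (`semseg` above) receives the larger weight, and `spread` controls how far the weights may drift from uniform. In a real training loop the losses would be framework tensors rather than floats, and the weights would typically be detached from the gradient graph.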

Paper Structure

This paper contains 17 sections, 7 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Our MTL framework implements cross-task coherence by tracing cross-task representations back through task-specific decoders and using them to refine the initial task predictions. The framework is optimized via a dynamic loss prioritization scheme.
  • Figure 2: The proposed MT-CP model. Only two tasks are shown for clarity. The model consists of a shared set of features extracted by a common backbone network (on the left). The model first performs a forward pass through each task-specific decoder. Next, it imposes cross-task coherence through the Coherence Fusion Module (CFM). It then traces back this cross-task representation through the Spatial Refinement Modules (SRMs) to refine an initial prediction. We optimize this model through a dynamic Loss Prioritization Scheme (LPS) which prioritizes challenging tasks throughout training.
  • Figure 3: The coherence fusion module.
  • Figure 4: The spatial refinement module used to trace back cross-task embeddings.
  • Figure 5: Visualisations of predictions on NYUD-v2.
  • ...and 1 more figures