Y-MAP-Net: Real-time depth, normals, segmentation, multi-label captioning and 2D human pose in RGB images

Ammar Qammaz; Nikolaos Vasilikopoulos; Iason Oikonomidis; Antonis A. Argyros

TL;DR

Y-MAP-Net is a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images. It adopts a multi-teacher, single-student training paradigm that distills the capabilities of task-specific foundation models into a lightweight architecture suitable for real-time applications.

Abstract

We present Y-MAP-Net, a Y-shaped neural network architecture designed for real-time multi-task learning on RGB images. Y-MAP-Net simultaneously predicts depth, surface normals, human pose and semantic segmentation, and generates multi-label captions, all from a single network evaluation. To achieve this, we adopt a multi-teacher, single-student training paradigm in which task-specific foundation models supervise the network's learning, allowing it to distill their capabilities into a lightweight architecture suitable for real-time applications. Y-MAP-Net exhibits strong generalization, simplicity and computational efficiency, making it well suited to robotics and other practical scenarios. To support future research, we will release our code publicly.
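The multi-teacher, single-student paradigm described above can be illustrated with a small sketch of a combined training objective. It assumes, for illustration only, that each frozen teacher has already produced a dense per-pixel target map for its task, that the student regresses every map with an L1 loss, and that per-task weights are hand-set. The task names, the loss choice, the weights, and the functions `l1_loss` and `distillation_loss` are hypothetical and are not taken from the Y-MAP-Net paper.

```python
# Minimal sketch of a multi-teacher, single-student distillation objective.
# ASSUMPTIONS (not from the paper): teachers emit dense H x W target maps,
# the student regresses each map with an L1 loss, and per-task weights
# are hypothetical hand-set hyperparameters.
from typing import Dict, List

Map = List[List[float]]  # H x W grid of per-pixel values


def l1_loss(pred: Map, target: Map) -> float:
    """Mean absolute error between two maps of identical shape."""
    if len(pred) != len(target):
        raise ValueError("maps must have the same height")
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("maps must have the same width")
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count


def distillation_loss(
    student: Dict[str, Map],
    teachers: Dict[str, Map],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses; each task's target comes from its own teacher."""
    return sum(
        weights[task] * l1_loss(student[task], teachers[task]) for task in teachers
    )


# Hypothetical example: two tasks supervised by two different teachers.
teachers = {
    "depth": [[1.0, 2.0], [3.0, 4.0]],
    "pose_heatmap": [[0.0, 0.5], [0.5, 0.0]],
}
student = {
    "depth": [[1.5, 2.0], [3.0, 3.5]],
    "pose_heatmap": [[0.0, 0.5], [0.5, 1.0]],
}
weights = {"depth": 1.0, "pose_heatmap": 0.5}

loss = distillation_loss(student, teachers, weights)
print(loss)  # 1.0 * 0.25 + 0.5 * 0.25 = 0.375
```

Summing weighted per-task terms lets a single shared backbone receive gradients from every teacher in one pass. The actual losses, weights, and output encodings used by Y-MAP-Net may differ.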
