Multidimensional Task Learning: A Unified Tensor Framework for Computer Vision Tasks

Alaa El Ichi, Khalide Jbilou

TL;DR

The paper shows that classification, segmentation, and detection are special cases of MTL that differ only in their dimensional configuration within a formally defined task space, and it proves that this task space is strictly larger than what matrix-based formulations can natively express.

Abstract

This paper introduces Multidimensional Task Learning (MTL), a unified mathematical framework based on Generalized Einstein MLPs (GE-MLPs) that operate directly on tensors via the Einstein product. We argue that current computer vision task formulations are inherently constrained by matrix-based thinking: standard architectures rely on matrix-valued weights and vector-valued biases, requiring structural flattening that restricts the space of naturally expressible tasks. GE-MLPs lift this constraint by operating with tensor-valued parameters, enabling explicit control over which dimensions are preserved or contracted without information loss. Through rigorous mathematical derivations, we demonstrate that classification, segmentation, and detection are special cases of MTL, differing only in their dimensional configuration within a formally defined task space. We further prove that this task space is strictly larger than what matrix-based formulations can natively express, enabling principled task configurations such as spatiotemporal or cross-modal predictions that require destructive flattening under conventional approaches. This work provides a mathematical foundation for understanding, comparing, and designing computer vision tasks through the lens of tensor algebra.
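
To make the idea of "preserved" versus "contracted" dimensions concrete, the following is a minimal NumPy sketch of the Einstein product, which contracts the last N modes of one tensor with the first N modes of another. The helper name `einstein_product`, the tensor shapes, and the channel-only contraction are illustrative assumptions, not the paper's GE-MLP definition, which is not reproduced in this summary.

```python
import numpy as np

def einstein_product(a, b, n_contract):
    """Einstein product a *_N b (hypothetical helper for illustration).

    Contracts the last n_contract modes of `a` with the first
    n_contract modes of `b`; all other modes are preserved.
    """
    if a.shape[a.ndim - n_contract:] != b.shape[:n_contract]:
        raise ValueError("contracted modes must have matching sizes")
    return np.tensordot(a, b, axes=n_contract)

rng = np.random.default_rng(0)

# Assumed layout: (batch, height, width, channels).
x = rng.standard_normal((4, 8, 8, 3))

# Tensor-valued weight that contracts only the channel mode,
# so batch, height, and width are preserved in the output.
w = rng.standard_normal((3, 5))
y = einstein_product(x, w, 1)   # shape (4, 8, 8, 5)

# A matrix-based formulation computes the same numbers, but only after
# flattening the preserved modes into one axis and restoring them afterwards.
y_flat = (x.reshape(-1, 3) @ w).reshape(4, 8, 8, 5)
```

In this sketch the choice of `n_contract` and the weight shape together decide which modes survive, which is one way to read the abstract's "explicit control over which dimensions are preserved or contracted".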

Paper Structure

This paper contains 13 sections, 5 theorems, 16 equations, 1 table.

Key Result

Theorem 3.1

Traditional image classification with $C$ classes is recovered by MTL under a configuration in which $K_1 = C$, $J_1 = B$ (batch only), and the loss $\mathcal{L}_{\text{CE}}$ is categorical cross-entropy.
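
One way to picture this configuration is a single Einstein-product layer that contracts every non-batch mode of the input and emits $C$ logits per sample. The sketch below is an illustration under assumptions: the input layout (batch, height, width, channels), the single linear layer, and all variable names are hypothetical, and the paper's full task tuple is not reproduced in this summary.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean categorical cross-entropy over the batch (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels].mean()

# Assumed sizes: B is the only preserved mode (J_1 = B),
# and the output mode has size C (K_1 = C).
B, HEIGHT, WIDTH, CHANNELS, C = 4, 8, 8, 3, 10
rng = np.random.default_rng(0)

images = rng.standard_normal((B, HEIGHT, WIDTH, CHANNELS))
weights = 0.01 * rng.standard_normal((HEIGHT, WIDTH, CHANNELS, C))
bias = np.zeros(C)
labels = rng.integers(0, C, size=B)

# Contract height, width, and channels; keep the batch mode.
logits = np.tensordot(images, weights, axes=3) + bias   # shape (B, C)
loss = softmax_cross_entropy(logits, labels)

# The conventional flatten-then-matmul classifier gives the same logits.
flat_logits = images.reshape(B, -1) @ weights.reshape(-1, C) + bias
```

The agreement between `logits` and `flat_logits` shows the sense in which ordinary classification appears as one configuration: all spatial and channel modes are contracted, so nothing is lost by flattening in this particular case.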

Theorems & Definitions (13)

  • Definition 2.1: Task tuple configuration
  • Definition 2.2: Multidimensional Task
  • Definition 2.3: Structure Preservation Index
  • Theorem 3.1: Classification as Special Case
  • proof
  • Theorem 3.2: Dense Classification Extension
  • proof
  • Theorem 3.3: Segmentation Recovery
  • proof
  • Theorem 3.4: Detection Recovery
  • ...and 3 more