Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships

Rangel Daroya, Aaron Sun, Subhransu Maji

TL;DR

Task2Box introduces axis-aligned box embeddings to model asymmetric relationships between tasks and datasets, addressing limitations of symmetric Euclidean representations. By learning a mapping from base task representations (e.g., CLIP, Task2Vec, or attribute vectors) to low-dimensional boxes, the method uses volumetric overlaps to encode containment and transfer affinities. Across iNaturalist+CUB, ImageNet, and Taskonomy, Task2Box outperforms baselines and generalizes to novel tasks, while providing interpretable visualizations of task spaces and dataset relationships. The approach supports dataset discovery and transfer planning, with potential extensions to additional modalities and richer task descriptors via datasheets and model cards.

Abstract

Modeling and visualizing relationships between tasks or datasets is an important step towards solving various meta-tasks such as dataset discovery, multi-tasking, and transfer learning. However, many relationships, such as containment and transferability, are naturally asymmetric, and current approaches for representation and visualization (e.g., t-SNE) do not readily support this. We propose Task2Box, an approach to represent tasks using box embeddings -- axis-aligned hyperrectangles in low-dimensional spaces -- that can capture asymmetric relationships between them through volumetric overlaps. We show that Task2Box accurately predicts unseen hierarchical relationships between nodes in ImageNet and iNaturalist datasets, as well as transferability between tasks in the Taskonomy benchmark. We also show that box embeddings estimated from task representations (e.g., CLIP, Task2Vec, or attribute-based) can be used to predict relationships between unseen tasks more accurately than both classifiers trained on the same representations and handcrafted asymmetric distances (e.g., KL divergence). This suggests that low-dimensional box embeddings can effectively capture these task relationships and have the added advantage of being interpretable. We use the approach to visualize relationships among publicly available image classification datasets on Hugging Face, a popular dataset hosting platform.
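To make the idea of asymmetric volumetric overlap concrete, here is a minimal Python sketch. It assumes the score between two boxes is the intersection volume divided by the volume of the first box; this normalization, the function names (`box_volume`, `overlap_fraction`), and the coordinates are illustrative assumptions, not the paper's implementation.

```python
from math import prod


def box_volume(lower, upper):
    """Volume of an axis-aligned box; 0.0 if any side is empty."""
    return prod(max(0.0, u - l) for l, u in zip(lower, upper))


def overlap_fraction(a, b):
    """Fraction of box a's volume that lies inside box b.

    Hypothetical score: vol(a ∩ b) / vol(a). Swapping the arguments
    changes the denominator, which is what makes it asymmetric.
    """
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    # The intersection of two axis-aligned boxes is again a box.
    inter_lo = [max(x, y) for x, y in zip(a_lo, b_lo)]
    inter_hi = [min(x, y) for x, y in zip(a_hi, b_hi)]
    vol_a = box_volume(a_lo, a_hi)
    if vol_a == 0.0:
        return 0.0
    return box_volume(inter_lo, inter_hi) / vol_a


# Made-up 2D boxes given as (lower corner, upper corner).
large = ([0.0, 0.0], [10.0, 10.0])  # area 100
small = ([2.0, 2.0], [7.0, 4.0])    # area 10, entirely inside `large`

print(overlap_fraction(small, large))  # 1.0
print(overlap_fraction(large, small))  # 0.1
```

A symmetric distance would assign the same value to both directions; here the contained box scores 1 against its container, while the container scores only the fraction of its volume that the smaller box covers.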

Paper Structure (27 sections, 13 equations, 11 figures, 7 tables)


Figures (11)

  • Figure 1: Box Embeddings of 150 Datasets of iNaturalist + CUB and Corresponding Learned Hierarchy for Class Arachnida. Each taxonomic category is treated as a separate dataset for which Task2Box embeddings are learned. (1) Shows the learned box embeddings where datasets from the same group (taxonomic class) have the same color. Datasets naturally cluster into their ground truth groups. (2) Shows the hierarchy learned through Task2Box for a specific class. The hierarchy matches the ground truth relationships based on biological classification. Orders that belong to class Arachnida are learned as boxes contained by the larger box for Arachnida; families under each of the orders are learned as smaller boxes contained by the corresponding orders they belong to.
  • Figure 2: Task2Box Embeddings in 2D for Mammalia, Canidae, and Amphibia Datasets from iNaturalist. Each embedding represents the coordinates of the lower left and upper right corners of each box/rectangle. Since Canidae ($z_1$) is a proper subset of Mammalia ($z_2$), $d_{box}(z_1,z_2)=1$ and $d_{box}(z_2,z_1)=0.1$.
  • Figure 3: Visualization of Instrument-related Datasets in ImageNet. Datasets that belong to the same superset are shaded in the same color. Task2Box learns the hierarchy of various groups, and clusters similar datasets closer.
  • Figure 4: Visualization of Tasks in Taskonomy showing Source Tasks that Transfer Well to Target Tasks (shaded). (a) Jigsaw and Triplet Fixated Camera Pose estimation are source tasks that transfer well to Depth estimation. (b) and (c) show different source tasks (larger boxes) that transfer well to the shaded boxes of Denoising Autoencoder and Surface Normal, respectively.
  • Figure 5: Visualizing Image Classification Datasets on Hugging Face. The sample data points annotated on the highlighted datasets show that common tasks overlap with each other (e.g., sentiment classification and document classification datasets). Although labels could slightly differ between datasets, Task2Box can infer the level of similarity and represent it as the amount of overlap. The embedding size (box area) also reflects the amount of available data.
  • ...and 6 more figures
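
The containment hierarchy described for Figure 1 can be read off a set of boxes with a simple rule: the parent of a box is the smallest box that fully contains it. The Python sketch below illustrates that reading only; it is not the paper's procedure. The helper names (`contains`, `infer_parents`), the box coordinates, and the placeholder labels `order_A`, `order_B`, `family_A1`, and `family_B1` are all hypothetical.

```python
from math import prod


def volume(box):
    """Volume of an axis-aligned box given as (lower corner, upper corner)."""
    lower, upper = box
    return prod(max(0.0, u - l) for l, u in zip(lower, upper))


def contains(outer, inner, tol=1e-9):
    """True if `inner` lies inside `outer` along every axis (within `tol`)."""
    (o_lo, o_hi), (i_lo, i_hi) = outer, inner
    return all(
        ol <= il + tol and ih <= oh + tol
        for ol, oh, il, ih in zip(o_lo, o_hi, i_lo, i_hi)
    )


def infer_parents(boxes):
    """Map each box name to the smallest strictly larger box containing it."""
    parents = {}
    for name, box in boxes.items():
        candidates = [
            other
            for other, other_box in boxes.items()
            if other != name
            and contains(other_box, box)
            and volume(other_box) > volume(box)
        ]
        # The tightest enclosing box is taken as the parent; None marks a root.
        parents[name] = min(
            candidates, key=lambda n: volume(boxes[n]), default=None
        )
    return parents


# Made-up 2D boxes; only "Arachnida" is a label taken from the figure caption.
boxes = {
    "Arachnida": ([0.0, 0.0], [10.0, 10.0]),
    "order_A": ([1.0, 1.0], [5.0, 5.0]),
    "order_B": ([6.0, 1.0], [9.0, 5.0]),
    "family_A1": ([2.0, 2.0], [3.0, 3.0]),
    "family_B1": ([7.0, 2.0], [8.0, 4.0]),
}

for child, parent in infer_parents(boxes).items():
    print(child, "->", parent)
```

Learned boxes rarely nest exactly, so a practical variant would likely replace the strict `contains` test with a threshold on an overlap score.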