Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery

Isaac Xu, Benjamin Misiuk, Scott C. Lowe, Martin Gillis, Craig J. Brown, Thomas Trappenberg

TL;DR

The paper tackles hierarchical multi-label (HML) classification of benthic habitat imagery when annotations are incomplete, as is common in real-world data. It proposes a two-stage approach: self-supervised learning (SSL) pretraining on a large in-domain corpus of benthic images (BenthicNet), followed by evaluation with hierarchical heads whose outputs are constrained to respect the label hierarchy via C-HMCNN. Key contributions are a method for training HML models with incomplete annotations and a benchmark comparing in-domain SSL pretraining against ImageNet pretraining, in which the in-domain encoder particularly benefits small regional datasets. The findings show that in-domain SSL pretraining lets the model classify deeper into the annotation hierarchy and improves performance under missing information, with practical implications for automated underwater image annotation and for other domains with hierarchical labels.

Abstract

In this work, we apply state-of-the-art self-supervised learning techniques to a large dataset of seafloor imagery, BenthicNet, and study their performance on a complex hierarchical multi-label (HML) classification downstream task. In particular, we demonstrate the capacity to conduct HML training in scenarios with multiple levels of missing annotation information, an important scenario for handling heterogeneous real-world data collected by multiple research groups with differing data collection protocols. We find that, when using smaller one-hot image label datasets typical of local- or regional-scale benthic science projects, models pre-trained with self-supervision on a larger collection of in-domain benthic data outperform models pre-trained on ImageNet. In the HML setting, we find that the model can attain deeper and more precise classifications if it is pre-trained with self-supervision on in-domain data. We hope this work can establish a benchmark for future models in automated underwater image annotation and can guide work in other domains with hierarchical annotations of mixed resolution.

Paper Structure (21 sections, 3 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of CATAMI categories involved in hierarchical learning. A pre-trained vision encoder is used to extract features from benthic imagery, which are passed to separate heads (each itself hierarchical or multi-label). The output from each head corresponds to one of the annotation categories.
  • Figure 2: CATAMI substrate label hierarchy. The first path (red squares) reaches the full depth of its branch, which extends to depth two. The second path (blue circles) reaches depth three but does not reach the leaf nodes. A potential multi-label annotation is the union of these two paths, although each path on its own is also a valid annotation in our scheme. The yellow triangles mark the masked nodes that are excluded from the loss because the blue-circle path lacks precision. Nodes with a green dashed outline are in the CATAMI-extended scheme but not the original CATAMI scheme; a red outline denotes a node that is in the CATAMI scheme but excluded from our study due to a lack of training annotations.
  • Figure 3: Demonstration of C-HMCNN's hierarchically constrained model output [coherent_hierarchical_multi-label]. The toy hierarchy has a root node, Animal, with two child nodes: Dog and Cat. The green dotted path represents the label Animal > Dog. This label is also shown in its raw form (indicating which bits are to be switched on) and as a bit-string, where the $n$-th bit represents the $n$-th node. The model output (blue) is the prediction of a trained model when given the image sample corresponding to the green annotation. In this example, the output violates the hierarchy, since the Dog logit is greater than the Animal logit. This output is then expanded and filtered through the descendant (adjacency) matrix that encodes the hierarchy, via the Hadamard product. Finally, the maximum along each row of the filtered output is taken to obtain a constrained model output that respects the hierarchy.
  • Figure 4: Demonstration of masked loss calculation. The example depicts how the loss is calculated for two heads on a batch of three samples. In this batch, the lower category is missing annotations entirely, whereas in the top category only the last sample is missing an annotation. In addition, the top-head annotations of the first two samples lack precision. Consequently, the loss for the batch averages only over the contributing (green) annotated bits. The same principle is then extended to the head level.
  • Figure 5: Model performance per substrate node. Below each node are the average $F_1$ scores in the single-node case, comparing a supervised ImageNet pre-trained encoder with a BT encoder pre-trained for 400 epochs on BenthicNet (BT-400ep). The third score, BT-400ep (Max), is the maximum score BT-400ep achieved across three trials. The light grey percentage in the top-right corner shows the proportion of all hierarchical substrate test samples that contain a positive instance of the node. Finally, while the training set contains samples of Anthropogenic > Tile, the test set does not. Neither the training nor the test set contains barnacle plate samples.
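The max-constraint described for Figure 3 can be sketched in a few lines of pure Python. This is a minimal illustration, not the C-HMCNN authors' code: the helper names `descendant_matrix` and `constrain` and the example logits are hypothetical. The sketch applies the constraint to sigmoid probabilities rather than raw logits, which is an assumption: the zeros introduced by the Hadamard product are only neutral for the row-wise maximum when every value is non-negative, and the sigmoid preserves the ordering of the logits.

```python
import math
from typing import Dict, List, Optional, Sequence


def descendant_matrix(nodes: Sequence[str], parent: Dict[str, str]) -> List[List[int]]:
    """M[i][j] = 1 if node j is node i itself or one of node i's descendants."""
    index = {name: k for k, name in enumerate(nodes)}
    m = [[0] * len(nodes) for _ in nodes]
    for j, name in enumerate(nodes):
        # Walk upward from node j, marking column j in its own row and every ancestor's row.
        current: Optional[str] = name
        while current is not None:
            m[index[current]][j] = 1
            current = parent.get(current)  # None once we pass the root
    return m


def constrain(probs: Sequence[float], m: List[List[int]]) -> List[float]:
    """Hadamard product with each row of M, then the maximum along that row.

    Assumes probs are sigmoid outputs in [0, 1], not raw logits.
    """
    return [max(m_ij * p_j for m_ij, p_j in zip(row, probs)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Toy hierarchy from the figure: Animal is the root; Dog and Cat are its children.
nodes = ["Animal", "Dog", "Cat"]
parent = {"Dog": "Animal", "Cat": "Animal"}
M = descendant_matrix(nodes, parent)  # [[1, 1, 1], [0, 1, 0], [0, 0, 1]]

logits = [0.4, 1.4, -2.0]  # hypothetical output: Dog logit > Animal logit
probs = [sigmoid(z) for z in logits]
constrained = constrain(probs, M)  # Animal is lifted to Dog's probability
```

Because each row of the matrix covers a node together with its whole subtree, a parent's constrained score can never fall below any descendant's, which is the coherence property the caption describes.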
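The masking shown in Figures 2 and 4 can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the captions do not confirm: a binary cross-entropy term per node bit; nodes below the last node of an imprecise path are unknown and masked, while every other node off the annotated paths is a negative; a node that any path marks positive stays unmasked; and the head-level step averages only over heads with at least one contributing bit. The function names `encode`, `head_loss`, and `batch_loss` and the Animal/Dog/Terrier/Poodle hierarchy are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def encode(
    nodes: Sequence[str],
    parent: Dict[str, str],
    paths: Optional[List[List[str]]],
) -> Tuple[List[int], List[int]]:
    """Return (target, mask) bits for one sample on one head.

    paths holds root-to-node label paths (their union is the multi-label
    annotation), or None when this category is unannotated for the sample.
    """
    n = len(nodes)
    if paths is None:  # category missing entirely: no bit contributes
        return [0] * n, [0] * n
    index = {name: k for k, name in enumerate(nodes)}
    children: Dict[str, List[str]] = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    target, mask = [0] * n, [1] * n
    for path in paths:
        for name in path:
            target[index[name]] = 1
        # Assumption: a path that stops above the leaves leaves its subtree unknown.
        stack = list(children.get(path[-1], []))
        while stack:
            node = stack.pop()
            mask[index[node]] = 0
            stack.extend(children.get(node, []))
    # Assumption: a node that some path marks positive is never masked.
    mask = [1 if t else m for t, m in zip(target, mask)]
    return target, mask


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def head_loss(
    probs: List[List[float]], targets: List[List[int]], masks: List[List[int]]
) -> Optional[float]:
    """Mean BCE over the unmasked bits of the whole batch for one head."""
    total, count = 0.0, 0
    for prob_row, target_row, mask_row in zip(probs, targets, masks):
        for p, y, m in zip(prob_row, target_row, mask_row):
            if m:
                total += bce(p, y)
                count += 1
    return total / count if count else None  # None: head has no labels in batch


def batch_loss(head_losses: List[Optional[float]]) -> Optional[float]:
    """Average only over heads that received at least one contributing bit."""
    valid = [loss for loss in head_losses if loss is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical hierarchy; batch of three samples and two heads, as in Figure 4.
nodes = ["Animal", "Dog", "Cat", "Terrier", "Poodle"]
parent = {"Dog": "Animal", "Cat": "Animal", "Terrier": "Dog", "Poodle": "Dog"}
top_annotations = [[["Animal", "Dog"]], [["Animal", "Dog"]], None]  # imprecise, imprecise, missing
lower_annotations = [None, None, None]  # lower category missing entirely
probs = [[0.9, 0.8, 0.1, 0.5, 0.5]] * 3

top = [encode(nodes, parent, a) for a in top_annotations]
lower = [encode(nodes, parent, a) for a in lower_annotations]
top_loss = head_loss(probs, [t for t, _ in top], [m for _, m in top])
lower_loss = head_loss(probs, [t for t, _ in lower], [m for _, m in lower])
loss = batch_loss([top_loss, lower_loss])
```

Returning `None` for a head with no contributing bits, instead of 0, keeps an entirely unannotated category from diluting the batch average.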