
MOD-CL: Multi-label Object Detection with Constrained Loss

Sota Moriyama, Koji Watanabe, Katsumi Inoue, Akihiro Takemura

TL;DR

MOD-CL addresses multi-label object detection by using constrained losses to push model outputs toward satisfying a given set of requirements. It builds $\mathrm{MOD_{YOLO}}$ on YOLOv8 and evaluates two tasks. Task 1 uses semi-supervised two-stage training with a Corrector and a Blender model to refine outputs under partial labeling. Task 2 applies a constrained loss to selected anchors and uses a MaxSAT solver to produce label sets that satisfy the requirements. Results show modest gains in mAP@0.5 for Task 1 and substantial gains in precision, recall, and F1 for Task 2 when constrained losses are used. The approach targets settings with partial labeling, or settings where downstream decision-making needs detections that comply with the requirements.

Abstract

We introduce MOD-CL, a multi-label object detection framework that uses a constrained loss during training to produce outputs that better satisfy the given requirements. We use $\mathrm{MOD_{YOLO}}$, a multi-label object detection model built upon the recently published state-of-the-art object detection model YOLOv8. In Task 1, we introduce the Corrector Model and the Blender Model, two new models applied after the object detection process, aiming to generate a more constrained output. For Task 2, constrained losses are incorporated into the $\mathrm{MOD_{YOLO}}$ architecture using the Product T-Norm. The results show that these additions are instrumental in improving the scores for both Task 1 and Task 2.
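
To make the idea of a Product T-Norm constrained loss more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that requirements are given as clauses (disjunctions of possibly negated labels), that a clause's degree of satisfaction under the product t-norm is one minus the product of the falsity of its literals, and that the loss is the mean violation over clauses. The label names and the helper names `clause_satisfaction` and `constraint_loss` are hypothetical.

```python
from typing import Dict, List, Tuple

# A literal is (label_name, is_positive); a clause is a disjunction of literals.
# This encoding of requirements is an assumption made for illustration.
Literal = Tuple[str, bool]
Clause = List[Literal]


def clause_satisfaction(probs: Dict[str, float], clause: Clause) -> float:
    """Soft truth value of a disjunction under the product t-norm.

    Conjunction is a product and negation is 1 - p, so by De Morgan a
    disjunction is 1 - prod(1 - truth(literal)).
    """
    unsatisfied = 1.0
    for label, is_positive in clause:
        p = probs[label]
        truth = p if is_positive else 1.0 - p
        unsatisfied *= 1.0 - truth
    return 1.0 - unsatisfied


def constraint_loss(probs: Dict[str, float], clauses: List[Clause]) -> float:
    """Mean violation (1 - satisfaction) over all clauses; 0.0 if there are none."""
    if not clauses:
        return 0.0
    violations = [1.0 - clause_satisfaction(probs, c) for c in clauses]
    return sum(violations) / len(violations)


if __name__ == "__main__":
    # Hypothetical requirement: a box is not both "pedestrian" and "traffic_light",
    # written as the clause (not pedestrian) OR (not traffic_light).
    predicted = {"pedestrian": 0.9, "traffic_light": 0.8}
    requirement = [[("pedestrian", False), ("traffic_light", False)]]
    print(f"{constraint_loss(predicted, requirement):.2f}")
```

In this sketch the loss is high when the model is confident in two labels that the requirement forbids together, and it falls to zero as either probability approaches zero. In a training setup such a term would typically be added to the detection loss with a weighting factor; that weighting is likewise an assumption here.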

Paper Structure

This paper contains 9 sections, 2 equations, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: $\mathrm{MOD_{YOLO}}$ with Corrector-Blender Model. Green blocks represent the parts being trained.