Towards Commonsense Knowledge based Fuzzy Systems for Supporting Size-Related Fine-Grained Object Detection

Pu Zhang, Tianhua Chen, Bin Liu

TL;DR

This work tackles size-related fine-grained object detection by augmenting a lightweight coarse-grained detector with a commonsense knowledge inference module (CKIM). It introduces two CKIM variants, crisp-rule and fuzzy-rule, both grounded in two size-related knowledge rules, operationalized through BoxS and DtoC features, and learnable from minimal data. Empirical results on CLEVR-derived datasets show that CKIM-enhanced detectors achieve higher mAP@0.5 with a smaller model size and lower latency, with the fuzzy CKIM offering advantages for multi-class labeling. The approach promises practical gains for edge-facing detection tasks where fine-grained annotations are scarce, and suggests avenues for knowledge acquisition via human expertise or LLMs.

Abstract

Deep learning (DL) has become the dominant approach to object detection. Accurate fine-grained detection typically requires a sufficiently large model and a vast amount of annotated data. In this paper, we propose a commonsense knowledge inference module (CKIM) that leverages commonsense knowledge to help a lightweight, deep-neural-network-based coarse-grained object detector achieve accurate fine-grained detection. Specifically, we focus on a scenario where a single image contains objects of similar categories but varying sizes, and we establish a size-related CKIM that maps the coarse-grained labels produced by the DL detector to size-related fine-grained labels. Because rule-based systems are a popular method of knowledge representation and reasoning, our experiments explore two types of rule-based CKIMs, implemented using crisp-rule and fuzzy-rule approaches, respectively. Experimental results demonstrate that, compared with baseline methods, our approach achieves accurate fine-grained detection with less annotated data and a smaller model. Our code is available at: https://github.com/ZJLAB-AMMI/CKIM.
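
To make the label-mapping idea concrete, here is a minimal sketch of how a crisp-rule and a fuzzy-rule module might turn a coarse label into a size-related fine-grained label. It is not the authors' implementation (see the linked repository for that). It assumes that BoxS denotes the bounding-box area as a fraction of the image area and that DtoC is available as a normalized distance proxy; the `Detection` class, the `size_threshold` function, and all numeric constants are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Detection:
    coarse_label: str  # label from the coarse-grained detector, e.g. "cube"
    box_s: float       # assumed BoxS: box area / image area, in [0, 1]
    d_to_c: float      # assumed DtoC proxy in [0, 1]; larger means farther away


def size_threshold(d_to_c: float, near: float = 0.08, far: float = 0.02) -> float:
    """Hypothetical BoxS threshold that shrinks linearly with distance,
    since a far object of a given physical size covers fewer pixels."""
    return near + (far - near) * d_to_c


def crisp_ckim(det: Detection) -> str:
    """Crisp rule: a hard cut on BoxS at the distance-dependent threshold."""
    size = "large" if det.box_s >= size_threshold(det.d_to_c) else "small"
    return f"{size} {det.coarse_label}"


def fuzzy_ckim(det: Detection, width: float = 0.01) -> Dict[str, float]:
    """Fuzzy rule: membership degrees from a linear ramp around the threshold."""
    t = size_threshold(det.d_to_c)
    large = min(1.0, max(0.0, (det.box_s - (t - width)) / (2 * width)))
    return {
        f"large {det.coarse_label}": large,
        f"small {det.coarse_label}": 1.0 - large,
    }
```

In this sketch the same box area of 0.05 is labelled "small cube" when the object is near the camera (`d_to_c=0.0`) and "large cube" when it is far (`d_to_c=1.0`). The fuzzy variant returns graded memberships instead of a single label, which is one way to represent borderline cases.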

Paper Structure (17 sections, 6 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Comparison of the working mechanisms between a typical fine-grained object detector (left) and our approach (right)
  • Figure 2: An example image that illustrates the connection between an object's distance to the camera (DtoC) and the distance from the center of the object to the bottom of the image (CtoB). As shown in the image, CtoB of object A, denoted by CtoB$_A$, is larger than CtoB$_B$. This information can be used to infer that DtoC of object A, denoted by DtoC$_A$, is larger than DtoC$_B$, which is clearly true as shown in the picture.
  • Figure 3: Example images in which the camera is located directly above or directly below the objects. Left: an image of flying birds; Middle: a remote sensing image; Right: a bowl in the cupboard and a cup on the counter.
  • Figure 4: Top: an example image in the CLEVR-96 dataset, where the object size attribute is specified as either 'large' or 'small'; Bottom: an example image in the CLEVR-144 dataset, where the object size attribute can be 'large', 'middle', or 'small'.
  • Figure 5: Training process of a pair of fine-grained and coarse-grained object detectors
  • ...and 1 more figure
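
The relation described in the Figure 2 caption (a larger CtoB suggests a larger DtoC) can be sketched in a few lines. This is an illustrative sketch only: it assumes pixel coordinates with the origin at the top-left corner and boxes given as `(x_min, y_min, x_max, y_max)`, and the function names are hypothetical. As the Figure 3 caption indicates, the heuristic is not expected to hold when the camera is directly above or below the objects.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def c_to_b(box: Box, image_height: float) -> float:
    """CtoB: distance from the box center to the bottom edge of the image.
    Assumes y grows downward from the top of the image."""
    _, y_min, _, y_max = box
    return image_height - (y_min + y_max) / 2.0


def rank_by_inferred_distance(boxes: Sequence[Box], image_height: float) -> List[int]:
    """Return box indices ordered from nearest to farthest inferred DtoC,
    using CtoB as the proxy (smaller CtoB is treated as closer to the camera)."""
    return sorted(range(len(boxes)), key=lambda i: c_to_b(boxes[i], image_height))
```

For example, in a 200-pixel-high image, a box centered at y = 40 has CtoB = 160 and a box centered at y = 150 has CtoB = 50, so the second box is ranked as closer to the camera.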