Improving Classification of Occluded Objects through Scene Context

Courtney M. King, Daniel D. Leeds, Damian Lyons, George Kalaitzis

TL;DR

Occlusions challenge robust object recognition; the authors propose two scene-context fusion strategies, MNF and SCU, to integrate scene information into RPN-DCNN pipelines. MNF selects a scene-specific detector per image while SCU adjusts predictions post hoc using object-scene co-occurrence priors, with scene labels predicted by a CNN. Experiments on Occluded Groceries and TACO show improvements in average recall and precision, and training on a mix of occluded and unoccluded data yields the strongest gains in many settings. The work offers interpretable, adaptable methods to enhance occlusion robustness in real-world scenarios.

Abstract

The presence of occlusions has provided substantial challenges to typically powerful object recognition algorithms. Additional sources of information can be extremely valuable to reduce errors caused by occlusions. Scene context is known to aid in object recognition in biological vision. In this work, we attempt to add robustness to existing Region Proposal Network-Deep Convolutional Neural Network (RPN-DCNN) object detection networks through two distinct scene-based information fusion techniques. We present one algorithm under each methodology: the first operates prior to prediction, selecting a custom object network to use based on the identified background scene, and the second operates after detection, fusing scene knowledge into initial object scores output by the RPN. We demonstrate our algorithms on challenging datasets featuring partial occlusions, where they show overall improvement in both recall and precision against baseline methods. In addition, our experiments contrast multiple training methodologies for occlusion handling, finding that training on a combination of both occluded and unoccluded images demonstrates an improvement over the others. Our method is interpretable and can easily be adapted to other datasets, offering many future directions for research and practical applications.
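The first strategy in the abstract (choosing an object network before prediction, based on the identified scene) can be illustrated with a minimal dispatch sketch. This is not the authors' implementation: the names `mnf_detect`, `toy_scene_classifier`, `kitchen_detector`, and `generic_detector` are hypothetical, the detectors are toy stand-ins rather than RPN-DCNN models, and the fallback for an unrecognized scene is an assumption made here for completeness.

```python
from typing import Callable, Dict, List, Tuple

Detection = Tuple[str, float]                 # (object label, score)
Detector = Callable[[str], List[Detection]]   # image -> detections


def mnf_detect(
    image: str,
    scene_classifier: Callable[[str], str],
    scene_detectors: Dict[str, Detector],
    fallback: Detector,
) -> Tuple[str, List[Detection]]:
    """Pick a scene-specific detector from the predicted scene label, then run it."""
    scene = scene_classifier(image)
    # Assumption: scenes without a dedicated detector use a generic one.
    detector = scene_detectors.get(scene, fallback)
    return scene, detector(image)


# Toy stand-ins (hypothetical); real models would take pixel data, not a file name.
def toy_scene_classifier(image: str) -> str:
    return "kitchen" if "kitchen" in image else "unknown"


def kitchen_detector(image: str) -> List[Detection]:
    return [("orange_juice", 0.9)]


def generic_detector(image: str) -> List[Detection]:
    return [("bottle", 0.6)]


DETECTORS: Dict[str, Detector] = {"kitchen": kitchen_detector}

if __name__ == "__main__":
    print(mnf_detect("kitchen_01.jpg", toy_scene_classifier, DETECTORS, generic_detector))
    print(mnf_detect("street_07.jpg", toy_scene_classifier, DETECTORS, generic_detector))
```

Keeping the scene classifier and the per-scene detectors as separate callables mirrors the "select, then detect" order described above and makes it easy to swap in a different scene labeler.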

Paper Structure

This paper contains 17 sections, 5 equations, 13 figures, 8 tables, 2 algorithms.

Figures (13)

  • Figure 1: Misclassified objects in Occluded Groceries dataset that can be resolved using scene context.
  • Figure 2: A subset of the Orange Juice items in the occluded groceries dataset shown in various background scenes.
  • Figure 3: Our Multi-Network Fusion (MNF) Algorithm trains scene-centric object detectors, then chooses the network for each image based on its predicted scene label.
  • Figure 4: Our Scene-Context Update (SCU) Algorithm updates initial object detections with scene context by incorporating a scene labeler and the dataset statistics.
  • Figure 5: Confusion matrices produced from models trained on the Occluded Groceries dataset, corresponding to Table \ref{table:occluded_weights}.
  • ...and 8 more figures
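
The post-detection idea summarized for Figure 4 (updating initial object scores with a scene label and dataset statistics) can be sketched as a simple reweighting. The exact update rule is not given in this summary, so the sketch below makes an assumption: it multiplies each object score by a smoothed co-occurrence prior P(object | scene) and renormalizes. The function names `cooccurrence_priors` and `scu_update`, the smoothing constant, and the toy labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def cooccurrence_priors(
    pairs: Iterable[Tuple[str, str]],
    objects: List[str],
    scenes: List[str],
    smoothing: float = 1.0,
) -> Dict[str, Dict[str, float]]:
    """Estimate P(object | scene) from (object, scene) training pairs.

    Additive smoothing (an assumption) keeps unseen pairs from getting zero mass.
    """
    counts = {s: {o: smoothing for o in objects} for s in scenes}
    for obj, scene in pairs:
        counts[scene][obj] += 1.0
    priors = {}
    for s in scenes:
        total = sum(counts[s].values())
        priors[s] = {o: counts[s][o] / total for o in objects}
    return priors


def scu_update(
    scores: Dict[str, float],
    scene: str,
    priors: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Reweight initial object scores by the scene prior and renormalize."""
    weighted = {o: s * priors[scene][o] for o, s in scores.items()}
    z = sum(weighted.values())
    if z == 0.0:
        return dict(scores)  # nothing to rescale; keep the initial scores
    return {o: w / z for o, w in weighted.items()}


if __name__ == "__main__":
    training_pairs = (
        [("milk", "kitchen")] * 3
        + [("detergent", "laundry")] * 3
        + [("detergent", "kitchen")]
    )
    priors = cooccurrence_priors(
        training_pairs, ["milk", "detergent"], ["kitchen", "laundry"]
    )
    # An occluded carton: the initial detector slightly prefers "detergent".
    initial = {"milk": 0.45, "detergent": 0.55}
    print(scu_update(initial, "kitchen", priors))
    print(scu_update(initial, "laundry", priors))
```

In this toy run the kitchen prior shifts the top label from "detergent" to "milk", while the laundry prior reinforces "detergent", which is the kind of correction the Figure 1 caption describes for misclassified occluded objects.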