Relation Networks for Object Detection
Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
TL;DR
This work introduces an object relation module that jointly reasons over a set of object proposals by combining appearance similarities with a translation-invariant geometric weight to model inter-object relations. Integrated into region-based detectors, it enhances instance recognition and replaces heuristic NMS with a learnable duplicate removal network, enabling end-to-end training. Extensive ablations on COCO demonstrate consistent gains across backbones and detectors, with the geometric weight and multiple relation components providing notable improvements. The approach offers a lightweight, plug-in building block that advances end-to-end object detection by exploiting object–object relations without requiring extra supervision.
Abstract
Although it has long been believed that modeling relations between objects would help object recognition, there has been no evidence that the idea works in the deep learning era. All state-of-the-art object detection systems still recognize object instances individually, without exploiting their relations during learning. This work proposes an object relation module. It processes a set of objects simultaneously through interaction between their appearance features and geometry, thus allowing their relations to be modeled. It is lightweight and in-place. It requires no additional supervision and is easy to embed in existing networks. It is shown to be effective at improving the object recognition and duplicate removal steps of the modern object detection pipeline. This verifies the efficacy of modeling object relations in CNN-based detection and gives rise to the first fully end-to-end object detector.
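
To make the idea of a relation module concrete, here is a minimal NumPy sketch of how appearance similarity and a geometric weight could be combined over a set of proposals. It is an illustration under stated assumptions, not the authors' implementation: the function names (box_geometry, relation_module), the projection matrices W_q, W_k, W_v, the vector w_g, the (cx, cy, w, h) box format, and the use of a single relation head are all hypothetical, and the geometry features are fed to a linear map directly rather than through any learned or sinusoidal embedding.

```python
import numpy as np

def box_geometry(boxes, eps=1e-3):
    """Pairwise geometry features from (N, 4) boxes given as (cx, cy, w, h).

    Only center differences and size ratios are used, so the result does not
    change when every box is shifted by the same offset. Returns (N, N, 4).
    """
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    dx = np.log(np.maximum(np.abs(cx[:, None] - cx[None, :]) / w[:, None], eps))
    dy = np.log(np.maximum(np.abs(cy[:, None] - cy[None, :]) / h[:, None], eps))
    dw = np.log(w[None, :] / w[:, None])
    dh = np.log(h[None, :] / h[:, None])
    return np.stack([dx, dy, dw, dh], axis=-1)

def relation_module(feats, boxes, W_q, W_k, W_v, w_g):
    """Single-head relation sketch (assumed form, see lead-in).

    feats: (N, d) appearance features; boxes: (N, 4).
    Returns (N, d): input features plus an aggregated relation feature,
    so the output has the same shape as the input ("in-place").
    """
    q = feats @ W_q                                   # (N, dk) queries
    k = feats @ W_k                                   # (N, dk) keys
    v = feats @ W_v                                   # (N, d) values
    appearance = q @ k.T / np.sqrt(q.shape[1])        # (N, N) similarity scores
    geometry = np.maximum(box_geometry(boxes) @ w_g, 0.0)  # (N, N), ReLU-gated

    # Softmax over appearance scores, rescaled by the geometric weight.
    appearance = appearance - appearance.max(axis=1, keepdims=True)
    weights = geometry * np.exp(appearance)
    weights = weights / (weights.sum(axis=1, keepdims=True) + 1e-12)
    return feats + weights @ v

# Usage with random, untrained parameters (shapes only; values are arbitrary).
rng = np.random.default_rng(0)
N, d, dk = 5, 8, 4
feats = rng.normal(size=(N, d))
boxes = np.column_stack([
    rng.uniform(0, 100, N), rng.uniform(0, 100, N),   # centers
    rng.uniform(5, 50, N), rng.uniform(5, 50, N),     # widths, heights
])
W_q, W_k = rng.normal(size=(d, dk)), rng.normal(size=(d, dk))
W_v, w_g = rng.normal(size=(d, d)), rng.normal(size=4)
out = relation_module(feats, boxes, W_q, W_k, W_v, w_g)
```

Because the output keeps the input shape, a block like this can be dropped between existing layers of a detector head without changing the surrounding architecture, which is one reading of the "in-place" property described above.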
