Point Linking Network for Object Detection

Xinggang Wang, Kaibing Chen, Zilong Huang, Cong Yao, Wenyu Liu

TL;DR

Point Linking Network (PLN) reframes object detection as a point-pair reasoning problem, predicting center and corner points plus their linkage across a grid and fusing multiple branch predictions to form bounding boxes. It integrates point existence, class, location, and link information into a single end-to-end training objective, leveraging an Inception-v2 backbone with a fully convolutional design. Across VOC 2007/2012 and COCO, PLN achieves state-of-the-art results among single-model, single-scale detectors, and exhibits strong occlusion robustness due to multi-branch fusion and point-based representations. The method’s flexibility in bounding-box representation and the empirical gains from ablations underscore its practical potential, with further room for speed optimizations and broader domain transfer.

Abstract

Object detection is a core problem in computer vision. With the development of deep ConvNets, the performance of object detectors has improved dramatically. Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO and SSD, mainly focus on regressing the coordinates of the bounding box. Unlike these methods, which consider the bounding box as a whole, we propose a novel object bounding-box representation using points and links, implemented with deep ConvNets and termed Point Linking Network (PLN). Specifically, we regress the corner/center points of the bounding box and their links using a fully convolutional network; then we map the corner points and their links back to multiple bounding boxes; finally, an object detection result is obtained by fusing the multiple bounding boxes. PLN is naturally robust to object occlusion and flexible to variation in object scale and aspect ratio. In the experiments, PLN with the Inception-v2 model achieves state-of-the-art single-model and single-scale results on the PASCAL VOC 2007, PASCAL VOC 2012 and COCO detection benchmarks without bells and whistles. The source code will be released.
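The step of mapping a linked center-corner pair back to a bounding box can be illustrated with a minimal sketch. It assumes an axis-aligned box that is symmetric about its center, so a single corner fixes the half-width and half-height. The function name `box_from_center_corner` and the `(x_min, y_min, x_max, y_max)` output convention are hypothetical choices made for illustration; they are not taken from the paper or its code.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), an assumed convention


def box_from_center_corner(center: Point, corner: Point) -> Box:
    """Recover an axis-aligned box from its center and any one of its corners.

    Illustrative helper only: the box is assumed symmetric about the center,
    so the corner's offset from the center gives half the width and height.
    """
    cx, cy = center
    px, py = corner
    half_w = abs(px - cx)
    half_h = abs(py - cy)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Any of the four corners paired with the same center yields the same box.
center = (50.0, 40.0)
corners = [(20.0, 10.0), (80.0, 10.0), (20.0, 70.0), (80.0, 70.0)]
boxes = [box_from_center_corner(center, c) for c in corners]
```

Under this assumption each of the four center-corner pairs is an independent estimate of the same box, which is why a partially occluded object can still be recovered from whichever corners remain visible.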

Paper Structure

This paper contains 17 sections, 5 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Illustration of our object detection idea using point linking. Our algorithm detects objects by predicting center-corner point pairs, such as $OC_1$, $OC_2$, $OC_3$, and $OC_4$. The positions of the points are predicted relative to the grid cells in the image. Once we have a pair of center and corner points, the bounding box of the object is easily obtained.
  • Figure 2: The network architecture of PLN. The detection network is based on Inception-v2. We use Inception-v2 and some additional convolutional layers to regress the parameters of the points, then parse the parameters to obtain the bounding box and category label of each object. Finally, we combine the boxes of the four branches (left-top, right-top, left-bottom and right-bottom) and apply NMS to obtain the final object detection result.
  • Figure 3: Visualization of the detection results on PASCAL VOC 2007. Ground-truth boxes are drawn in blue, true positives in yellow, and false positives in red. Zoom in to find the "false" false positives.
  • Figure 4: Error analysis of PLN, Faster R-CNN (Ref:FasterRCNN-Ren2015) and YOLO (Ref:YOLO-Redmon2016).
  • Figure 5: Comparison of the detection results using different corner points paired with the center point. From left to right, the columns show the detection results using the left-top corner, the right-top corner, the left-bottom corner and the right-bottom corner, followed by the results of fusing the four corners.
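The fusion step described in the Figure 2 caption, combining the boxes of the four branches and applying NMS, can be sketched as follows. This is a plain greedy NMS under stated assumptions rather than the authors' implementation: the IoU threshold of 0.5, the class-agnostic suppression, and the names `iou` and `fuse_branches` are all illustrative choices not given in the source.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_branches(branches: List[List[Detection]],
                  iou_threshold: float = 0.5) -> List[Detection]:
    """Pool detections from all branches, then apply greedy NMS.

    The threshold value and class-agnostic behavior are assumptions.
    """
    candidates = [det for branch in branches for det in branch]
    candidates.sort(key=lambda det: det[1], reverse=True)
    kept: List[Detection] = []
    for box, score in candidates:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical outputs of the four branches for an image with two objects.
left_top = [((10.0, 10.0, 50.0, 50.0), 0.9)]
right_top = [((12.0, 10.0, 50.0, 52.0), 0.8)]
left_bottom = [((10.0, 11.0, 49.0, 50.0), 0.7), ((100.0, 100.0, 140.0, 140.0), 0.6)]
right_bottom: List[Detection] = []  # e.g. this corner is occluded for both objects

fused = fuse_branches([left_top, right_top, left_bottom, right_bottom])
```

In this toy input the three near-duplicate boxes for the first object collapse to the highest-scoring one, while the second object survives even though only one branch detected it.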