Segmenting objects with Bayesian fusion of active contour models and convnet priors

Przemyslaw Polewski, Jacquelyn Shelton, Wei Yao, Marco Heurich

TL;DR

The paper tackles fine-grained instance segmentation in remote sensing by combining a Bayesian maximum a posteriori (MAP) framework with multi-contour active contours guided by CNN-derived priors for appearance and localization. It extends traditional eigenshape priors to non-linear shape modeling via a GAN-inspired decoder (Deep Shape Models) and represents shape evolution in a compact, differentiable coefficient space shaped by kernel PCA. The proposed loose coupling allows drop-in CNN replacements and enables an efficient GPU implementation. On the task of delineating individual dead tree crowns, the method significantly improves contour fidelity over strong baselines such as Mask R-CNN and K-net. The approach advances precise boundary delineation in Natural Resource Monitoring (NRM) imagery, with practical implications for ecological monitoring and carbon/nutrient cycling analyses, and opens avenues for non-linear, task-specific shape modeling in instance segmentation.

Abstract

Instance segmentation is a core computer vision task with great practical significance. Recent advances, driven by large-scale benchmark datasets, have yielded good general-purpose Convolutional Neural Network (CNN)-based methods. Natural Resource Monitoring (NRM) utilizes remote sensing imagery with generally known scale and containing multiple overlapping instances of the same class, wherein the object contours are jagged and highly irregular. This is in stark contrast with the regular man-made objects found in classic benchmark datasets. We address this problem and propose a novel instance segmentation method geared towards NRM imagery. We formulate the problem as Bayesian maximum a posteriori inference which, in learning the individual object contours, incorporates shape, location, and position priors from state-of-the-art CNN architectures, driving a simultaneous level-set evolution of multiple object contours. We employ loose coupling between the CNNs that supply the priors and the active contour process, allowing a drop-in replacement of new network architectures. Moreover, we introduce a novel prior for contour shape, namely, a class of Deep Shape Models based on architectures from Generative Adversarial Networks (GANs). These Deep Shape Models are in essence a non-linear generalization of the classic Eigenshape formulation. In experiments, we tackle the challenging, real-world problem of segmenting individual dead tree crowns and delineating precise contours. We compare our method to two leading general-purpose instance segmentation methods - Mask R-CNN and K-net - on color infrared aerial imagery. Results show our approach to significantly outperform both methods in terms of reconstruction quality of tree crown contours. Furthermore, use of the GAN-based deep shape model prior yields significant improvement of all results over the vanilla Eigenshape prior.

Paper Structure

This paper contains 29 sections, 25 equations, 17 figures.

Figures (17)

  • Figure 1: Illustration of the full pipeline of our proposed instance segmentation approach. Our method embeds information from two convolutional neural networks, namely semantic segmentation and object detection networks. This is performed in a multi-contour simultaneous optimization scheme over abstract, low-dimensional shape coefficients, to obtain high-quality object segmentations.
  • Figure 2: Left: an evolving contour (in blue) partitions the image plane into the foreground and background regions. Right: Corresponding level-set function representation of evolving shape. The contour corresponds to the function's zero level set, while positive and negative function values represent, respectively, image elements inside and outside the contour.
  • Figure 3: Visualization of the concept of a uniform vs. KDE probability model for shape coefficients. Left: uniform model assigns equal probability to all combinations of shape coefficients (blue points) in the hypercube spanned by the maximum extents of training examples. Right: KDE model assigns high probability only to regions around training samples (blue points), with arbitrary topology. The intensity of the red-colored transparent spheres around the blue points is proportional to the probability density of the shape coefficients in that region (dark red indicates highest probability).
  • Figure 4: Sample input for a 3-class segmentation problem: green vegetables (shown in green), potatoes (red), and fruit (blue). Shown: (a) original RGB image containing examples of all classes of objects, (b) 3-class (+background) semantic segmentation, (c) initial detected object bounding boxes. (Images taken from the COCO dataset.)
  • Figure 5: Visual comparison of shape/location parameter configurations which lead to low vs. high values of the partial energy terms contributing to the total energy from Eqs. \ref{eq:multiContourEnergy} and \ref{eq:E_prior_terms}. Evolving contours are shown as green, orange, and blue polygons over the semantic segmentation map. Consider the segmentation problem of delineating individual dead tree crowns (see Section \ref{sec:problemSetting} for details). A, left: input (original) RGB image transformed into a per-pixel probability map of belonging to the dead tree class. A, right: the image term is low when high-probability class pixels lie mostly within the contours and non-class pixels mostly outside the contours. B, left: deriving a probabilistic shape model from training object masks. B, right: the shape term is low when the model's probability of the evolving shape is high. C, left: constructing a location prior around the initial position provided by the object detection network; the probability of a correct location is shown by a color scale from high (yellow) to low (blue). C, right: the location term is low when the center of the evolving object (red point) is deemed probable by the location model. D, overlap: the energy is low when evolving shapes do not overlap.
  • ...and 12 more figures
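
The captions for Figures 2, 3, and 5 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`circle_level_set`, `image_energy`, `kde_log_density`), the negative log-likelihood form of the image term, and the isotropic Gaussian kernel with a fixed bandwidth are all assumptions chosen for illustration. The sketch follows only the conventions stated in the captions: the level-set function is positive inside the contour and negative outside, the image term is low when high-probability pixels lie inside the contour, and a KDE model assigns high density only near training shape coefficients.

```python
import numpy as np


def circle_level_set(shape, center, radius):
    """Level-set function of a circle: positive inside, negative outside,
    zero on the contour (sign convention from the Figure 2 caption)."""
    rows, cols = np.indices(shape)
    dist = np.hypot(rows - center[0], cols - center[1])
    return radius - dist


def image_energy(phi, prob, eps=1e-6):
    """Hypothetical image term: negative log-likelihood of the class
    probability map given the inside/outside partition induced by phi.
    Low when high-probability pixels are inside and low-probability
    pixels are outside (assumed form, not taken from the paper)."""
    inside = phi > 0
    p = np.clip(prob, eps, 1.0 - eps)
    return -(np.log(p[inside]).sum() + np.log(1.0 - p[~inside]).sum())


def kde_log_density(coeffs, train_coeffs, bandwidth=0.5):
    """Log-density of an isotropic Gaussian KDE over shape coefficients
    (assumed kernel and bandwidth), computed with a log-sum-exp trick."""
    dim = train_coeffs.shape[1]
    sq_dist = ((train_coeffs - coeffs) ** 2).sum(axis=1)
    log_kernel = (-0.5 * sq_dist / bandwidth**2
                  - 0.5 * dim * np.log(2.0 * np.pi * bandwidth**2))
    peak = log_kernel.max()
    return peak + np.log(np.exp(log_kernel - peak).mean())


# Synthetic probability map: one "object" of high class probability.
truth = circle_level_set((32, 32), center=(16, 16), radius=6) > 0
prob_map = np.where(truth, 0.9, 0.1)

phi_good = circle_level_set((32, 32), center=(16, 16), radius=6)
phi_bad = circle_level_set((32, 32), center=(5, 5), radius=6)

# Two toy training shapes in a 2-D coefficient space.
train = np.array([[0.0, 0.0], [1.0, 1.0]])

print("image energy (aligned contour):   ", image_energy(phi_good, prob_map))
print("image energy (misplaced contour): ", image_energy(phi_bad, prob_map))
print("KDE log-density near training data:", kde_log_density(np.array([0.1, 0.0]), train))
print("KDE log-density far from data:     ", kde_log_density(np.array([5.0, 5.0]), train))
```

In this toy setting the aligned contour yields the lower image energy, and coefficients near a training sample receive the higher KDE density. A uniform model over the bounding hypercube would instead score every point inside the box equally.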