Feedforward semantic segmentation with zoom-out features
Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich
TL;DR
Addresses semantic segmentation by reframing it as a per-superpixel classification problem using multi-level zoom-out features. Combines local, proximal, distant, and global context, extracted with handcrafted features and pre-trained CNNs, and feeds it to a feed-forward classifier trained with an asymmetric loss to handle class imbalance. Achieves state-of-the-art performance on the PASCAL VOC 2012 test set with a mean IoU of 64.4%, showing that context can be modeled effectively across multiple spatial scales without explicit structured prediction. Suggests that deep representations can be exploited in a purely feed-forward framework, while leaving room for end-to-end training and selective integration with inference-based approaches.
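To make the class-imbalance point concrete, the sketch below shows one common way to build such a loss: a cross-entropy in which each class is weighted by the inverse of its frequency, so that rare object classes are not swamped by background superpixels. This is an assumption for illustration; the paper's exact weighting scheme may differ, and the helper names (`inverse_frequency_weights`, `weighted_cross_entropy`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Hypothetical helper: weight class c by 1 / (fraction of samples labeled c).

    Classes that never occur in `labels` get weight 0.0.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / counts[c] if counts[c] else 0.0 for c in range(num_classes)]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(
    all_scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Weighted mean of per-superpixel negative log-likelihoods (assumed loss form)."""
    num = 0.0
    den = 0.0
    for scores, y in zip(all_scores, labels):
        p = softmax(scores)
        w = weights[y]
        num += w * -math.log(p[y])
        den += w
    return num / den


# Three background superpixels (class 0) and one object superpixel (class 1).
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
# A classifier that always favors background is penalized more under the weights.
scores = [[2.0, 0.0]] * 4
weighted = weighted_cross_entropy(scores, labels, weights)
unweighted = weighted_cross_entropy(scores, labels, [1.0, 1.0])
```

With these weights each class contributes the same total weight (here 3 × 4/3 = 1 × 4), so misclassifying the single object superpixel costs as much as misclassifying all background ones.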
Abstract
We introduce a purely feed-forward architecture for semantic segmentation. We map small image elements (superpixels) to rich feature representations extracted from a sequence of nested regions of increasing extent. These regions are obtained by "zooming out" from the superpixel all the way to scene-level resolution. This approach exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference. Instead, superpixels are classified by a feedforward multilayer network. Our architecture achieves new state-of-the-art performance in semantic segmentation, obtaining 64.4% average accuracy (mean intersection-over-union) on the PASCAL VOC 2012 test set.
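The zoom-out idea can be sketched as follows, under loudly stated simplifying assumptions. The paper extracts handcrafted and CNN features from image regions of increasing size. Here, as a hypothetical stand-in, each superpixel already has a feature vector, and the nested regions are the superpixel's neighborhoods of 0, 1 and 2 hops in a superpixel adjacency graph, plus the whole image. Features are average-pooled per region, concatenated, and passed through a small one-hidden-layer network. The names `hop_neighborhood`, `zoom_out_feature` and `mlp_forward` are illustrative, not the authors' code.

```python
from collections import deque
from typing import Dict, List, Sequence

Vector = List[float]


def hop_neighborhood(adj: Dict[int, List[int]], start: int, max_hops: int) -> List[int]:
    """Superpixels reachable from `start` in at most `max_hops` adjacency steps (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in adj[node]:
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return sorted(seen)


def mean_pool(features: Sequence[Vector], ids: Sequence[int]) -> Vector:
    """Average the feature vectors of the superpixels in `ids`."""
    dim = len(features[ids[0]])
    return [sum(features[i][d] for i in ids) / len(ids) for d in range(dim)]


def zoom_out_feature(
    features: Sequence[Vector],
    adj: Dict[int, List[int]],
    sp: int,
    hops: Sequence[int] = (0, 1, 2),
) -> Vector:
    """Concatenate pooled features of nested regions around superpixel `sp`,
    from the superpixel itself (0 hops) out to the whole image (global level)."""
    parts: Vector = []
    for h in hops:
        parts += mean_pool(features, hop_neighborhood(adj, sp, h))
    parts += mean_pool(features, list(range(len(features))))  # global level
    return parts


def mlp_forward(x: Vector, W1, b1, W2, b2) -> Vector:
    """Feed-forward classifier: one ReLU hidden layer, then linear class scores."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]


# Toy image: four superpixels in a chain 0-1-2-3 with 1-D features.
features = [[1.0], [2.0], [3.0], [4.0]]
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = zoom_out_feature(features, adj, sp=0)
scores = mlp_forward(
    x,
    W1=[[1, 0, 0, 0], [0, 0, 0, -1]], b1=[0, 0],
    W2=[[1, 0], [0, 1]], b2=[0, 0.5],
)
```

For superpixel 0 the zoom-out vector is `[1.0, 1.5, 2.0, 2.5]`: its own feature, then averages over progressively larger regions. Each superpixel is classified independently from such a vector, which is why no structured inference step is needed at test time.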
