Boundary Attention: Learning curves, corners, junctions and grouping
Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler
TL;DR
The paper addresses the recovery of precise image boundaries under noise by introducing Boundary Attention, a bottom-up, geometry-aware local attention mechanism that encodes boundaries as a dense field of unrasterized primitives. Each local structure is parameterized in junction space as $\mathbf{g} = (\boldsymbol{u}, \theta, \boldsymbol{\omega})$ together with a learned patch window $w_k(x;\mathbf{p})$, and information propagates through a patch-wise gather and a pixel-wise slice to yield global maps such as an unsigned distance map and a boundary map. The model is compact ($D_\gamma=64$, $D_\pi=8$, $\approx 207\text{k}$ parameters) and is trained in three stages on synthetic Kaleidoshapes data, yet it generalizes to real, noisy low-light photographs, where it localizes boundaries accurately and remains robust to noise. Empirical results show competitive or superior F-scores across noise levels, faster inference than optimization-based baselines, and good repeatability, highlighting the practical potential of a differentiable, bottom-up, geometry-based boundary representation.
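To make the junction-space parameterization more concrete, the following is a minimal sketch of how an unsigned distance could be evaluated for a single patch. It assumes that $\boldsymbol{u}$ is the junction vertex, $\theta$ is the orientation of the first boundary ray, and $\boldsymbol{\omega}$ lists the wedge angles between consecutive rays. This reading of the symbols, and the function names `junction_ray_angles` and `unsigned_distance`, are illustrative assumptions and not the authors' implementation.

```python
import math


def junction_ray_angles(theta, omega):
    """Absolute angle of each boundary ray (assumed convention).

    The first ray points along theta; each following ray is rotated
    by the preceding wedge angle in omega.
    """
    angles = []
    current = theta
    for wedge in omega:
        angles.append(current)
        current += wedge
    return angles


def unsigned_distance(x, y, u, theta, omega):
    """Distance from point (x, y) to the nearest boundary ray of a junction.

    u     : (ux, uy) vertex position (hypothetical layout)
    theta : orientation of the first ray, in radians
    omega : wedge angles, assumed to sum to 2*pi
    """
    px, py = x - u[0], y - u[1]
    best = math.inf
    for angle in junction_ray_angles(theta, omega):
        dx, dy = math.cos(angle), math.sin(angle)
        along = px * dx + py * dy  # projection onto the ray direction
        if along >= 0:
            # Closest point lies on the ray: perpendicular distance.
            dist = abs(px * dy - py * dx)
        else:
            # Closest point is the vertex itself.
            dist = math.hypot(px, py)
        best = min(best, dist)
    return best
```

Under these assumptions, a straight edge is the special case `omega = (math.pi, math.pi, 0.0)` and a right-angle corner is `omega = (math.pi / 2, 3 * math.pi / 2, 0.0)`, which illustrates how one parameter family can cover edges, corners and junctions.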
Abstract
We present a lightweight network that infers grouping and boundaries, including curves, corners and junctions. It operates in a bottom-up fashion, analogous to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional representation of local boundary structure, and notions of local scale and spatial consistency that are learned instead of designed. Our network uses a mechanism that we call boundary attention: a geometry-aware local attention operation that, when applied densely and repeatedly, progressively refines a pixel-resolution field of variables that specify the boundary structure in every overlapping patch within an image. Unlike many edge detectors that produce rasterized binary edge maps, our model provides a rich, unrasterized representation of the geometric structure in every local region. We find that its intentional geometric bias allows it to be trained on simple synthetic shapes and then generalize to extracting boundaries from noisy low-light photographs.
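One way to picture how estimates from overlapping patches become a single pixel-resolution map is a per-pixel weighted average over every patch that covers the pixel. The sketch below is a simplified illustration under that assumption. The function `slice_patches`, the dictionary layout keyed by each patch's top-left corner, and the optional `weights` argument (standing in for a learned window) are hypothetical and not taken from the paper.

```python
def slice_patches(patches, height, width, weights=None):
    """Average overlapping per-patch values into one pixel-resolution map.

    patches : dict mapping (row0, col0) top-left corners to 2D lists
    weights : optional dict with the same keys and shapes; uniform if None
    """
    num = [[0.0] * width for _ in range(height)]
    den = [[0.0] * width for _ in range(height)]
    for (row0, col0), patch in patches.items():
        for i, patch_row in enumerate(patch):
            for j, value in enumerate(patch_row):
                r, c = row0 + i, col0 + j
                if 0 <= r < height and 0 <= c < width:
                    w = 1.0 if weights is None else weights[(row0, col0)][i][j]
                    num[r][c] += w * value
                    den[r][c] += w
    # Pixels covered by no patch are left at 0.0.
    return [
        [num[r][c] / den[r][c] if den[r][c] > 0 else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Two 2x2 patches that overlap in the middle column of a 2x3 image.
example = {
    (0, 0): [[1.0, 1.0], [1.0, 1.0]],
    (0, 1): [[3.0, 3.0], [3.0, 3.0]],
}
averaged = slice_patches(example, height=2, width=3)
```

In this toy case the middle column receives contributions from both patches and ends up at their mean, while the outer columns keep the value of the single patch that covers them.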
