
Boundary Attention: Learning curves, corners, junctions and grouping

Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

TL;DR

The paper addresses the challenge of recovering precise, unrasterized image boundaries under noise by introducing Boundary Attention, a bottom-up, geometry-aware local attention mechanism that encodes boundaries as a dense field of unrasterized primitives. It represents each local structure by a junction-space parameterization $\mathbf{g}=(\boldsymbol{u},\theta,\boldsymbol{\omega})$ together with a learned patch window $w_k(x;\mathbf{p})$. It propagates information through a patch-wise gather and a pixel-wise slice to produce global maps such as an unsigned distance map and a boundary map. The model is compact ($D_\gamma=64$, $D_\pi=8$, $\approx 207$k parameters) and is trained in three stages on synthetic Kaleidoshapes data. Even so, it generalizes to real, noisy low-light photographs, with strong boundary localization and robustness to noise. Empirically, it achieves competitive or superior F-scores across noise levels, faster inference than optimization-based baselines, and good repeatability. These results highlight the practical potential of a differentiable, bottom-up, geometry-based boundary representation.
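The following Python sketch makes the junction-space parameterization $\mathbf{g}=(\boldsymbol{u},\theta,\boldsymbol{\omega})$ concrete by assigning a point to one of the three wedges around a vertex. It relies on a convention that is not spelled out here. The first wedge starts at orientation $\theta$, and the wedges follow counter-clockwise with angular widths $\boldsymbol{\omega}$. Those widths are rescaled to sum to $2\pi$, because the angles are defined only up to scale. The function names are hypothetical, chosen for illustration only.

```python
import math


def normalize_angles(omega):
    """Rescale wedge angles so they sum to 2*pi (they are defined only up to scale)."""
    if any(w < 0 for w in omega) or sum(omega) <= 0:
        raise ValueError("wedge angles must be non-negative with a positive sum")
    total = sum(omega)
    return [2.0 * math.pi * w / total for w in omega]


def wedge_index(x, y, u, v, theta, omega):
    """Return which wedge (0, 1 or 2) of a junction contains the point (x, y).

    Hypothetical convention: wedge j spans the angular interval
    [theta + omega_0 + ... + omega_{j-1}, theta + omega_0 + ... + omega_j),
    measured counter-clockwise around the vertex (u, v).
    """
    omega = normalize_angles(omega)
    # Angle of the point around the vertex, shifted so the first wedge starts at 0.
    phi = (math.atan2(y - v, x - u) - theta) % (2.0 * math.pi)
    start = 0.0
    for j, w in enumerate(omega):
        if phi < start + w:
            return j
        start += w
    # Guard against floating-point round-off just below 2*pi.
    return len(omega) - 1
```

Setting one angle to zero, e.g. $\boldsymbol{\omega}=(1,1,0)$, collapses the junction into a straight edge through the vertex. Equal angles, $\boldsymbol{\omega}=(1,1,1)$, give a symmetric Y-junction. This matches the caption's point that one parameterization covers edges, corners and junctions.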

Abstract

We present a lightweight network that infers grouping and boundaries, including curves, corners and junctions. It operates in a bottom-up fashion, analogous to classical methods for sub-pixel edge localization and edge-linking, but with a higher-dimensional representation of local boundary structure, and notions of local scale and spatial consistency that are learned instead of designed. Our network uses a mechanism that we call boundary attention: a geometry-aware local attention operation that, when applied densely and repeatedly, progressively refines a pixel-resolution field of variables that specify the boundary structure in every overlapping patch within an image. Unlike many edge detectors that produce rasterized binary edge maps, our model provides a rich, unrasterized representation of the geometric structure in every local region. We find that its intentional geometric bias allows it to be trained on simple synthetic shapes and then generalize to extracting boundaries from noisy low-light photographs.
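The abstract describes a field of variables for every overlapping patch. Turning such a field into one value per pixel requires combining the overlapping predictions. The sketch below shows one plausible option, a window-weighted average $\hat{f}[x]=\sum_k w_k(x)\,f_k(x)\,/\,\sum_k w_k(x)$ over all stride-1 patches $k$ that cover pixel $x$. It is an illustrative assumption about how a pixel-wise slice could work, not the paper's exact operator. The nested-list patch layout and the function name are hypothetical.

```python
def slice_to_pixels(values, windows, radius):
    """Window-weighted average of overlapping stride-1 patch predictions.

    values[i][j] and windows[i][j] are (2r+1) x (2r+1) nested lists for the
    patch centered at pixel (i, j); entry [a][b] covers pixel
    (i + a - r, j + b - r). Returns an H x W nested list.
    """
    height, width = len(values), len(values[0])
    size = 2 * radius + 1
    num = [[0.0] * width for _ in range(height)]
    den = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for a in range(size):
                for b in range(size):
                    y, x = i + a - radius, j + b - radius
                    # Parts of patches that overhang the image border are ignored.
                    if 0 <= y < height and 0 <= x < width:
                        w = windows[i][j][a][b]
                        num[y][x] += w * values[i][j][a][b]
                        den[y][x] += w
    # Pixels that receive zero total window weight default to 0.
    return [
        [num[y][x] / den[y][x] if den[y][x] > 0 else 0.0 for x in range(width)]
        for y in range(height)
    ]
```

With uniform windows, every pixel simply averages the predictions of all patches that overlap it. A learned window such as $w_k(x;\mathbf{p})$ lets each patch control how far its prediction spreads.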

Paper Structure (18 sections, 22 equations, 20 figures, 2 tables)

Figures (20)

  • Figure 1: Pipeline overview. The image unfolds into stride-1 patches, and boundary attention operates iteratively on their embeddings to produce for each patch: (i) a parametric three-way partitioning, and (ii) a parametric windowing function that defines its effective patch size. (Figure 2 shows parameterization details.) This output field implies a variety of global maps, shown in clockwise order: a boundary-aware smoothing of the input colors; an unsigned boundary-distance map; a boundary map; and a map of spatial affinities between any query point and its neighbors.
  • Figure 2: Parameterization details. Left: Each patch $k$ is associated with an unrasterized three-way partitioning of its area (colored blue, orange and purple here). The partitioning parameters comprise a vertex $(u,v)$, orientation $\theta$, and angles $(\omega_1,\omega_2,\omega_3)$, defined up to scale. A: A walk through junction space by linearly interpolating between junctions is spatially smooth, and can represent edges, bars, corners, Y-junctions, T-junctions and uniform regions. B: Each junction is modulated through a learned windowing function. The windowing parameters $\mathbf{p}=(p_1,p_2,p_3)$ are convex weights over a dictionary of binary pillboxes.
  • Figure 3: Example of our model's output for two different image regions. Top row: some of each region's overlapping input patches, their corresponding outputs (visualized in the style of Figure 2), and three types of per-patch attributes that the outputs imply: unsigned distance, boundaries, and gathered wedge colors. Bottom row: four types of global maps implied by accumulating values from the output field and the rendered patches.
  • Figure 4: Model architecture. All blocks are invariant to discrete spatial shifts, and only colored blocks are learned. Orange blocks operate at individual locations $n$, while blue ones operate on small spatial neighborhoods. Symbol $\oplus$ denotes concatenation. The gather and slice operators (see their defining equations in the main text) are depicted at right. The first iteration uses $\boldsymbol{\gamma}^0[n]=\boldsymbol{\gamma}_0[n]$, $\bar{\mathbf{f}}^0[n]=\mathbf{f}[n]$, and $\boldsymbol{\pi}^0[n]=\boldsymbol{\pi}_o$, with $\boldsymbol{\pi}_o$ learned across the training set. Boundary attention repeats $T=8$ times, with one set of weights for the first four iterations and a separate set for the last four, for a total of 207k trainable parameters.
  • Figure 5: Left: ODS F-score for our method and multiple baselines at different noise levels, computed on noisy synthetic data. The bottom inset shows example patches at representative PSNR values. Our method outperforms all baselines at low noise and is better than or competitive with other techniques at high noise. Right: F-score versus runtime for the different techniques. Our method has the best average F-score while also being much faster than the second-best method, Field of Junctions.
  • ...and 15 more figures
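Figure 2 describes the windowing parameters $\mathbf{p}=(p_1,p_2,p_3)$ as convex weights over a dictionary of binary pillboxes. The sketch below evaluates the resulting window $w(x;\mathbf{p})=\sum_i p_i\,\mathbb{1}[\lVert x\rVert\le r_i]$ at an offset from the patch center. It rests on two assumptions not specified above: three hypothetical pillbox radii, and a softmax that turns unconstrained logits into convex weights.

```python
import math

# Hypothetical pillbox radii in pixels; the actual dictionary sizes are not given here.
PILLBOX_RADII = (3.0, 5.0, 8.0)


def softmax(logits):
    """Map unconstrained logits to convex weights (non-negative, summing to 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def window_weight(dx, dy, p, radii=PILLBOX_RADII):
    """Evaluate w(x; p) = sum_i p_i * 1[|x| <= r_i] at offset x = (dx, dy)."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-6:
        raise ValueError("p must be convex weights (non-negative, summing to 1)")
    r = math.hypot(dx, dy)
    # Each binary pillbox contributes its weight only inside its radius.
    return sum(pi for pi, ri in zip(p, radii) if r <= ri)
```

Each pillbox equals 1 inside its radius and 0 outside, so any convex mixture is non-increasing with distance from the patch center. Shifting weight between the pillboxes changes the effective patch size smoothly, which is what makes the window learnable. A typical call is `window_weight(dx, dy, softmax(logits))`.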