EasyControlEdge: A Foundation-Model Fine-Tuning for Edge Detection

Hiroki Nakamura, Hiroto Iino, Masashi Okada, Tadahiro Taniguchi

TL;DR

Real-world edge detection needs crisp edge maps and data-efficient training, and prior methods do not fully deliver both. EasyControlEdge adapts a vision diffusion foundation model via lightweight Condition Injection LoRA, adds a pixel-space loss for pixel-accurate localization, and enables inference-time control of edge density through classifier-free guidance. On BSDS500, NYUDv2, BIPED, and CubiCasa, it delivers competitive or superior results, most notably in raw-edge evaluation without post-processing (CEval) and in low-data regimes, and its edge density can be adjusted without retraining. The work shows that combining foundation-model priors, targeted pixel supervision, and controllable inference yields practical, high-fidelity edge maps for downstream tasks such as floor-plan reconstruction and wall-boundary extraction.
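To make the "lightweight LoRA" idea concrete, here is a minimal NumPy sketch of a generic low-rank adapter on a frozen linear layer. It is not the paper's Condition Injection LoRA, whose details are not given in this summary; the class name `LoRALinear`, the rank, the `alpha` scaling, and the initialization are all illustrative assumptions.

```python
import numpy as np

class LoRALinear:
    """Generic LoRA sketch (hypothetical, not the paper's module).

    Output is x @ W^T + (alpha / rank) * x @ A^T @ B^T, where W stays
    frozen and only the low-rank factors A and B would be trained.
    """

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                       # frozen pretrained weight
        self.A = rng.standard_normal((rank, in_dim)) * 0.01
        self.B = np.zeros((out_dim, rank))         # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def num_trainable(self):
        # Only the low-rank factors count as trainable parameters.
        return self.A.size + self.B.size

    def __call__(self, x):
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable values grows with the rank rather than with the full weight matrix.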

Abstract

We propose EasyControlEdge, adapting an image-generation foundation model to edge detection. In real-world edge detection (e.g., floor-plan walls, satellite roads/buildings, and medical organ boundaries), crispness and data efficiency are crucial, yet producing crisp raw edge maps with limited training samples remains challenging. Although image-generation foundation models perform well on many downstream tasks, their pretrained priors for data-efficient transfer and iterative refinement for high-frequency detail preservation remain underexploited for edge detection. To enable crisp and data-efficient edge detection using these capabilities, we introduce an edge-specialized adaptation of image-generation foundation models. To better specialize the foundation model for edge detection, we incorporate an edge-oriented objective with an efficient pixel-space loss. At inference, we introduce guidance based on unconditional dynamics, enabling a single model to control the edge density through a guidance scale. Experiments on BSDS500, NYUDv2, BIPED, and CubiCasa compare against state-of-the-art methods and show consistent gains, particularly under no-post-processing crispness evaluation and with limited training data.
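The guidance-scale mechanism can be illustrated with the standard classifier-free guidance rule, which extrapolates from an unconditional prediction toward a conditional one. The sketch below is a toy under stated assumptions: `toy_model`, `sample`, and the Euler update are hypothetical stand-ins, not the paper's network or sampler, and only the combination rule in `guided_velocity` is the standard formula.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, gamma):
    """Standard classifier-free guidance combination.

    gamma = 0 gives the unconditional prediction, gamma = 1 the
    conditional one, and gamma > 1 extrapolates past it.
    """
    return v_uncond + gamma * (v_cond - v_uncond)

def toy_model(x, cond=None):
    # Hypothetical stand-in for the network: pulls x toward the
    # condition, or toward zero when no condition is given.
    target = cond if cond is not None else np.zeros_like(x)
    return target - x

def sample(cond, gamma, steps=10, seed=0):
    # Plain Euler integration from noise; an assumed sampler for illustration.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond.shape)
    dt = 1.0 / steps
    for _ in range(steps):
        v_c = toy_model(x, cond)
        v_u = toy_model(x, None)
        x = x + dt * guided_velocity(v_c, v_u, gamma)
    return x
```

In this toy setting a larger `gamma` pushes the sample further toward the condition, which shows how a single scalar at inference can change the output of one trained model; how the scale relates to edge density in EasyControlEdge is reported in the paper's own experiments.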

Paper Structure

This paper contains 38 sections, 13 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of EasyControlEdge. The left side shows the training flow and the right side shows the inference flow. We train only a condition-injection LoRA on a frozen DiT-based foundation model (Sec. \ref{sec:cond_inject}) with edge-oriented objectives (Sec. \ref{sec:pix_loss}), and control edge density at inference via classifier-free guidance with scale $\gamma$ (Sec. \ref{sec:fm_guidance}).
  • Figure 2: Qualitative results. The top row shows results on NYUDv2 and the bottom row shows results on CubiCasa.
  • Figure 3: Qualitative comparison of different inference steps $K$ on BIPED. Increasing $K$ sharpens edges and recovers fine details.
  • Figure 4: Mean brightness vs. guidance scale $\gamma$.
  • Figure 5: Qualitative effect of guidance scale $\gamma$ on BIPED.
  • ...and 4 more figures