EasyControlEdge: A Foundation-Model Fine-Tuning for Edge Detection
Hiroki Nakamura, Hiroto Iino, Masashi Okada, Tadahiro Taniguchi
TL;DR
Crisp edge maps and data-efficient training both matter in edge detection, and prior methods do not fully deliver either. EasyControlEdge adapts a vision diffusion foundation model via lightweight Condition Injection LoRA, adds a pixel-space loss for pixel-accurate localization, and enables inference-time control of edge density through classifier-free guidance. Across BSDS500, NYUDv2, BIPED, and CubiCasa, it delivers competitive or superior results, most notably in raw-edge (CEval) performance and in low-data regimes, and its edge density can be adjusted without retraining. The work shows that combining foundation-model priors, targeted pixel supervision, and controllable inference yields practical, high-fidelity edge maps suitable for downstream tasks such as floor-plan reconstruction and wall-boundary extraction.
Abstract
We propose EasyControlEdge, adapting an image-generation foundation model to edge detection. In real-world edge detection (e.g., floor-plan walls, satellite roads/buildings, and medical organ boundaries), crispness and data efficiency are crucial, yet producing crisp raw edge maps with limited training samples remains challenging. Although image-generation foundation models perform well on many downstream tasks, their pretrained priors for data-efficient transfer and iterative refinement for high-frequency detail preservation remain underexploited for edge detection. To enable crisp and data-efficient edge detection using these capabilities, we introduce an edge-specialized adaptation of image-generation foundation models. To better specialize the foundation model for edge detection, we incorporate an edge-oriented objective with an efficient pixel-space loss. At inference, we introduce guidance based on unconditional dynamics, enabling a single model to control the edge density through a guidance scale. Experiments on BSDS500, NYUDv2, BIPED, and CubiCasa compare against state-of-the-art methods and show consistent gains, particularly under no-post-processing crispness evaluation and with limited training data.
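The guidance-scale control mentioned above can be illustrated with the standard classifier-free guidance combination, in which the model's unconditional prediction is extrapolated toward its image-conditioned prediction. This is only a minimal sketch of that generic rule, not the paper's exact formulation: the function name `guided_prediction`, its arguments, and the use of flat lists in place of image tensors are assumptions made for illustration.

```python
from typing import List, Sequence


def guided_prediction(
    cond: Sequence[float],
    uncond: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine conditional and unconditional predictions (generic CFG rule).

    Hypothetical helper: `cond` and `uncond` stand in for the model outputs
    with and without the input-image condition at one sampling step.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    # scale = 0 returns the unconditional prediction, scale = 1 the
    # conditional one, and scale > 1 extrapolates beyond the conditional one.
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    cond = [1.0, 2.0]
    uncond = [0.0, 1.0]
    for scale in (0.0, 1.0, 2.0):
        print(scale, guided_prediction(cond, uncond, scale))
```

Because the scale enters only at sampling time, one trained model can be run at several scales without retraining. How a given scale translates into edge density depends on the trained model and is not specified by this sketch.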
