SketchRef: a Multi-Task Evaluation Benchmark for Sketch Synthesis
Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao
TL;DR
SketchRef addresses the lack of standardized evaluation for sketch synthesis by introducing a unified, multi-task benchmark that leverages the structure shared between sketches and their reference photos. It defines two tasks, category prediction and structural consistency estimation, across four domains, and introduces the mean recognizability under simplification ($mRS$) to balance recognizability against simplicity. A pose-alignment metric $R_s$, computed from keypoint correspondences via Object Keypoint Similarity (OKS), and a relative simplicity measure SR enable fair comparison across methods. Evaluations of eight sketch-synthesis methods show that strong category recognizability does not imply structural fidelity, and highlight the need for structure-preserving training. The proposed metrics are validated with $7{,}920$ human responses, and the benchmark provides a practical framework for advancing sketch synthesis research.
Abstract
Sketching is a powerful artistic technique for capturing essential visual information about real-world objects and has increasingly attracted attention in image synthesis research. However, the field lacks a unified benchmark for evaluating the performance of different synthesis methods. To address this, we propose SketchRef, the first comprehensive multi-task evaluation benchmark for sketch synthesis. SketchRef fully leverages the characteristics shared between sketches and reference photos. It introduces two primary tasks: category prediction and structural consistency estimation, the latter of which has been largely overlooked in previous studies. These tasks are further divided into five sub-tasks across four domains: animals, common things, human body, and faces. Recognizing the inherent trade-off between recognizability and simplicity in sketches, we are the first to quantify this balance by introducing $mRS$, a recognizability measure constrained by simplicity, which ensures fair and meaningful evaluations. To validate our approach, we collected 7,920 responses from art enthusiasts, which confirm the effectiveness of our proposed evaluation metrics. Additionally, we evaluate existing sketch synthesis methods on our benchmark, highlighting their strengths and weaknesses. We hope this study establishes a standardized benchmark and offers valuable insights for advancing sketch synthesis algorithms.
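To make the structural consistency task more concrete, the following is a minimal sketch of Object Keypoint Similarity (OKS) in its commonly used COCO-style form, comparing keypoints detected on a sketch with those of its reference photo. It is not the paper's implementation: the exact definition of $R_s$ is not given in this summary, and the function name `oks`, its parameters, and the per-keypoint constants `kappas` are assumptions chosen for illustration.

```python
import math


def oks(pred, gt, visible, area, kappas):
    """COCO-style Object Keypoint Similarity between two keypoint sets.

    pred, gt : lists of (x, y) keypoints (e.g. from the sketch and the photo)
    visible  : list of flags; only keypoints with a truthy flag are scored
    area     : object area, used as the squared scale term s^2
    kappas   : per-keypoint falloff constants (illustrative values, assumed)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, k in zip(pred, gt, visible, kappas):
        if not v:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        # Gaussian similarity: exp(-d^2 / (2 * s^2 * k^2))
        total += math.exp(-d2 / (2.0 * area * k ** 2))
        count += 1
    # Average over the scored keypoints; defined as 0.0 when none are visible
    return total / count if count else 0.0


photo_kpts = [(10.0, 10.0), (20.0, 15.0)]
sketch_kpts = [(11.0, 10.0), (20.0, 18.0)]
score = oks(sketch_kpts, photo_kpts, [1, 1], area=400.0, kappas=[0.1, 0.1])
```

The score lies in (0, 1] whenever at least one keypoint is visible, and equals 1 only if every scored keypoint coincides with its reference. Normalizing by object area makes the score comparable across objects of different sizes, which matters when sketches from different methods are compared against the same photo.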
