SEA: Evaluating Sketch Abstraction Efficiency via Element-level Commonsense Visual Question Answering

Jiho Park, Sieun Choi, Jaeyoon Seo, Minho Sohn, Yeana Kim, Jihie Kim

Abstract

A sketch is a distilled form of visual abstraction that conveys core concepts through simplified yet purposeful strokes while omitting extraneous detail. Despite its expressive power, quantifying the efficiency of semantic abstraction in sketches remains challenging. Existing evaluation methods that rely on reference images, low-level visual features, or recognition accuracy do not capture abstraction, the defining property of sketches. To address these limitations, we introduce SEA (Sketch Evaluation metric for Abstraction efficiency), a reference-free metric that assesses how economically a sketch represents class-defining visual elements while preserving semantic recognizability. These elements are derived per class from commonsense knowledge about features typically depicted in sketches. SEA leverages a visual question answering model to determine the presence of each element and returns a quantitative score that reflects semantic retention under visual economy. To support this metric, we present CommonSketch, the first semantically annotated sketch dataset, comprising 23,100 human-drawn sketches across 300 classes, each paired with a caption and element-level annotations. Experiments show that SEA aligns closely with human judgments and reliably discriminates levels of abstraction efficiency, while CommonSketch serves as a benchmark providing systematic evaluation of element-level sketch understanding across various vision-language models.
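The element-presence step described in the abstract (one VQA query per class-defining element) can be pictured with a minimal Python sketch. It assumes the VQA model is wrapped as a callable `answer_fn(sketch, question) -> str`. The question template, the yes/no parsing rule, the function name `detect_present_elements`, and the toy bicycle element list are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical prompt; the paper's actual VQA question wording is not reproduced here.
QUESTION_TEMPLATE = "Is there a {element} in this sketch? Answer yes or no."


def detect_present_elements(
    sketch: object,
    elements: Sequence[str],
    answer_fn: Callable[[object, str], str],
) -> List[str]:
    """Ask one yes/no question per commonsense element and keep those answered 'yes'."""
    present = []
    for element in elements:
        answer = answer_fn(sketch, QUESTION_TEMPLATE.format(element=element))
        # Assumed parsing rule: any reply beginning with "yes" counts as present.
        if answer.strip().lower().startswith("yes"):
            present.append(element)
    return present


if __name__ == "__main__":
    # Toy stand-in for a VQA model that "sees" only wheels and a handlebar.
    def fake_vqa(sketch: object, question: str) -> str:
        return "Yes" if ("wheel" in question or "handlebar" in question) else "No"

    bicycle_elements = ["wheel", "handlebar", "saddle", "pedal", "chain"]  # illustrative only
    present = detect_present_elements(None, bicycle_elements, fake_vqa)
    print(present)                               # ['wheel', 'handlebar']
    print(len(present) / len(bicycle_elements))  # 0.4
```

Passing the VQA model in as a callable keeps the sketch independent of any particular vision-language model API.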

Paper Structure

This paper contains 49 sections, 25 equations, 24 figures, 10 tables.

Figures (24)

  • Figure 1: Overview of SEA and CommonSketch. Left: SEA quantifies abstraction efficiency by balancing recognizability and detail. High scores (top-left) correspond to simple yet identifiable sketches, while low scores (bottom-right) indicate ambiguity or excessive detail. Right: CommonSketch includes element-level annotations and captions, enabling element-aware evaluation of sketch abstraction.
  • Figure 2: CommonSketch overview. 23,100 human-drawn sketches with paired captions and element-level commonsense annotations across 300 classes in 14 categories: (a) construction and annotation pipeline, (b) category-wise element statistics, (c) class distribution with an example.
  • Figure 3: Computation pipeline and case-based interpretation of the SEA metric. Given a sketch and its class label, SEA combines class recognizability $P$ from a classifier with the commonsense element space $E$ extracted by an LLM and the number of visually grounded elements $V$ identified by a VLM. It then computes a reward-penalty balance and maps it to a bounded score $\mathrm{SEA} \in (-1, 1)$, where higher scores indicate sketches that preserve recognizability with minimal yet sufficient visual detail. Illustrative cases show abstraction failure due to low recognizability (left), incomplete abstraction caused by excessive detail (middle), and abstraction-efficient sketching that achieves high recognizability with fewer expressed elements (right).
  • Figure 4: Qualitative comparison of SEA scores across abstraction levels (4, 8, 16, 32) on four classes shared by SEVA and CommonSketch: Baseball (top left), Hat (bottom left), Giraffe (top right), and Guitar (bottom right). Each example reports the SEA score along with the visual ratio $v$ and the prediction probability $P$, showing how abstraction can improve efficiency when recognizability is preserved.
  • Figure 5: Distribution of SEA scores across abstraction levels in SEVA. Sketches at lower abstraction levels (4, 8) are concentrated near low SEA scores, whereas those at higher abstraction levels (16, 32) exhibit a noticeable shift toward higher SEA scores.
  • ...and 19 more figures
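The reward-penalty computation summarized in Figure 3 can be illustrated with a small Python sketch. This excerpt does not give the paper's exact formula, so everything below is an assumption: the visual ratio is taken as $v = |V| / |E|$, the reward as $P$, the penalty as $v$, and the bounding function as $\tanh$ with a hypothetical `scale` parameter. The name `sea_like_score` is likewise hypothetical. The sketch only reproduces the qualitative behavior described in the caption (low $P$ or high $v$ lowers the score, and the result stays strictly inside $(-1, 1)$).

```python
import math


def sea_like_score(p: float, num_visible: int, num_elements: int, scale: float = 2.0) -> float:
    """Hypothetical SEA-style score (illustrative placeholder, not the paper's formula).

    p            -- classifier probability P for the target class, in [0, 1]
    num_visible  -- number of elements the VLM judged visually present (size of V)
    num_elements -- size of the commonsense element space E for the class
    scale        -- assumed sharpness of the bounding function
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if num_elements <= 0 or not 0 <= num_visible <= num_elements:
        raise ValueError("require num_elements > 0 and 0 <= num_visible <= num_elements")
    v = num_visible / num_elements     # assumed visual ratio: share of elements drawn
    balance = p - v                    # reward (recognizability) minus penalty (detail), in [-1, 1]
    return math.tanh(scale * balance)  # tanh keeps the result strictly inside (-1, 1)


if __name__ == "__main__":
    # Three toy cases mirroring the qualitative categories in Figure 3.
    print(round(sea_like_score(0.1, 2, 10), 3))  # low recognizability: -0.197
    print(round(sea_like_score(0.9, 9, 10), 3))  # recognizable but over-detailed: 0.0
    print(round(sea_like_score(0.9, 3, 10), 3))  # recognizable with few elements: 0.834
```

Because `balance` is confined to [-1, 1] and `scale` is finite, the squashed score can never reach ±1, which matches the open interval stated for SEA. Any other monotone bounded mapping would serve the same illustrative purpose.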