
Sketch2code: Generating a website from a paper mockup

Alex Robinson

TL;DR

This work tackles translating hand-drawn wireframes into functional website code, introducing two translation strategies: classical computer vision and deep semantic segmentation. It contributes an end-to-end framework, a dataset built from Bootstrap templates, and an evaluation framework using synthetic sketches to compare approaches. Empirical results show deep learning yields superior element detection and visual/structural similarity, though generalisation to unseen sketches remains challenging and domain biases persist. The study demonstrates the feasibility of sketch-to-code pipelines and provides a foundation for future, production-ready design-to-code systems. The released dataset, framework, and evaluation methodology offer a valuable resource for researchers pursuing automated UI generation from sketches.

Abstract

An early stage of developing user-facing applications is creating a wireframe to lay out the interface. Once a wireframe has been created, it is given to a developer to implement in code. Writing boilerplate user interface code is time-consuming work, yet it still requires an experienced developer. In this dissertation we present two approaches that automate this process: one using classical computer vision techniques, and another using a novel application of deep semantic segmentation networks. We release a dataset of websites that can be used to train and evaluate these approaches. Further, we have designed a novel evaluation framework that enables empirical evaluation by creating synthetic sketches. Our evaluation shows that our deep learning approach outperforms our classical computer vision approach, and we conclude that deep learning is the most promising direction for future research.
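Comparing the two approaches comes down to scoring the elements each one detects against the ground truth. The example below is a minimal sketch of one common way to do this. It assumes each element is an axis-aligned bounding box labelled with one of the five element types from Figure 2, and that a prediction matches a same-label ground-truth element when their intersection over union (IoU) is at least 0.5. The `Element` class, the `detection_scores` function and the 0.5 threshold are hypothetical illustrations, not the dissertation's evaluation code.

```python
from dataclasses import dataclass


# Hypothetical representation of one detected or ground-truth wireframe element.
@dataclass(frozen=True)
class Element:
    label: str  # e.g. "title", "image", "button", "input", "paragraph"
    x: float    # left edge
    y: float    # top edge
    w: float    # width
    h: float    # height


def iou(a: Element, b: Element) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def detection_scores(pred, truth, threshold=0.5):
    """Greedily match predictions to same-label ground truth.

    A prediction is a true positive if it overlaps a still-unmatched
    ground-truth element of the same label with IoU >= threshold.
    Returns (precision, recall, f1).
    """
    unmatched = list(truth)
    tp = 0
    for p in pred:
        best_idx, best_iou = None, threshold
        for i, t in enumerate(unmatched):
            if t.label != p.label:
                continue
            score = iou(p, t)
            if score >= best_iou:
                best_idx, best_iou = i, score
        if best_idx is not None:
            unmatched.pop(best_idx)  # each ground-truth element matches at most once
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


truth = [Element("title", 0, 0, 100, 20), Element("button", 0, 50, 40, 20)]
pred = [Element("title", 0, 0, 100, 18), Element("image", 0, 50, 40, 20)]
print(detection_scores(pred, truth))  # (0.5, 0.5, 0.5)
```

In this usage example the title is found (IoU 0.9) but the button is mislabelled as an image, so only half of the predictions and half of the ground truth are matched. A per-class breakdown or a structural similarity measure could be layered on top of the same matching step.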

Paper Structure

This paper contains 67 sections, 7 equations, 37 figures, 6 tables, and 1 algorithm.

Figures (37)

  • Figure 1: Examples of sketched wireframes for mobile and desktop applications. Notice that the styles differ slightly, but common symbols recur, such as horizontal lines representing text.
  • Figure 2: Examples of each of the five symbols we use to represent title, image, button, input, and paragraph elements in our wireframes. These symbols are based on popular and commonly understood wireframe elements.
  • Figure 3: An example website with HTML and CSS, shown at three levels of abstraction (a), (b), and (c). (a) shows how HTML consists of nested elements that describe the page structure. (b) shows how the HTML represents a GUI tree, with leaf elements that contain content and branch (container) elements that group other elements. (c) shows how the HTML is interpreted by a rendering engine to produce a visual result.
  • Figure 4: Semantic segmentation example. Source: Everingham10
  • Figure 5: Dilated/Atrous Convolutions. Source: dumoulin2016guide
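Figure 3 describes HTML as a GUI tree in which leaf elements carry content and branch (container) elements group other elements. The example below is a minimal sketch of how such a tree could be serialised into nested HTML. The `Node` class, the `to_html` function and the mapping of the five Figure 2 element types onto Bootstrap classes (`row`, `col`, `btn`, `form-control`, `img-fluid`) are all hypothetical choices made for illustration. This is not the dissertation's code generator.

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical mapping from the five wireframe element types (Figure 2)
# to Bootstrap-flavoured HTML leaf snippets.
LEAF_TEMPLATES = {
    "title": "<h1>{text}</h1>",
    "paragraph": "<p>{text}</p>",
    "button": '<button class="btn btn-primary">{text}</button>',
    "input": '<input class="form-control" placeholder="{text}">',
    "image": '<img class="img-fluid" src="placeholder.png" alt="{text}">',
}


@dataclass
class Node:
    kind: str  # a leaf type from LEAF_TEMPLATES, or a container class such as "row"/"col"
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def to_html(node: Node, depth: int = 0) -> str:
    """Serialise a GUI tree: leaves emit content, branches wrap their children in a <div>."""
    pad = "  " * depth
    if node.kind in LEAF_TEMPLATES:
        # escape() also escapes quotes, so text is safe inside attributes
        return pad + LEAF_TEMPLATES[node.kind].format(text=escape(node.text))
    inner = "\n".join(to_html(child, depth + 1) for child in node.children)
    return f'{pad}<div class="{node.kind}">\n{inner}\n{pad}</div>'


page = Node("row", children=[
    Node("col", children=[Node("title", "Welcome"), Node("button", "Sign up")]),
])
print(to_html(page))
```

Running the usage example prints a `row` container holding a `col` container, which in turn holds an `<h1>` and a Bootstrap button. Keeping layout in container nodes and content in leaves mirrors the leaf/branch split in Figure 3(b), so a detector only has to output leaf elements and a grouping step that builds the containers.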