WebRPG: Automatic Web Rendering Parameters Generation for Visual Presentation
Zirui Shao, Feiyu Gao, Hangdi Xing, Zepeng Zhu, Zhi Yu, Jiajun Bu, Qi Zheng, Cong Yao
TL;DR
WebRPG addresses the challenge of generating coherent web visual presentations directly from HTML by defining rendering parameters (RPs) that standardize CSS properties. The approach uses a latent generation framework: a VAE compresses the RPs of each element, and HTML embeddings (semantic via MarkupLM, hierarchical via XPath, and character-count based) guide autoregressive or diffusion decoders to produce the latent RPs, which are then decoded back into CSS for rendering. A new Klarna-derived dataset of 88,418 sub-pages supports offline rendering and evaluation with metrics including Fréchet Inception Distance, Element IoU, and a novel Style Consistency Score. Experiments show that the autoregressive WebRPG-AR generally outperforms the diffusion-based WebRPG-DM, with GPT-4 providing additional qualitative gains. The work demonstrates the feasibility of automating web design workflows from HTML, highlights the role of HTML embeddings in transferring design knowledge, and lays groundwork for future integration with large language models and CSS frameworks to broaden applicability and practicality.
Abstract
In an era of content creation propelled by advances in generative models, the field of web design remains largely unexplored despite its critical role in modern digital communication. The web design process is complex and often time-consuming, especially for those with limited expertise. In this paper, we introduce Web Rendering Parameters Generation (WebRPG), a new task that aims to automate the generation of the visual presentation of web pages from their HTML code. WebRPG would contribute to a faster web development workflow. Since no benchmark exists for this task, we develop a new dataset for WebRPG through an automated pipeline. Moreover, we present baseline models that use a VAE to manage the large number of elements and rendering parameters, along with custom HTML embeddings that capture essential semantic and hierarchical information from HTML. Extensive experiments, including quantitative evaluations customized for this task, are conducted to assess the quality of the generated results.
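To make the hierarchical side of the HTML embedding more concrete, the following is a minimal sketch of how an XPath and a character count could be extracted for each element of a page. It is an illustration only, not the authors' implementation: the names `ElementFeatureParser` and `extract_element_features` are hypothetical, the character count is assumed to cover only an element's own (direct) text, and the step that turns these raw features into learned embeddings is left out. It relies solely on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementFeatureParser(HTMLParser):
    """Hypothetical helper: records an XPath and a character count per element."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open elements
        self.root_counts = {}  # same-tag sibling counters at the document root
        self.elements = []     # one record per element, in document order

    def handle_starttag(self, tag, attrs):
        # Index the element among siblings that share its tag name.
        counts = self.stack[-1]["child_counts"] if self.stack else self.root_counts
        counts[tag] = counts.get(tag, 0) + 1
        parent_path = self.stack[-1]["record"]["xpath"] if self.stack else ""
        record = {"xpath": f"{parent_path}/{tag}[{counts[tag]}]", "char_count": 0}
        self.elements.append(record)
        if tag not in VOID_TAGS:
            self.stack.append({"tag": tag, "child_counts": {}, "record": record})

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Assumption: only an element's own text counts, ignoring whitespace padding.
        text = data.strip()
        if text and self.stack:
            self.stack[-1]["record"]["char_count"] += len(text)


def extract_element_features(html):
    """Return a list of (xpath, char_count) tuples in document order."""
    parser = ElementFeatureParser()
    parser.feed(html)
    parser.close()
    return [(e["xpath"], e["char_count"]) for e in parser.elements]


if __name__ == "__main__":
    page = "<html><body><div><p>Hello</p><p>Hi there</p></div><img src='a.png'></body></html>"
    for xpath, n_chars in extract_element_features(page):
        print(xpath, n_chars)
```

In a full pipeline, features like these would be combined with a semantic representation of each element before conditioning the decoder; that step is outside the scope of this sketch.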
