DesignPref: Capturing Personal Preferences in Visual Design Generation

Yi-Hao Peng; Jeffrey P. Bigham; Jason Wu

TL;DR

This work addresses subjectivity in visual UI design evaluation by introducing DesignPref, a dataset of 12,000 designer-annotated pairwise comparisons with multi-level preference ratings and rationales. It documents substantial disagreement among designers and analyzes the reasons behind their divergent judgments, which enables identity-aware modeling. Using CLIP finetuning with a strength-aware margin and retrieval-augmented generation, the authors show that personalized models outperform aggregated baselines at predicting individual designers' preferences, even with far fewer training examples. The findings suggest that encoding designer identity and preference strength can improve automated UI design assessment and support personalized design generation and evaluation workflows.
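As a rough illustration of what a strength-aware margin could look like, the sketch below scales the margin of a hinge-style ranking loss by the rating strength, so that strongly preferred designs must outscore rejected ones by a wider gap. This is an assumption about the general technique, not the authors' exact formulation: the function name, the `base_margin` value, and the integer strength scale are all hypothetical.

```python
def strength_aware_margin_loss(score_preferred, score_rejected, strength, base_margin=0.1):
    """Hinge-style ranking loss whose required margin grows with preference strength.

    All names and the linear scaling rule are illustrative assumptions,
    not taken from the paper.
    """
    # A stronger stated preference demands a larger score gap.
    margin = base_margin * strength
    # Zero loss once the preferred design leads by at least the margin.
    return max(0.0, margin - (score_preferred - score_rejected))


# Same score gap (about 0.3), different preference strengths.
weak = strength_aware_margin_loss(0.8, 0.5, strength=1)    # margin 0.1, already satisfied
strong = strength_aware_margin_loss(0.8, 0.5, strength=4)  # margin 0.4, not yet satisfied
print(weak, strong)
```

With the same score gap, the weak preference incurs no loss while the strong preference is still penalized, which is the behavior a strength-aware margin is meant to produce.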

Abstract

Generative models, such as large language models and text-to-image diffusion models, are increasingly used to create visual designs like user interfaces (UIs) and presentation slides. Finetuning and benchmarking these generative models have often relied on datasets of human-annotated design preferences. Yet, due to the subjective and highly personalized nature of visual design, preferences vary widely among individuals. In this paper, we study this problem by introducing DesignPref, a dataset of 12k pairwise comparisons of generated UI designs annotated by 20 professional designers with multi-level preference ratings. We find that substantial disagreement exists even among trained designers (Krippendorff's alpha = 0.25 for binary preferences). Natural language rationales provided by these designers indicate that disagreements stem from differing views on the importance of various design aspects and from individual preferences. With DesignPref, we demonstrate that traditional majority-voting methods for training aggregated judge models often do not accurately reflect individual preferences. To address this challenge, we investigate multiple personalization strategies, in particular fine-tuning on designer-specific annotations or incorporating them into retrieval-augmented generation (RAG) pipelines. Our results show that personalized models consistently outperform aggregated baseline models in predicting individual designers' preferences, even when using 20 times fewer examples. Our work provides the first dataset to study personalized visual design evaluation and supports future research into modeling individual design taste.
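To make the gap between majority-vote labels and individual preferences concrete, here is a minimal sketch that aggregates binary votes per comparison and measures how often each designer agrees with the aggregate. The data, designer IDs, and function names are invented for illustration; they are not drawn from DesignPref.

```python
from collections import Counter

# Toy binary votes: for each comparison, which design ("A" or "B") each designer chose.
# Hypothetical data, not from the dataset.
votes = {
    "pair1": {"d1": "A", "d2": "A", "d3": "B"},
    "pair2": {"d1": "B", "d2": "A", "d3": "B"},
    "pair3": {"d1": "A", "d2": "A", "d3": "A"},
    "pair4": {"d1": "A", "d2": "A", "d3": "B"},
}


def majority_label(pair_votes):
    """Return the most common choice for one comparison.

    With an even number of voters, ties fall to the choice seen first.
    """
    return Counter(pair_votes.values()).most_common(1)[0][0]


def agreement_with_majority(all_votes):
    """Fraction of comparisons on which each designer matches the majority label."""
    majority = {pair: majority_label(v) for pair, v in all_votes.items()}
    designers = sorted({d for v in all_votes.values() for d in v})
    result = {}
    for designer in designers:
        rated = [pair for pair, v in all_votes.items() if designer in v]
        hits = sum(all_votes[pair][designer] == majority[pair] for pair in rated)
        result[designer] = hits / len(rated)
    return result


agreement = agreement_with_majority(votes)
print(agreement)
```

In this toy example, a judge that perfectly reproduced the majority labels would match designer `d3` on only half of the comparisons, which is the kind of mismatch that motivates personalized models.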
