Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization

Peng Zhang, Xuefeng Li, Xiaoqi Wang, Han-Wei Shen, Yifan Hu

Abstract

Network visualization has traditionally relied on heuristic metrics, such as stress, under the assumption that optimizing them leads to aesthetic and informative layouts. However, no single metric consistently produces the most effective results. A data-driven alternative is to learn from human preferences, where annotators select their favored visualization among multiple layouts of the same graphs. These human-preference labels can then be used to train a generative model that approximates human aesthetic preferences. However, obtaining human labels at scale is costly and time-consuming. As a result, this generative approach has so far been tested only with machine-labeled data. In this paper, we explore the use of large language models (LLMs) and vision models (VMs) as proxies for human judgment. Through a carefully designed user study involving 27 participants, we curated a large set of human preference labels. We used this data both to better understand human preferences and to bootstrap LLM/VM labelers. We show that prompt engineering that combines few-shot examples and diverse input formats, such as image embeddings, significantly improves LLM-human alignment, and additional filtering by the confidence score of the LLM pushes the alignment to human-human levels. Furthermore, we demonstrate that carefully trained VMs can achieve VM-human alignment at a level comparable to that between human annotators. Our results suggest that AI can feasibly serve as a scalable proxy for human labelers.
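The abstract names two techniques for boosting LLM-human alignment: retrieving few-shot examples by image-embedding similarity (the "memory bank" of Figure 2) and discarding low-confidence LLM labels. A minimal sketch of both ideas, with hypothetical function names and toy embeddings (the paper's actual embedding model, prompt format, and thresholds are not specified here):

```python
import numpy as np

def retrieve_few_shot(query_emb, bank_embs, bank_labels, k=2):
    """Pick the k labeled examples from the memory bank whose image
    embeddings are most similar (cosine) to the query layout's embedding.
    These serve as few-shot examples in the LLM prompt."""
    q = query_emb / np.linalg.norm(query_emb)
    b = bank_embs / np.linalg.norm(bank_embs, axis=1, keepdims=True)
    sims = b @ q                      # cosine similarity to each bank entry
    top = np.argsort(-sims)[:k]      # indices of the k most similar entries
    return [bank_labels[i] for i in top]

def filter_by_confidence(preds, confidences, threshold=0.8):
    """Keep only preference labels whose self-reported LLM confidence
    meets the threshold; low-confidence labels are discarded."""
    return [p for p, c in zip(preds, confidences) if c >= threshold]
```

With a toy two-dimensional bank, `retrieve_few_shot(np.array([1.0, 0.1]), np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]), ["A", "B", "A"], k=2)` returns the two entries nearest the query, and `filter_by_confidence` then prunes the resulting labels exactly as the confidence-score filtering described above.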

Paper Structure

This paper contains 42 sections, 10 equations, 12 figures, 14 tables.

Figures (12)

  • Figure 1: A UI showing eight graph visualizations. Of the users ($n=43$), 56% preferred the starred option, and 16% chose the bottom-left.
  • Figure 2: Demonstration of memory bank vs graph representation. The green route shows the graph representation, while the blue route shows the image-embedding–based memory bank approach.
  • Figure 3: Left: Human-human and LLM-human alignment as a function of LLM confidence. Right: Human-human and VM-human alignment as a function of VM confidence. The number of test samples remaining after thresholding is given at the top of each panel.
  • Figure 4: Visualization of human-human, LLM-human, and VM-human alignment for the seven most prolific human labelers.
  • Figure 5: Examples illustrating agreement and disagreement between LLM and human preferences.
  • ...and 7 more figures