MapExplorer: New Content Generation from Low-Dimensional Visualizations

Xingjian Zhang, Ziyang Xiong, Shixuan Liu, Yutong Xie, Tolga Ergen, Dongsub Shim, Hua Xu, Honglak Lee, Qiaozhu Me

TL;DR

MapExplorer defines a novel task: generating text conditioned on coordinates in a 2D projection map, which enables exploration of unknown regions in data spaces. It introduces Atometric, an entailment-based metric that works on atomic statements and offers multiple strictness levels for assessing generated content against references, addressing limitations of n-gram and embedding-based metrics. The study compares three design families (Direct Mapping, Intermediate High-Dimensional Embedding, and Two-Stage Text Generation) across five diverse visualization datasets, revealing the strengths of retrieval-augmented prompts and embedding inversion in different contexts. The work demonstrates MapExplorer’s potential for scientific idea generation and LLM red-teaming, provides an online demo, and outlines limitations and future directions toward more capable and generalized methods.

Abstract

Low-dimensional visualizations, or "projection maps," are widely used in scientific and creative domains to interpret large-scale and complex datasets. These visualizations not only aid in understanding existing knowledge spaces but also implicitly guide exploration into unknown areas. Although techniques such as t-SNE and UMAP can generate these maps, there exists no systematic method for leveraging them to generate new content. To address this, we introduce MapExplorer, a novel knowledge discovery task that translates coordinates within any projection map into coherent, contextually aligned textual content. This allows users to interactively explore and uncover insights embedded in the maps. To evaluate the performance of MapExplorer methods, we propose Atometric, a fine-grained metric inspired by ROUGE that quantifies logical coherence and alignment between generated and reference text. Experiments on diverse datasets demonstrate the versatility of MapExplorer in generating scientific hypotheses, crafting synthetic personas, and devising strategies for attacking large language models, even with simple baseline methods. By bridging visualization and generation, our work highlights the potential of MapExplorer to enable intuitive human-AI collaboration in large-scale data exploration.
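To make the task concrete, here is a minimal sketch of one way a retrieval-augmented prompt could be assembled for a query coordinate: find the known items closest to the target point on the map and list them as context. This is an illustration under stated assumptions, not the paper's implementation. The function names (`nearest_neighbors`, `build_prompt`), the prompt wording, the toy data, and the use of Euclidean distance in the 2D map are all hypothetical choices made for the example.

```python
import math

def nearest_neighbors(query, points, k=3):
    """Return indices of the k map points closest to `query`.

    Assumption: closeness is plain Euclidean distance in the 2D map.
    """
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    return order[:k]

def build_prompt(query, points, texts, k=3):
    """Assemble a retrieval-augmented prompt from the k nearest known items.

    The prompt wording is hypothetical; a real system would tune it.
    """
    idx = nearest_neighbors(query, points, k)
    context = [
        f"- ({points[i][0]:.2f}, {points[i][1]:.2f}): {texts[i]}" for i in idx
    ]
    return (
        "Known items near the target location:\n"
        + "\n".join(context)
        + f"\nWrite a new item that would plausibly sit at "
        f"({query[0]:.2f}, {query[1]:.2f})."
    )

# Toy projection map: four items with made-up 2D coordinates.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.0, 1.0)]
texts = [
    "Graph neural networks for molecules",
    "Message passing on citation graphs",
    "Medieval trade routes",
    "Equivariant networks for proteins",
]

prompt = build_prompt((0.2, 0.1), points, texts, k=2)
print(prompt)
```

The resulting string would then be passed to a language model of the user's choice. Retrieval on the map itself is only one option; the design families named in the TL;DR (for example, routing through a high-dimensional embedding) would replace the neighbor lookup with a different intermediate step.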

Paper Structure

This paper contains 65 sections, 1 equation, 13 figures, 7 tables.

Figures (13)

  • Figure 1: An illustrative example of Atometric for evaluating red-teaming strategies generated for LLMs. Step ①: Atometric breaks down the generated text into a set of atomic statements. Step ②: Each statement is individually compared against the reference text to assess its level of support under varying strictness thresholds, providing a measure of "precision." Steps ③ and ④: Conversely, the reference text can be decomposed and compared against the generated text to evaluate "recall." Both decomposition and verification are automated using LLMs with a structured prompt template, detailed in the appendix on Atometric.
  • Figure 2: MapExplorer demo using the Red-Teaming Strategies dataset. In this screenshot, the user selects Option 1 to specify custom coordinates for generating new content.
  • Figure 3: MapExplorer demo using the Persona dataset. In this screenshot, the user selects Option 2 to use a randomly provided coordinate for generating new content.
  • Figure 4: Sample screenshot of human annotation protocol.
  • Figure 5: Prompt template for Atometric decomposition.
  • ...and 8 more figures
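
The precision and recall procedure described in the Figure 1 caption can be sketched in a few lines. This is a rough illustration, not the authors' implementation: the paper performs decomposition and verification with LLMs, whereas the sketch below substitutes two deliberately simple stand-ins (sentence splitting and token overlap) so that it runs without any model. All function names and the numeric `threshold` used to model a strictness level are assumptions made for this example.

```python
import re
from typing import Callable, List, Tuple

def split_sentences(text: str) -> List[str]:
    """Stand-in decomposer (assumption): one 'atomic statement' per sentence.

    The paper uses an LLM with a structured prompt for this step.
    """
    return [s.strip() for s in text.split(".") if s.strip()]

def token_overlap(statement: str, target: str) -> float:
    """Stand-in verifier (assumption): share of statement tokens found in target.

    The paper uses LLM-based entailment checks instead of lexical overlap.
    """
    s_tokens = re.findall(r"[a-z0-9]+", statement.lower())
    t_tokens = set(re.findall(r"[a-z0-9]+", target.lower()))
    if not s_tokens:
        return 0.0
    return sum(tok in t_tokens for tok in s_tokens) / len(s_tokens)

def supported_fraction(
    source: str,
    target: str,
    threshold: float,
    decompose: Callable[[str], List[str]] = split_sentences,
    support: Callable[[str, str], float] = token_overlap,
) -> float:
    """Fraction of atomic statements in `source` that `target` supports."""
    statements = decompose(source)
    if not statements:
        return 0.0
    hits = sum(support(s, target) >= threshold for s in statements)
    return hits / len(statements)

def atomic_precision_recall(
    generated: str, reference: str, threshold: float
) -> Tuple[float, float]:
    """Precision checks generated statements against the reference;
    recall checks reference statements against the generated text."""
    precision = supported_fraction(generated, reference, threshold)
    recall = supported_fraction(reference, generated, threshold)
    return precision, recall

generated = "Use role play. Write a poem."
reference = "Use role play. Split the request into parts. Hide the intent."
precision, recall = atomic_precision_recall(generated, reference, threshold=0.5)
print(precision, recall)
```

Raising `threshold` makes the stand-in verifier stricter, which loosely mirrors the idea of multiple strictness levels. Swapping `decompose` and `support` for LLM-backed callables would bring the sketch closer to the procedure the caption describes.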