GraphiMind: LLM-centric Interface for Information Graphics Design

Qirui Huang, Min Lu, Joel Lanir, Dani Lischinski, Daniel Cohen-Or, Hui Huang

TL;DR

This work introduces GraphiMind, an LLM-centric interface that unifies a textual conversational agent with a graphical canvas to enable end-to-end information graphics design for non-professionals. Using a tool-augmented LLM, the system schedules and executes design tasks (information collection, visual element design, pivot/background figure creation, layout customization, and local adjustments) through specialized tools, while users refine the results on a canvas. The implementation combines Stable Diffusion XL 1.0 for image generation, InstructPix2Pix for editing, SAM for clipping, and an Iconify-based icon pipeline, orchestrated through a domain-specific language (DSL) and GPT-4-driven layout planning. An empirical evaluation with 16 novices shows that GraphiMind significantly reduces design time, especially in information collection, and enhances user experience. It also highlights areas for future work, such as personalization, global-context awareness, and richer resource recommendations.
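
The scheduling idea in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation: the `ToolCall` type, the `TOOLS` registry, the task names, and the stubbed `plan` function are all hypothetical, and the tool functions return placeholder strings instead of calling Stable Diffusion, Iconify, or an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    task: str      # hypothetical task name, loosely following the summary
    argument: str  # free-text argument handed to the tool


# Hypothetical stand-ins for the real tools; each returns a placeholder string.
def collect_information(topic: str) -> str:
    return f"facts about {topic}"


def design_visual_element(description: str) -> str:
    return f"icon for {description}"


def create_pivot_figure(prompt: str) -> str:
    return f"image for {prompt}"


# Registry mapping task names to tools (names are assumptions, not from the paper).
TOOLS: Dict[str, Callable[[str], str]] = {
    "information_collection": collect_information,
    "visual_element_design": design_visual_element,
    "pivot_figure_creation": create_pivot_figure,
}


def plan(user_request: str) -> List[ToolCall]:
    """Stub planner. A real agent would ask an LLM to produce this schedule."""
    return [
        ToolCall("information_collection", user_request),
        ToolCall("pivot_figure_creation", user_request),
    ]


def run(user_request: str) -> List[str]:
    """Execute the planned tool calls in order and collect their results."""
    results = []
    for call in plan(user_request):
        tool = TOOLS.get(call.task)
        if tool is None:
            raise KeyError(f"no tool registered for task {call.task!r}")
        results.append(tool(call.argument))
    return results
```

In this sketch the planner and the tools are decoupled by the registry, so a new tool can be added without changing the dispatch loop; whether GraphiMind is structured this way is not stated in the summary.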

Abstract

Information graphics are pivotal in effective information dissemination and storytelling. However, creating such graphics is extremely challenging for non-professionals, since the design process requires multifaceted skills and comprehensive knowledge. Thus, despite the many available authoring tools, a significant gap remains in enabling non-experts to produce compelling information graphics seamlessly, especially from scratch. Recent breakthroughs show that Large Language Models (LLMs), especially when tool-augmented, can autonomously engage with external tools, making them promising candidates for enabling innovative graphic design applications. In this work, we propose an LLM-centric interface with the agent GraphiMind for automatic generation, recommendation, and composition of information graphics design resources, based on user intent expressed through natural language. GraphiMind integrates a Textual Conversational Interface, powered by a tool-augmented LLM, with a traditional Graphical Manipulation Interface, streamlining the entire design process from raw resource curation to composition and refinement. Extensive evaluations highlight our tool's proficiency in simplifying the design process, opening avenues for its use by non-professional users. Moreover, we spotlight the potential of LLMs in reshaping the domain of information graphics design, offering a blend of automation, versatility, and user-centric interactivity.

Paper Structure (43 sections, 17 figures, 2 tables)

Figures (17)

  • Figure 1: An Example of the Design Process in GraphiMind: users effortlessly communicate their design intention with the LLM agent in natural language, by which a wide range of core design assets are generated, including pivot figures, layouts, visual elements, and more. These resources can be seamlessly incorporated into a canvas, thereby facilitating the creation of information graphics with ease.
  • Figure 2: The Interface of GraphiMind System: the system integrates a Textual Conversational Interface (on the left), enhanced by a tool-augmented LLM as an agent, with a Graphical Manipulation Interface (on the right). This integrated interface supports a range of essential design tasks in information graphics design from information collection to final adjustments.
  • Figure 3: GraphiMind Workflow: it combines conversational interactions and graphical manipulation for infographic creation. Users communicate with an intelligent agent to initiate design tasks, which are executed using tools like Stable Diffusion. The resulting design resources can be manipulated on a canvas. This process allows for an iterative blend of AI-driven automated resource generation and user-driven adjustment.
  • Figure 4: The Gallery of Information Graphics Created Using GraphiMind: it demonstrates a diverse range of information designs, encompassing various topics, contents, imagery, layouts, and so on.
  • Figure 5: The Diverse Design Assets by GraphiMind in the Dialogue between Agent and User: the diversity stems from the variety in the agent's reasoning processes as well as the inherent diversity of the tools themselves.
  • ...and 12 more figures