MoGraphGPT: Creating Interactive Scenes Using Modular LLM and Graphical Control

Hui Ye, Chufeng Xiao, Jiaye Leng, Pengfei Xu, Hongbo Fu

TL;DR

MoGraphGPT addresses the difficulty of generating reliable, interactive 2D scene code with LLMs by introducing an element-level modularization scheme and a central orchestrator augmented by a graphical interface. It maintains independent LLM modules for each element and a central module that manages interactions, enabling precise, graphically informed prompts and automatically generated sliders for parameter control. The paper derives insights from video tutorials and AI coding tools, proposes design considerations, and demonstrates through a comparative study and an open-ended usability study that MoGraphGPT improves ease, controllability, and refinement relative to Cursor Composer. The work suggests practical impact for non-programmers creating games, animations, and demonstrations, and outlines future directions including context management and scalability.

Abstract

Creating interactive scenes often involves complex programming tasks. Although large language models (LLMs) like ChatGPT can generate code from natural language, their output is often error-prone, particularly when scripting interactions among multiple elements. The linear conversational structure limits the editing of individual elements, and the lack of graphical and precise control complicates visual integration. To address these issues, we integrate an element-level modularization technique that processes textual descriptions for individual elements through separate LLM modules, with a central module managing interactions among elements. This modular approach allows each element to be refined independently. We design a graphical user interface, MoGraphGPT, which combines modular LLMs with enhanced graphical control to generate code for 2D interactive scenes. It enables direct integration of graphical information and offers quick, precise control through automatically generated sliders. Our comparative evaluation against an AI coding tool, Cursor Composer, as the baseline system, together with a usability study, shows that MoGraphGPT significantly improves ease of use, controllability, and refinement in creating complex 2D interactive scenes with multiple visual elements in a coding-free manner.

Paper Structure

This paper contains 32 sections, 9 figures, 2 tables.

Figures (9)

  • Figure 1: The framework of our context-aware LLM modularization technique. The central LLM module generates and maintains the central code. It manages individual LLM modules, which generate the class code for individual elements. Contextual information is extracted from the individual code and passed to the central LLM module for reference.
  • Figure 2: MoGraphGPT workflow. When users input text prompts for individual elements, our system integrates graphical information into the prompts and sends them to individual modules to generate class code (top). For interactions (bottom), prompts with the integrated graphical information go to the central LLM, which creates the central code. It then notifies individual LLM modules to update their code with new variables and functions. Changes are reflected in real time, and the central and individual code together form the final result.
  • Figure 3: Four ways to create elements in our system. (a) Upload an image. (b) Draw a sketch. (c) Add a group and let the LLM generate a group of elements (with a user-uploaded element image), whether or not "group" is explicitly mentioned in the text prompt. (d) Ask the LLM to generate elements.
  • Figure 4: We allow users to specify four types of graphical inputs: (a) point, (b) line, (c) curve, and (d) region. Users can refer to their names in the text prompts.
  • Figure 5: MoGraphGPT user interface. Element Pane contains the buttons and preview images for all the created elements in the scene. Canvas Area shows all the elements that can be manipulated by users directly. Once users press the "Generate" button, the result is generated or updated in the Result Area. Effect Control Pane displays the automatically generated parameter values and sliders for precise control.
  • ...and 4 more figures
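
The workflow described in the captions of Figures 1, 2, and 4 can be illustrated with a small sketch. This is not the authors' implementation: the class names (`ElementModule`, `CentralModule`), the helper `with_graphics`, and the stub `fake_llm` are all hypothetical, and the real system calls an actual LLM and generates scene code in whatever language it targets. The sketch only shows the routing idea: one prompt history per element, named graphical inputs folded into the prompt text, and a central module that reads a summary of each element's code before writing interaction code and then notifies the affected element modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an LLM call: prompt text in, generated code text out.
LLM = Callable[[str], str]


def with_graphics(prompt: str, graphics: Dict[str, object]) -> str:
    """Append the values of any named graphical input that the prompt mentions."""
    used = {name: value for name, value in graphics.items() if name in prompt}
    if not used:
        return prompt
    notes = "; ".join(f"{name} = {value}" for name, value in used.items())
    return f"{prompt} [graphical inputs: {notes}]"


@dataclass
class ElementModule:
    """One independent prompt history and code buffer per scene element (assumed design)."""
    name: str
    llm: LLM
    history: List[str] = field(default_factory=list)
    code: str = ""

    def update(self, prompt: str) -> str:
        # Each element keeps its own history, so refining it never touches other elements.
        self.history.append(prompt)
        self.code = self.llm("\n".join(self.history))
        return self.code

    def context(self) -> str:
        # Crude "contextual information": only the class and function signature lines.
        lines = (line.strip() for line in self.code.splitlines())
        return "\n".join(l for l in lines if l.startswith(("class ", "def ")))


@dataclass
class CentralModule:
    """Owns the element modules and the central (interaction) code (assumed design)."""
    llm: LLM
    elements: Dict[str, ElementModule] = field(default_factory=dict)
    code: str = ""

    def add_element(self, name: str, prompt: str,
                    graphics: Optional[Dict[str, object]] = None) -> ElementModule:
        module = ElementModule(name, self.llm)
        self.elements[name] = module
        module.update(with_graphics(prompt, graphics or {}))
        return module

    def add_interaction(self, prompt: str,
                        graphics: Optional[Dict[str, object]] = None) -> str:
        context = "\n".join(m.context() for m in self.elements.values())
        task = with_graphics(prompt, graphics or {})
        self.code = self.llm(f"Context:\n{context}\nTask: {task}")
        # Notify every element named in the prompt so it can expose what the central code needs.
        for name, module in self.elements.items():
            if name in prompt:
                module.update(f"Add variables/functions needed for: {prompt}")
        return self.code


def fake_llm(prompt: str) -> str:
    # Stub only: returns a fixed class skeleton that records the last prompt line.
    last = prompt.splitlines()[-1]
    return f"class Generated:\n    def update(self):\n        pass  # from: {last}"


scene = CentralModule(fake_llm)
bird = scene.add_element("bird", "bird flies along curve1", {"curve1": [(0, 0), (5, 3)]})
pipe = scene.add_element("pipe", "pipe scrolls left")
scene.add_interaction("when bird hits pipe, the game ends")
```

In this sketch the name matching is a plain substring test, which is enough for illustration but would be too loose for real element names. Swapping `fake_llm` for a real model call would leave the routing logic unchanged.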