
LLM-Driven 3D Scene Generation of Agricultural Simulation Environments

Arafa Yoncalik, Wouter Jansen, Nico Huebel, Mohammad Hasan Rahmani, Jan Steckel

TL;DR

The paper tackles the challenge of generating realistic agricultural 3D scenes by moving beyond single-LLM approaches to a modular pipeline that splits the task into asset retrieval, domain-knowledge augmentation via retrieval-augmented generation (RAG), and code generation that drives Unreal Engine. By combining few-shot prompting, finetuning, RAG, and validation across specialized LLMs, the approach yields semantically coherent, agronomically informed scenes and reports time savings over manual design. The evaluation is both quantitative and qualitative, covering asset retrieval accuracy, domain-knowledge consistency, code-generation executability, user-perceived realism, and expert timing comparisons, and it compares the modular pipeline against single-LLM baselines. The findings indicate improved reliability, scalability, and efficiency. Acknowledged limitations include static assets and the lack of dynamic growth; terrain, weather, and richer assets are identified as future work, along with applying the approach to domains beyond agriculture.
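The RAG step named above can be illustrated with a minimal sketch. The summary does not say how the paper's retriever scores documents, so the code below substitutes plain bag-of-words cosine similarity for an embedding model. `KNOWLEDGE_BASE`, `retrieve`, and `augment_prompt` are hypothetical names, and the knowledge entries are placeholders rather than real agronomic data. The retrieved snippets are prepended to the user prompt before it is passed to a downstream LLM.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder entries; a real knowledge base would hold curated
# agronomic documents (crop spacing, field layouts, and so on).
KNOWLEDGE_BASE = [
    "maize: placeholder note on row spacing and planting density",
    "wheat: placeholder note on sowing pattern in cereal fields",
    "vineyard: placeholder note on trellis rows and inter-row distance",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def augment_prompt(user_prompt: str, docs: list[str], k: int = 1) -> str:
    """Prepend the retrieved domain knowledge to the user's request."""
    context = "\n".join(f"- {d}" for d in retrieve(user_prompt, docs, k))
    return f"Domain knowledge:\n{context}\n\nRequest:\n{user_prompt}"


if __name__ == "__main__":
    print(augment_prompt("Generate a maize field with a farmhouse", KNOWLEDGE_BASE))
```

With the prompt "Generate a maize field with a farmhouse", the maize entry is the only one sharing a token with the request, so it is the one placed above it. A production system would more likely use dense embeddings and a vector index; the bag-of-words version only keeps the sketch dependency-free.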

Abstract

Procedural generation techniques in 3D rendering engines have revolutionized the creation of complex environments, reducing reliance on manual design. Recent approaches using Large Language Models (LLMs) for 3D scene generation show promise but often lack domain-specific reasoning, verification mechanisms, and modular design, which limits control and scalability. This paper investigates the use of LLMs to generate synthetic agricultural simulation environments from natural language prompts, specifically to address these three limitations. A modular multi-LLM pipeline was developed that integrates 3D asset retrieval, domain knowledge injection, and code generation for Unreal Engine through its API. The result is a 3D environment with realistic planting layouts and environmental context, derived from the input prompt and the domain knowledge. To improve accuracy and scalability, the system employs a hybrid strategy that combines LLM optimization techniques such as few-shot prompting, Retrieval-Augmented Generation (RAG), and finetuning with output validation. Unlike monolithic models, the modular architecture enables structured data handling, intermediate verification, and flexible expansion. The system was evaluated using structured prompts and semantic accuracy metrics. A user study assessed realism and familiarity against real-world images, while an expert comparison demonstrated significant time savings over manual scene design. The results confirm the effectiveness of multi-LLM pipelines in automating domain-specific 3D scene generation with improved reliability and precision. Future work will explore expanding the asset hierarchy, incorporating real-time generation, and adapting the pipeline to other simulation domains beyond agriculture.
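The intermediate verification that distinguishes the modular design from a monolithic model can be sketched as a chain of stages, each pairing an LLM call with a validator and a bounded retry loop. This is an illustrative assumption, not the authors' implementation: the LLMs are modeled as plain `Callable[[str], str]` functions, and `run_stage`, `validate_assets`, `validate_nonempty`, `validate_code`, and `generate_scene` are hypothetical names. The sketch also assumes the code-generation stage emits Python, which Unreal Engine's editor scripting supports; `validate_code` only checks syntax and cannot confirm that the Unreal API calls in the script exist or succeed.

```python
import json
from typing import Callable

# Hypothetical model interface: any function mapping prompt text to reply text.
LLM = Callable[[str], str]
Validator = Callable[[str], object]


def validate_assets(raw: str, catalog: set[str]) -> list[str]:
    """Parse a JSON list of asset names and reject names not in the catalog."""
    assets = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(assets, list) or not assets:
        raise ValueError("expected a non-empty JSON list of asset names")
    unknown = [name for name in assets if name not in catalog]
    if unknown:
        raise ValueError(f"unknown assets: {unknown}")
    return assets


def validate_nonempty(raw: str) -> str:
    """Reject empty domain-knowledge output."""
    text = raw.strip()
    if not text:
        raise ValueError("empty domain-knowledge output")
    return text


def validate_code(raw: str) -> str:
    """Check that generated Python is syntactically valid (does not run it)."""
    compile(raw, "<generated>", "exec")  # raises SyntaxError on invalid code
    return raw


def run_stage(llm: LLM, prompt: str, validator: Validator, retries: int = 2):
    """Call one LLM, validate its reply, and retry with error feedback."""
    last_error = None
    for _ in range(retries + 1):
        raw = llm(prompt)
        try:
            return validator(raw)
        except (ValueError, SyntaxError) as err:
            last_error = err
            prompt = f"{prompt}\n\nThe previous output was invalid ({err}). Try again."
    raise RuntimeError(f"stage failed after {retries + 1} attempts: {last_error}")


def generate_scene(prompt: str, asset_llm: LLM, knowledge_llm: LLM,
                   code_llm: LLM, catalog: set[str]) -> str:
    """Chain asset retrieval, domain knowledge, and code generation stages."""
    assets = run_stage(asset_llm, prompt, lambda raw: validate_assets(raw, catalog))
    knowledge = run_stage(knowledge_llm, f"{prompt}\nAssets: {assets}", validate_nonempty)
    return run_stage(code_llm, f"Assets: {assets}\nConstraints: {knowledge}", validate_code)


if __name__ == "__main__":
    # Mock LLMs: the first asset reply names an asset missing from the catalog.
    catalog = {"maize_plant", "farmhouse"}
    replies = iter(['["maize_plant", "tractor"]', '["maize_plant", "farmhouse"]'])
    script = generate_scene(
        "A maize field next to a farmhouse",
        asset_llm=lambda p: next(replies),
        knowledge_llm=lambda p: "Plant maize in parallel rows.",
        code_llm=lambda p: "placements = [('maize_plant', 0.0, 0.0)]",
        catalog=catalog,
    )
    print(script)
```

In the demo, the asset stage's first mock reply names an asset missing from the catalog, so the validator rejects it and the stage retries with the error appended to the prompt. Feeding validation errors back into the prompt is one common design choice; the summary does not specify how the paper handles failed checks.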

Paper Structure (21 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: An example prompt-to-scene generation. The interface takes a natural language prompt and outputs a procedurally generated agricultural environment in Unreal Engine.
  • Figure 2: High-level architecture of the multi-LLM system.
  • Figure 3: Comparison of generation time: System vs Expert (Single-Field Scenes).