Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku

Chunan Yu, Yidong Han, Chaotao Ding, Ying Zang, Lanyun Zhu, Xinhao Chen, Zejian Li, Renjun Xu, Tianrun Chen

TL;DR

This paper addresses the problem of translating classical Haiku into explorable 3D scenes while preserving semantic and emotional fidelity. It introduces HaikuVerse, a two-stage framework consisting of Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP) and Progressive Dimensional Synthesis (PDS), which converts poetry into structured prompts and then into 3D scenes via panoramic diffusion and 3D Gaussian Splatting. The method leverages LLM-assisted parsing with a three-stage process and a multi-stage diffusion-to-3D pipeline that includes depth estimation and real-time enhancement, yielding 360-degree navigable scenes. It outperforms existing text-to-3D baselines on literary fidelity and visual quality, with implications for cultural heritage visualization in AR/VR environments.

Abstract

In the era of the metaverse, where immersive technologies redefine human experiences, translating abstract literary concepts into navigable 3D environments presents a fundamental challenge in preserving semantic and emotional fidelity. This research introduces HaikuVerse, a novel framework for transforming poetic abstraction into spatial representation, with Japanese Haiku serving as an ideal test case due to its sophisticated encapsulation of profound emotions and imagery within minimal text. While existing text-to-3D methods struggle with nuanced interpretations, we present a literary-guided approach that synergizes traditional poetry analysis with advanced generative technologies. Our framework centers on two key innovations: (1) Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP), which captures both explicit imagery and implicit emotional resonance through structured semantic decomposition, and (2) Progressive Dimensional Synthesis (PDS), a multi-stage pipeline that systematically transforms poetic elements into coherent 3D scenes through sequential diffusion processes, geometric optimization, and real-time enhancement. Extensive experiments demonstrate that HaikuVerse significantly outperforms conventional text-to-3D approaches in both literary fidelity and visual quality, establishing a new paradigm for preserving cultural heritage in immersive digital spaces. Project website at: https://syllables-to-scenes.github.io/
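The two-stage structure described above can be pictured as a simple ordered data flow. The sketch below is illustrative only: it assumes the six steps named in the Figure 1 caption run strictly in sequence, and every identifier in it (`H_LCTGP_STAGES`, `PDS_STAGES`, `SceneState`, `run_pipeline`) is hypothetical rather than taken from the authors' code. Each stage is a placeholder that records what it would consume and produce; no LLM, diffusion model, or Gaussian Splatting optimizer is actually invoked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names paraphrasing the Figure 1 caption;
# these are not the authors' identifiers.
H_LCTGP_STAGES = ["haiku_parsing"]
PDS_STAGES = [
    "text_to_image",
    "panorama_generation",
    "depth_estimation",
    "gaussian_splatting",
    "realtime_enhancement",
]


@dataclass
class SceneState:
    """Carries the input haiku plus a placeholder artifact per completed stage."""
    haiku: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


def make_placeholder(name: str) -> Callable[[SceneState], SceneState]:
    """Build a stand-in stage that only records its position in the data flow."""
    def stage(state: SceneState) -> SceneState:
        # The first stage reads the raw haiku; later stages read the previous output.
        previous = state.trace[-1] if state.trace else "haiku"
        state.artifacts[name] = f"<{name} output derived from {previous}>"
        state.trace.append(name)
        return state
    return stage


def run_pipeline(haiku: str) -> SceneState:
    """Run the parsing stage, then the synthesis stages, strictly in order."""
    state = SceneState(haiku=haiku)
    for name in H_LCTGP_STAGES + PDS_STAGES:
        state = make_placeholder(name)(state)
    return state


if __name__ == "__main__":
    # Illustrative input only; any short poem string works.
    result = run_pipeline("An old pond / a frog jumps in / the sound of water")
    print(" -> ".join(result.trace))
```

Keeping the parsing stage separate from the synthesis stages mirrors the paper's split between H-LCTGP and PDS: the structured prompt is produced once and then passed down the generative chain.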

Paper Structure

This paper contains 7 sections, 2 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Overall Architecture. Our objective is to transform classical Japanese Haiku into 3D scenes through two stages. H-LCTGP: (1) Haiku parsing using large language models (LLMs). PDS: (2) Text-to-Image Generation with Relay Diffusion; (3) Panoramic Image Generation with Panorama Diffusion; (4) Depth Map Generation with Depth Diffusion; (5) Geometric Optimization with 3D Gaussian Splatting; (6) Real-time Image Enhancement for immersive, navigable 3D scene visualization.
  • Figure 2: Haiku Enhancement integrating traditional literary analysis principles. An example of the text enhancement process for Haiku using LLMs.
  • Figure 3: Impact of Key Elements Parsing on Panoramic Image Generation.
  • Figure 4: Visualization Results. Our method is capable of generating high-quality and continuous 3D scenes. More results are in the supplementary material.
  • Figure 5: Comparison with baseline. Our method achieves superior performance in terms of content consistency and scene continuity.
  • ...and 1 more figure