Syllables to Scenes: Literary-Guided Free-Viewpoint 3D Scene Synthesis from Japanese Haiku
Chunan Yu, Yidong Han, Chaotao Ding, Ying Zang, Lanyun Zhu, Xinhao Chen, Zejian Li, Renjun Xu, Tianrun Chen
TL;DR
This paper addresses the problem of translating classical Haiku into explorable 3D scenes while preserving semantic and emotional fidelity. It introduces HaikuVerse, a two-stage framework consisting of Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP), which converts poetry into structured prompts, and Progressive Dimensional Synthesis (PDS), which turns those prompts into 3D scenes via panoramic diffusion and 3D Gaussian Splatting. H-LCTGP uses a three-stage LLM-assisted parsing process; PDS is a multi-stage diffusion-to-3D pipeline that includes depth estimation and real-time enhancement and yields 360-degree navigable scenes. The paper reports that HaikuVerse outperforms existing text-to-3D baselines on literary fidelity and visual quality, with implications for cultural heritage visualization in AR/VR environments.
Abstract
In the era of the metaverse, where immersive technologies redefine human experiences, translating abstract literary concepts into navigable 3D environments presents a fundamental challenge in preserving semantic and emotional fidelity. This research introduces HaikuVerse, a novel framework for transforming poetic abstraction into spatial representation, with Japanese Haiku serving as an ideal test case due to its sophisticated encapsulation of profound emotions and imagery within minimal text. While existing text-to-3D methods struggle with nuanced interpretations, we present a literary-guided approach that synergizes traditional poetry analysis with advanced generative technologies. Our framework centers on two key innovations: (1) Hierarchical Literary-Criticism Theory Grounded Parsing (H-LCTGP), which captures both explicit imagery and implicit emotional resonance through structured semantic decomposition, and (2) Progressive Dimensional Synthesis (PDS), a multi-stage pipeline that systematically transforms poetic elements into coherent 3D scenes through sequential diffusion processes, geometric optimization, and real-time enhancement. Extensive experiments demonstrate that HaikuVerse significantly outperforms conventional text-to-3D approaches in both literary fidelity and visual quality, establishing a new paradigm for preserving cultural heritage in immersive digital spaces. Project website at: https://syllables-to-scenes.github.io/
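To make the two-stage data flow concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that the H-LCTGP output can be modeled as a record holding explicit imagery and implicit emotion, and that PDS can be modeled as an ordered chain of stages. The names `StructuredPrompt`, `PDS_STAGES`, and `run_pds` are hypothetical, and the exact stage order is an assumption inferred from the summary above. Each stage is a caller-supplied placeholder; no real diffusion, depth, or Gaussian Splatting model is invoked.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StructuredPrompt:
    """Hypothetical container for the parser output (not the paper's schema)."""
    haiku: str
    explicit_imagery: List[str] = field(default_factory=list)
    implicit_emotion: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        # Flatten the structured fields into one prompt string.
        parts = []
        if self.explicit_imagery:
            parts.append("scene: " + ", ".join(self.explicit_imagery))
        if self.implicit_emotion:
            parts.append("mood: " + ", ".join(self.implicit_emotion))
        return "; ".join(parts)


# Assumed stage order; the abstract names these components but does not
# fully specify how they are sequenced.
PDS_STAGES = (
    "panoramic_diffusion",
    "depth_estimation",
    "gaussian_splatting",
    "real_time_enhancement",
)


def run_pds(prompt: StructuredPrompt,
            stages: Dict[str, Callable[[Dict[str, Any]], Any]]) -> Dict[str, Any]:
    """Run placeholder stages in order; each stage sees all earlier outputs."""
    state: Dict[str, Any] = {"prompt": prompt.to_text()}
    for name in PDS_STAGES:
        state[name] = stages[name](state)
    return state
```

In use, a caller would fill `StructuredPrompt` from the parsing stage (here that step would be annotated by hand) and pass one callable per entry of `PDS_STAGES`. Keeping every intermediate result in a single `state` dictionary lets a later stage, such as the splatting step, read both the panorama and the depth output without changing any function signatures.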
