Oitijjo-3D: Generative AI Framework for Rapid 3D Heritage Reconstruction from Street View Imagery
Momen Khandoker Ope, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Md Rashedul Islam, Jungpil Shin
TL;DR
The paper tackles the shortage of affordable, open-access 3D heritage documentation in resource-limited contexts like Bangladesh. It introduces Oitijjo-3D, a generative AI framework that converts publicly available Street View imagery into 3D heritage models via a two-stage pipeline that combines Gemini 2.5 Flash Image for isometric 2D synthesis and Hexagen for neural image-to-3D geometry. The approach delivers photorealistic, metrically coherent models within seconds on commodity hardware, delivering substantial speedups over traditional SfM pipelines. It promotes a community-driven, AI-assisted model of cultural preservation and outlines a path toward fully open-source, locally hosted implementations to enhance privacy, cost, and scalability.
Abstract
Cultural heritage restoration in Bangladesh faces a dual challenge of limited resources and scarce technical expertise. Traditional 3D digitization methods, such as photogrammetry or LiDAR scanning, require expensive hardware, expert operators, and extensive on-site access, which are often infeasible in developing contexts. As a result, many of Bangladesh's architectural treasures, from the Paharpur Buddhist Monastery to Ahsan Manzil, remain vulnerable to decay and inaccessible in digital form. This paper introduces Oitijjo-3D, a cost-free generative AI framework that democratizes 3D cultural preservation. By using publicly available Google Street View imagery, Oitijjo-3D reconstructs faithful 3D models of heritage structures through a two-stage pipeline - multimodal visual reasoning with Gemini 2.5 Flash Image for structure-texture synthesis, and neural image-to-3D generation through Hexagen for geometry recovery. The system produces photorealistic, metrically coherent reconstructions in seconds, achieving significant speedups compared to conventional Structure-from-Motion pipelines, without requiring any specialized hardware or expert supervision. Experiments on landmarks such as Ahsan Manzil, Choto Sona Mosque, and Paharpur demonstrate that Oitijjo-3D preserves both visual and structural fidelity while drastically lowering economic and technical barriers. By turning open imagery into digital heritage, this work reframes preservation as a community-driven, AI-assisted act of cultural continuity for resource-limited nations.
