Oitijjo-3D: Generative AI Framework for Rapid 3D Heritage Reconstruction from Street View Imagery

Momen Khandoker Ope, Akif Islam, Mohd Ruhul Ameen, Abu Saleh Musa Miah, Md Rashedul Islam, Jungpil Shin

TL;DR

The paper tackles the shortage of affordable, open-access 3D heritage documentation in resource-limited contexts like Bangladesh. It introduces Oitijjo-3D, a generative AI framework that converts publicly available Street View imagery into 3D heritage models via a two-stage pipeline: Gemini 2.5 Flash Image for isometric 2D synthesis, followed by Hexagen for neural image-to-3D geometry generation. The approach produces photorealistic, metrically coherent models within seconds on commodity hardware, a substantial speedup over traditional Structure-from-Motion (SfM) pipelines. It promotes a community-driven, AI-assisted model of cultural preservation and outlines a path toward fully open-source, locally hosted implementations to improve privacy and scalability and to reduce cost.

Abstract

Cultural heritage restoration in Bangladesh faces a dual challenge of limited resources and scarce technical expertise. Traditional 3D digitization methods, such as photogrammetry or LiDAR scanning, require expensive hardware, expert operators, and extensive on-site access, which are often infeasible in developing contexts. As a result, many of Bangladesh's architectural treasures, from the Paharpur Buddhist Monastery to Ahsan Manzil, remain vulnerable to decay and inaccessible in digital form. This paper introduces Oitijjo-3D, a cost-free generative AI framework that democratizes 3D cultural preservation. By using publicly available Google Street View imagery, Oitijjo-3D reconstructs faithful 3D models of heritage structures through a two-stage pipeline: multimodal visual reasoning with Gemini 2.5 Flash Image for structure-texture synthesis, and neural image-to-3D generation through Hexagen for geometry recovery. The system produces photorealistic, metrically coherent reconstructions in seconds, achieving significant speedups compared to conventional Structure-from-Motion pipelines, without requiring any specialized hardware or expert supervision. Experiments on landmarks such as Ahsan Manzil, Choto Sona Mosque, and Paharpur demonstrate that Oitijjo-3D preserves both visual and structural fidelity while drastically lowering economic and technical barriers. By turning open imagery into digital heritage, this work reframes preservation as a community-driven, AI-assisted act of cultural continuity for resource-limited nations.

Paper Structure

This paper contains 4 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Apple Maps 3D feature showing (left) the Statue of Liberty, (center) the United States Capitol, and (right) the Brooklyn Bridge. Such high-fidelity 3D representations are available in the United States but remain absent in underdeveloped countries like Bangladesh.
  • Figure 2: 2D-to-3D reconstruction results. Top to bottom: Choto Sona Mosque, Shaheed Minar, Somapura Mahavihara, and Rabindra Complex. Left to right: Input Street View, Gemini-synthesized 2D isometric, and Hexagen-generated 3D mesh.
  • Figure 3: 2D-to-3D reconstruction results. Top to bottom: Natore Rajbari, Buddha Dhatu Jadi, and Durjoy Mur Bhairab. Left to right: Input Street View, Gemini-synthesized 2D isometric, and Hexagen-generated 3D mesh.
  • Figure 4: Oitijjo-3D system workflow illustrating the sequential data flow from Street View image collection to final 3D visualization. Stages 1–3 focus on data processing and 2D synthesis, while stages 4–5 handle 3D generation and web-based rendering.