Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders

Jingyu Guo; Sensen Gao; Jia-Wang Bian; Wanhu Sun; Heliang Zheng; Rongfei Jia; Mingming Gong

TL;DR

Hyper3D targets the bottleneck of high-fidelity 3D shape encoding for VAEs used in diffusion-based 3D generation. It introduces an octree-based input feature extractor and a hybrid triplane latent that combines a high-resolution 2D plane with a low-resolution 3D grid to preserve explicit 3D structure while maintaining a compact latent. Through extensive experiments on Objaverse, Hyper3D outperforms baselines in reconstruction quality and fine geometric detail, with ablations validating the contributions of octree inputs and the hybrid latent. The work paves the way for more efficient, high-detail 3D generation pipelines and suggests future directions including stronger generative models and multi-modal texture integration.

Abstract

Recent 3D content generation pipelines often leverage Variational Autoencoders (VAEs) to encode shapes into compact latent representations, facilitating diffusion-based generation. Efficiently compressing 3D shapes while preserving intricate geometric details remains a key challenge. Existing 3D shape VAEs often employ uniform point sampling and 1D/2D latent representations, such as vector sets or triplanes, leading to significant geometric detail loss due to inadequate surface coverage and the absence of explicit 3D representations in the latent space. Although recent work explores 3D latent representations, their large scale hinders high-resolution encoding and efficient training. Given these challenges, we introduce Hyper3D, which enhances VAE reconstruction through efficient 3D representation that integrates hybrid triplane and octree features. First, we adopt an octree-based feature representation to embed mesh information into the network, mitigating the limitations of uniform point sampling in capturing geometric distributions along the mesh surface. Furthermore, we propose a hybrid latent space representation that integrates a high-resolution triplane with a low-resolution 3D grid. This design not only compensates for the lack of explicit 3D representations but also leverages a triplane to preserve high-resolution details. Experimental results demonstrate that Hyper3D outperforms traditional representations by reconstructing 3D shapes with higher fidelity and finer details, making it well-suited for 3D generation pipelines.
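To make the hybrid latent concrete, the following is a minimal sketch of how a query point might gather features from a high-resolution triplane and a low-resolution 3D grid. It is an illustration under stated assumptions, not the authors' implementation: the resolutions (64 and 8), the channel count (4), the nearest-neighbor lookup, the summation of plane features, and the concatenation with the grid feature are all hypothetical choices, as are the function names `make_hybrid_latent`, `query_hybrid`, and `latent_sizes`.

```python
import numpy as np


def make_hybrid_latent(plane_res=64, grid_res=8, channels=4, seed=0):
    """Build a random hybrid latent (all sizes are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Three axis-aligned feature planes (XY, XZ, YZ), each (C, R, R).
    planes = rng.standard_normal((3, channels, plane_res, plane_res))
    # One coarse volumetric feature grid, (C, r, r, r).
    grid = rng.standard_normal((channels, grid_res, grid_res, grid_res))
    return planes, grid


def to_index(coords, res):
    """Map coordinates in [-1, 1] to integer cell indices in [0, res - 1]."""
    idx = np.floor((coords + 1.0) * 0.5 * res).astype(int)
    return np.clip(idx, 0, res - 1)


def query_hybrid(planes, grid, points):
    """Nearest-neighbor feature lookup for points of shape (N, 3).

    Returns an (N, 2C) array: summed triplane features followed by the
    coarse 3D grid feature. The fusion rule is an assumption made here.
    """
    p = to_index(points, planes.shape[-1])
    g = to_index(points, grid.shape[-1])
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    f_xy = planes[0][:, x, y].T  # (N, C) feature from the XY plane
    f_xz = planes[1][:, x, z].T  # (N, C) feature from the XZ plane
    f_yz = planes[2][:, y, z].T  # (N, C) feature from the YZ plane
    f_grid = grid[:, g[:, 0], g[:, 1], g[:, 2]].T  # (N, C) explicit 3D feature
    return np.concatenate([f_xy + f_xz + f_yz, f_grid], axis=1)


def latent_sizes(plane_res=64, grid_res=8, channels=4):
    """Number of latent values: hybrid latent vs. a dense grid at plane_res."""
    hybrid = channels * (3 * plane_res**2 + grid_res**3)
    dense = channels * plane_res**3
    return hybrid, dense


planes, grid = make_hybrid_latent()
points = np.array([[0.0, 0.0, 0.0], [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]])
features = query_hybrid(planes, grid, points)
hybrid_size, dense_size = latent_sizes()
print(features.shape, hybrid_size, dense_size)
```

With these assumed sizes, the hybrid latent stores 51,200 values while a dense 3D grid at the triplane's resolution would store 1,048,576, which illustrates why pairing a high-resolution triplane with only a coarse 3D grid keeps the latent compact while still carrying an explicit volumetric component.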
