X-Ray: A Sequential 3D Representation For Generation
Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee
TL;DR
The paper introduces X-Ray, a sequential 3D representation that uses ray casting to encode an object's full surface geometry, both visible and hidden, into a multi-layer, video-like format. It then builds a two-stage generative pipeline (X-Ray Diffusion Model + X-Ray Upsampler) that generates low- and then high-resolution X-Rays conditioned on a single image and decodes them into 3D meshes via point clouds and Screened Poisson reconstruction. Empirically, X-Ray outperforms rendering-based baselines on single-view reconstruction benchmarks (GSO, OmniObject3D) and surpasses state-of-the-art 3D diffusion methods on ShapeNet Car generation, while substantially reducing data footprint by storing only surfaces. The approach makes video diffusion techniques applicable to 3D synthesis, yields complete object reconstructions that include internal structures, and offers a scalable, generalizable path for 3D generation from images with practical applications in design and visualization.
Abstract
We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected surfaces. This process efficiently condenses the whole 3D object into a multi-frame video format, motivating the use of a network architecture similar to those in video diffusion models. This design ensures an efficient 3D representation by focusing solely on surface information. We also propose a two-stage pipeline that generates 3D objects with an X-Ray Diffusion Model and an X-Ray Upsampler. We demonstrate the practicality and adaptability of our X-Ray representation by synthesizing the complete visible and hidden surfaces of a 3D object from a single input image. Experimental results show that our representation achieves state-of-the-art accuracy in 3D generation, paving the way for new 3D representation research and practical applications.
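To make the layered encoding concrete, the following is a minimal sketch of the idea, not the authors' implementation. It assumes a toy scene of two concentric spheres (an outer shell hiding an inner sphere) in place of a mesh, intersects rays analytically, and omits color. All names (SPHERES, CAMERA, pixel_ray, encode_xray, decode_points) and parameters (image size, number of layers, field of view) are hypothetical. Each ray leaves the camera center, every surface crossing is stored in order of depth as one layer with a hit flag, a depth, and a normal, and the layers can be back-projected into a point cloud. The Screened Poisson meshing step is not shown.

```python
import math

# Hypothetical toy scene: an outer shell (radius 1.0) hiding an inner sphere (radius 0.5).
SPHERES = [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 0.0), 0.5)]
CAMERA = (0.0, 0.0, -3.0)  # camera center, looking along +z


def ray_sphere_hits(origin, direction, center, radius):
    """Return distances t > 0 where a unit-direction ray crosses the sphere surface."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc <= 0.0:  # miss, or grazing contact that we ignore
        return []
    root = math.sqrt(disc)
    return [t for t in (-b - root, -b + root) if t > 0.0]


def pixel_ray(row, col, size, half):
    """Unit direction of the ray through the center of pixel (row, col)."""
    x = ((col + 0.5) / size * 2.0 - 1.0) * half
    y = ((row + 0.5) / size * 2.0 - 1.0) * half
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)


def encode_xray(size=9, num_layers=4, fov_deg=50.0):
    """Cast one ray per pixel and keep every surface crossing, nearest first.

    Returns (frames, half) where frames[layer][row][col] = (hit, depth, normal).
    """
    half = math.tan(math.radians(fov_deg) / 2.0)
    empty = (0, 0.0, (0.0, 0.0, 0.0))
    frames = [[[empty] * size for _ in range(size)] for _ in range(num_layers)]
    for row in range(size):
        for col in range(size):
            d = pixel_ray(row, col, size, half)
            hits = []
            for center, radius in SPHERES:
                for t in ray_sphere_hits(CAMERA, d, center, radius):
                    p = tuple(o + t * di for o, di in zip(CAMERA, d))
                    n = tuple((pi - ci) / radius for pi, ci in zip(p, center))
                    hits.append((t, n))
            hits.sort(key=lambda h: h[0])
            # Crossings beyond num_layers are dropped.
            for layer, (t, n) in enumerate(hits[:num_layers]):
                frames[layer][row][col] = (1, t, n)
    return frames, half


def decode_points(frames, half):
    """Back-project every hit pixel of every layer into a 3D point."""
    size = len(frames[0])
    points = []
    for frame in frames:
        for row in range(size):
            for col in range(size):
                hit, depth, _normal = frame[row][col]
                if hit:
                    d = pixel_ray(row, col, size, half)
                    points.append(tuple(o + depth * di for o, di in zip(CAMERA, d)))
    return points
```

In this toy setup the ray through the central pixel crosses four surfaces, at depths 2.0, 2.5, 3.5, and 4.0, so all four layers are filled there, while a corner ray misses the scene and leaves every layer empty. The two middle layers belong to the inner sphere, which a single rendered view would never see. Stacking the layers as frames gives the video-like tensor that the abstract describes.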
