2D Representation for Unguided Single-View 3D Super-Resolution in Real-Time
Ignasi Mas, Ivan Huerta, Ramon Morros, Javier Ruiz-Hidalgo
TL;DR
The paper addresses unguided single-view 3D super-resolution by recasting geometry as a 2D Projected Normalized Coordinate Code (PNCC) image, which enables the use of efficient 2D image super-resolution models. It introduces two implementations, SwinT-PNCC and VM-PNCC, which target high accuracy and real-time performance respectively, without relying on high-resolution RGB inputs. The approach achieves state-of-the-art results on unguided depth super-resolution benchmarks (NYUv2, Middlebury, and RGB-D-D), with strong generalization and robust 3D reconstructions. The framework provides a practical bridge between 2D SR techniques and 3D geometry enhancement, with potential extensions to multi-view setups and broader 3D data tasks.
Abstract
We introduce 2Dto3D-SR, a versatile framework for real-time single-view 3D super-resolution that eliminates the need for high-resolution RGB guidance. Our framework encodes 3D data captured from a single viewpoint into a structured 2D representation, so that existing 2D image super-resolution architectures can be applied directly. We use the Projected Normalized Coordinate Code (PNCC) to represent the 3D geometry of the visible surface as a regular image, thereby avoiding the complexities of 3D point-based and RGB-guided methods. This design supports lightweight, fast models that can be adapted to various deployment environments. We evaluate 2Dto3D-SR with two implementations: one based on Swin Transformers for high accuracy, and another based on Vision Mamba for high efficiency. Experiments show that the Swin Transformer model achieves state-of-the-art accuracy on standard benchmarks, while the Vision Mamba model delivers competitive results at real-time speeds. These results establish our geometry-guided pipeline as a surprisingly simple yet practical solution for real-world scenarios, especially where high-resolution RGB data is unavailable.
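To make the "3D geometry as a regular image" idea concrete, here is a minimal sketch of a PNCC-style round trip for a single-view depth map. It is not the authors' implementation: the pinhole back-projection, the intrinsics `fx, fy, cx, cy`, the per-axis min-max normalization to [0, 1], and all function names are assumptions for illustration. The hypothetical `upsample_nearest` merely stands in for a learned 2D SR network such as the Swin Transformer or Vision Mamba models described above.

```python
import numpy as np


def depth_to_pncc(depth, fx, fy, cx, cy):
    """Encode a depth map as a 3-channel PNCC-style image.

    Assumption: each pixel is back-projected with a pinhole camera model and
    the X, Y, Z axes are min-max normalized to [0, 1] over the visible surface.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)  # pixel rows, columns
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1)  # (H, W, 3) camera-space points

    lo = xyz.reshape(-1, 3).min(axis=0)
    hi = xyz.reshape(-1, 3).max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat axes
    return (xyz - lo) / scale, lo, scale


def pncc_to_xyz(pncc, lo, scale):
    """Undo the normalization to recover camera-space XYZ for every pixel."""
    return pncc * scale + lo


def upsample_nearest(img, factor):
    """Hypothetical stand-in for a learned 2D SR network: plain
    nearest-neighbour upsampling of an (H, W, C) image."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


# Usage: a synthetic 4x4 low-resolution depth map (metres) upscaled by 4x.
lr_depth = np.linspace(1.0, 2.0, 16).reshape(4, 4)
pncc, lo, scale = depth_to_pncc(lr_depth, fx=5.0, fy=5.0, cx=1.5, cy=1.5)
sr_pncc = upsample_nearest(pncc, 4)      # any 2D image SR model could go here
sr_xyz = pncc_to_xyz(sr_pncc, lo, scale)
sr_depth = sr_xyz[..., 2]                # the Z channel is the depth map
print(sr_depth.shape)                    # (16, 16)
```

Because the encoded geometry is just an H×W×3 image with values in a fixed range, the SR stage needs no knowledge of 3D structure; swapping the nearest-neighbour placeholder for a trained image SR backbone is the step this sketch leaves out.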
