
2D Representation for Unguided Single-View 3D Super-Resolution in Real-Time

Ignasi Mas, Ivan Huerta, Ramon Morros, Javier Ruiz-Hidalgo

TL;DR

The paper addresses unguided single-view 3D super-resolution by recasting geometry as a 2D PNCC image, which allows efficient 2D image super-resolution models to be applied directly. It introduces two implementations, SwinT-PNCC for high accuracy and VM-PNCC for real-time performance, neither of which relies on high-resolution RGB input. The approach achieves state-of-the-art results on unguided depth SR benchmarks (NYUv2, Middlebury, and RGB-D-D), with strong generalization and robust 3D reconstructions. The framework offers a practical bridge between 2D SR techniques and 3D geometry enhancement, with potential extensions to multi-view setups and broader 3D data tasks.

Abstract

We introduce 2Dto3D-SR, a versatile framework for real-time single-view 3D super-resolution that eliminates the need for high-resolution RGB guidance. Our framework encodes 3D data from a single viewpoint into a structured 2D representation, enabling the direct application of existing 2D image super-resolution architectures. We use the Projected Normalized Coordinate Code (PNCC) to represent the geometry of the visible surface as a regular image, thereby avoiding the complexities of 3D point-based or RGB-guided methods. This design supports lightweight, fast models adaptable to various deployment environments. We evaluate 2Dto3D-SR with two implementations: one using Swin Transformers for high accuracy, and another using Vision Mamba for high efficiency. Experiments show that the Swin Transformer model achieves state-of-the-art accuracy on standard benchmarks, while the Vision Mamba model delivers competitive results at real-time speeds. This establishes our geometry-guided pipeline as a surprisingly simple yet practical solution for real-world scenarios, especially where high-resolution RGB data is unavailable.
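The encoding step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx, fy, cx, cy`, back-projects each valid depth pixel to camera-space (x, y, z), and min-max normalizes each axis to [0, 1] over the scene's bounding box, yielding a 3-channel image that a 2D SR network could consume. The helper names `depth_to_pncc`, `pncc_to_points`, and `pncc_to_depth` are hypothetical.

```python
import numpy as np


def depth_to_pncc(depth, fx, fy, cx, cy):
    """Encode a depth map (H, W) as a 3-channel PNCC-style image in [0, 1].

    Assumption: pinhole back-projection followed by per-axis min-max
    normalization over valid pixels; depth <= 0 marks invalid pixels.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)   # row (v) and column (u) indices
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    xyz = np.stack([x, y, z], axis=-1)              # (H, W, 3) camera-space points
    valid = z > 0
    lo = xyz[valid].min(axis=0)                     # per-axis bounding-box minimum
    hi = xyz[valid].max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)         # guard against flat axes
    pncc = (xyz - lo) / scale
    pncc[~valid] = 0.0                              # invalid pixels: rely on the mask, not the value
    return pncc, lo, scale, valid


def pncc_to_points(pncc, lo, scale):
    """Undo the normalization: each pixel holds its own (x, y, z)."""
    return pncc * scale + lo


def pncc_to_depth(pncc, lo, scale):
    """Depth is the de-normalized z channel."""
    return pncc_to_points(pncc, lo, scale)[..., 2]


# Round trip on a small synthetic depth map (hypothetical intrinsics).
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 4.0, size=(4, 6))
pncc, lo, scale, valid = depth_to_pncc(depth, fx=5.0, fy=5.0, cx=3.0, cy=2.0)
recovered = pncc_to_depth(pncc, lo, scale)
print(np.allclose(recovered, depth))  # True
```

Under this particular encoding every pixel stores its own normalized coordinates, so decoding a super-resolved PNCC image back to a point cloud or depth map only needs the stored offset and scale, not the camera intrinsics. Whether the paper's PNCC normalization works exactly this way is an assumption of the sketch.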

Paper Structure

This paper contains 10 sections, 1 equation, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of our 2Dto3D-SR framework. A low-resolution depth map is first converted to a PNCC representation. A standard 2D SR model (here, SwinT-PNCC) is applied in this domain to produce a high-resolution PNCC map, which is then converted back to a high-resolution 3D representation such as a depth map or point cloud.
  • Figure 2: Illustration of the PNCC representation of single-view 3D data.
  • Figure 3: Qualitative results: predicted depth for different experiments on aligned NYUv2 at $\times$4.
  • Figure 4: Qualitative results: single-view point clouds from different methods (two views of the surface for each) on RGB-D-D at $\times$4. RGB is added to all methods for visualization, including the unguided ones. Bicubic upscaling produces noisy hulls, while the other methods exhibit scattered outliers around the scene, noticeable near the person's cap and around the bed; these outliers are more frequent in SGNet than in SwinT-PNCC.
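The noisy hulls attributed to bicubic upscaling in Figure 4 can be illustrated on a single depth scanline that crosses an occlusion boundary. This is a sketch under stated assumptions, not the paper's baseline code: it assumes the bicubic baseline interpolates depth directly, uses the common Keys cubic-convolution kernel (a = -0.5) with half-pixel sample alignment and edge clamping, and the scanline values (foreground at 1 m, background at 3 m) are made up. The helper names `keys_kernel` and `bicubic_upsample_1d` are hypothetical.

```python
import numpy as np


def keys_kernel(t, a=-0.5):
    """Cubic convolution kernel (Keys), as used by common bicubic resizers."""
    t = np.abs(t)
    out = np.zeros_like(t)
    near = t <= 1
    far = (t > 1) & (t < 2)
    out[near] = (a + 2) * t[near] ** 3 - (a + 3) * t[near] ** 2 + 1
    out[far] = a * t[far] ** 3 - 5 * a * t[far] ** 2 + 8 * a * t[far] - 4 * a
    return out


def bicubic_upsample_1d(signal, factor):
    """Upsample a 1D scanline with cubic convolution and edge clamping."""
    n = len(signal)
    x = (np.arange(n * factor) + 0.5) / factor - 0.5  # output centers in input coordinates
    base = np.floor(x).astype(int)
    out = np.zeros_like(x)
    for offset in (-1, 0, 1, 2):                      # four-tap support of the kernel
        k = base + offset
        out += signal[np.clip(k, 0, n - 1)] * keys_kernel(x - k)
    return out


# Hypothetical depth scanline across an occlusion boundary (metres).
scanline = np.array([1.0] * 8 + [3.0] * 8)
up = bicubic_upsample_1d(scanline, 4)

fg, bg, tol = 1.0, 3.0, 1e-6
between = (up > fg + tol) & (up < bg - tol)     # depths lying on neither surface
overshoot = (up < fg - tol) | (up > bg + tol)   # ringing beyond the true depth range

# Nearest-neighbour upsampling, for contrast, only repeats existing depths.
nearest = np.repeat(scanline, 4)
nearest_between = (nearest > fg + tol) & (nearest < bg - tol)

print("bicubic samples between surfaces:", int(between.sum()))
print("bicubic samples outside [1, 3] m:", int(overshoot.sum()))
print("nearest samples between surfaces:", int(nearest_between.sum()))
```

When the samples flagged in `between` are back-projected, each becomes a point floating along its camera ray in the empty space between foreground and background, which is one plausible reading of the hull-like artifacts in the bicubic point clouds.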