R3D-SWIN: Use Shifted Window Attention for Single-View 3D Reconstruction

Chenhuan Li, Meihua Xiao, Zehuan Li, Fangping Chen, Shanshan Qiao, Dingli Wang, Mengxi Gao, Siyi Zhang

TL;DR

To the best of the authors' knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify that the method achieves SOTA accuracy in single-view reconstruction.

Abstract

Recently, vision transformers have performed well in various computer vision tasks, including voxel 3D reconstruction. However, the windows of the vision transformer are not multi-scale, and there is no connection between the windows, which limits the accuracy of voxel 3D reconstruction. Therefore, we propose a voxel 3D reconstruction network based on shifted window attention. To the best of our knowledge, this is the first work to apply shifted window attention to voxel 3D reconstruction. Experimental results on ShapeNet verify that our method achieves state-of-the-art (SOTA) accuracy in single-view reconstruction.

Paper Structure

This paper contains 12 sections, 4 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Illustration of our proposed R3D-SWIN and its details.
  • Figure 2: An illustration of the shifted window approach for computing self-attention in the proposed Swin Transformer architecture. In layer l (left), a regular window partitioning scheme is adopted, and self-attention is computed within each window. In the next layer l + 1 (right), the window partitioning is shifted, resulting in new windows. The self-attention computation in the new windows crosses the boundaries of the previous windows in layer l, providing connections among them [ref12].
  • Figure 3: An illustration of the decoder.
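
The shifted window idea described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `window_partition` and `shifted_window_partition`, the 8x8 token grid, the window size of 4, and the shift of 2 are all hypothetical choices made for the example. The shift is realized here as a cyclic roll of the grid, and the attention masking that is usually applied to wrapped-around windows is omitted.

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W) grid into non-overlapping window_size x window_size windows."""
    H, W = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size)
    # Bring the two window-index axes to the front, then flatten them.
    return x.transpose(0, 2, 1, 3).reshape(-1, window_size, window_size)

def shifted_window_partition(x, window_size, shift):
    """Cyclically shift the grid by `shift` cells, then partition as usual.

    Assumption: the shift is modeled with np.roll; masking of wrapped
    regions is left out of this sketch.
    """
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

# Hypothetical 8x8 grid; each cell stores the id of the regular window
# (layer l) that the token belongs to.
rows = np.arange(8)[:, None] // 4
cols = np.arange(8)[None, :] // 4
window_ids = rows * 2 + cols

regular = window_partition(window_ids, 4)             # layer l
shifted = shifted_window_partition(window_ids, 4, 2)  # layer l + 1

for i, w in enumerate(shifted):
    print(f"shifted window {i} mixes regular windows {np.unique(w).tolist()}")
```

In this toy setting, every regular window holds tokens from a single region, while every shifted window groups tokens that came from several different regular windows. That mixing is what lets self-attention in layer l + 1 connect windows that were isolated in layer l.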