Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes

Qi Ma, Danda Pani Paudel, Ender Konukoglu, Luc Van Gool

TL;DR

A large-scale dataset requiring thousands of GPU training days, designed to facilitate research and development in implicit functions. It improves performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.

Abstract

Neural implicit functions have demonstrated significant importance in areas such as computer vision and graphics. Their advantages include the ability to represent complex shapes and scenes with high fidelity, smooth interpolation capabilities, and continuous representations. Despite these benefits, the development and analysis of implicit functions have been limited by the lack of comprehensive datasets and by the substantial computational resources required for their implementation and evaluation. To address these challenges, we introduce "Implicit-Zoo": a large-scale dataset requiring thousands of GPU training days, designed to facilitate research and development in this field. Our dataset includes diverse 2D and 3D scenes, drawing on CIFAR-10, ImageNet-1K, and Cityscapes for 2D image tasks and on the OmniObject3D dataset for 3D vision tasks. We ensure high quality through strict checks, refining or filtering out low-quality data. Using Implicit-Zoo, we showcase two immediate benefits: it enables us to (1) learn token locations for transformer models, and (2) directly regress the 3D camera poses of 2D images with respect to NeRF models. This in turn improves performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
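To make the idea of a neural implicit function concrete, below is a minimal, hypothetical sketch of fitting a single image as an INR: a small MLP that maps pixel coordinates to RGB, so the image is stored as network weights rather than a pixel grid. The SIREN-style sine activations and the names (`SirenLayer`, `ImageINR`) are illustrative assumptions, not details taken from the paper, and SIREN's specialized weight initialization is omitted for brevity.

```python
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    """Sine-activated linear layer (SIREN-style); init scheme omitted."""
    def __init__(self, in_dim, out_dim, omega=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega = omega

    def forward(self, x):
        return torch.sin(self.omega * self.linear(x))

class ImageINR(nn.Module):
    """Maps 2D coordinates in [-1, 1]^2 to RGB: one network per image."""
    def __init__(self, hidden=256, depth=3):
        super().__init__()
        layers = [SirenLayer(2, hidden)]
        layers += [SirenLayer(hidden, hidden) for _ in range(depth - 1)]
        layers += [nn.Linear(hidden, 3)]
        self.net = nn.Sequential(*layers)

    def forward(self, coords):           # coords: (N, 2)
        return self.net(coords)          # RGB:    (N, 3)

# Fit one (toy) image; a dataset like Implicit-Zoo stores the resulting weights.
H, W = 32, 32
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                        torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
target = torch.rand(H * W, 3)            # stand-in for real pixel colors

inr = ImageINR()
opt = torch.optim.Adam(inr.parameters(), lr=1e-4)
for _ in range(2000):
    opt.zero_grad()
    loss = ((inr(coords) - target) ** 2).mean()  # photometric MSE
    loss.backward()
    opt.step()
```

Repeating this fit per image is what makes building such a dataset cost thousands of GPU training days.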

Paper Structure

This paper contains 25 sections, 1 equation, 18 figures, 7 tables.

Figures (18)

  • Figure 1: The Implicit-Zoo dataset and its example utilities. We demonstrate three tasks using Implicit-Zoo: classification, segmentation, and 3D pose regression. Details can be found in the problem-statement section of the paper. The INRs are colorized differently to indicate their training data sources.
  • Figure 2: Examples of images from INRs. We present visual comparisons of example image pairs from our Implicit-Zoo dataset. The original (left/top) and the reconstruction from INRs (right/bottom) images are presented in pairs, showcasing similar visual quality. Please zoom in for details.
  • Figure 3: Illustration of the learnable tokenizer. Instead of retrieving RGB values from images, we query learnable coordinates against pre-trained, frozen INRs and group the returned RGB values to create tokens (see the first sketch after this list). Note that during backpropagation the coordinates $x$ are jointly optimized with the ViT modules.
  • Figure 4: Illustration of the proposed pose regressor. We process 3D volume features and 2D image features with a transformer-based encoder and output coarse poses. For further refinement, we freeze the 3D INRs and optimize the pose by minimizing the photometric error (see the second sketch after this list).
  • Figure 5: Different RGB grouping strategies. We visualize the proposed RGB grouping strategies with patch size 3. Coordinates with the same color are grouped into the same token. We abbreviate these approaches as (b) "S", (c) "LC", (d) "LP", and (e) "LP+rand".
  • ...and 13 more figures
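Given a zoo of frozen per-image INRs, the learnable tokenizer of Figure 3 replaces fixed patch extraction with queries at trainable coordinates. The sketch below is a hypothetical simplification (fixed group sizes; the names `LearnableTokenizer` and `pts_per_token` are ours, not the paper's): because the INR is differentiable with respect to its input, gradients flow through the frozen network to the query coordinates, which are optimized jointly with the ViT.

```python
import torch
import torch.nn as nn

class LearnableTokenizer(nn.Module):
    """Query a frozen INR at learnable coordinates; group RGB into tokens."""
    def __init__(self, num_tokens=64, pts_per_token=9, embed_dim=192):
        super().__init__()
        # Learnable query coordinates in [-1, 1]^2, trained with the ViT.
        self.coords = nn.Parameter(torch.rand(num_tokens, pts_per_token, 2) * 2 - 1)
        self.proj = nn.Linear(pts_per_token * 3, embed_dim)

    def forward(self, inr):
        n_tok, n_pts, _ = self.coords.shape
        rgb = inr(self.coords.reshape(-1, 2))   # query frozen INR: (n_tok*n_pts, 3)
        rgb = rgb.reshape(n_tok, n_pts * 3)     # group RGB values per token
        return self.proj(rgb)                   # tokens: (n_tok, embed_dim)

# Usage: freeze the INR weights; gradients reach only coords and the projection.
inr = ImageINR()                # from the sketch above
for p in inr.parameters():
    p.requires_grad_(False)
tokenizer = LearnableTokenizer()
tokens = tokenizer(inr)         # feed into a standard ViT encoder
```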
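Figure 4's refinement stage is a test-time optimization: with the 3D INR frozen, the coarse pose is nudged to minimize the photometric error between the rendered and observed image. Below is a hypothetical sketch; `render(pose) -> image` stands in for a differentiable NeRF renderer, and `se3_apply` is an illustrative first-order pose update, neither taken from the paper.

```python
import torch

def se3_apply(pose, delta):
    """Apply a small se(3) correction delta (6,) = (rotation, translation)
    to a 3x4 camera pose; first-order approximation R <- (I + [w]x) R.
    Hypothetical helper for illustration only."""
    w, t = delta[:3], delta[3:]
    zero = torch.zeros(())
    skew = torch.stack([
        torch.stack([zero, -w[2], w[1]]),
        torch.stack([w[2], zero, -w[0]]),
        torch.stack([-w[1], w[0], zero]),
    ])
    R = (torch.eye(3) + skew) @ pose[:, :3]
    return torch.cat([R, (pose[:, 3] + t).unsqueeze(-1)], dim=1)

def refine_pose(render, coarse_pose, target, steps=100, lr=1e-2):
    """Freeze the 3D INR and refine the pose photometrically (Figure 4)."""
    delta = torch.zeros(6, requires_grad=True)  # correction to the coarse pose
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((render(se3_apply(coarse_pose, delta)) - target) ** 2).mean()
        loss.backward()
        opt.step()
    return se3_apply(coarse_pose, delta).detach()

# Dummy usage with a toy differentiable "renderer" in place of a NeRF:
render = lambda p: p.sum() * torch.ones(8, 8, 3)
refined = refine_pose(render, torch.eye(3, 4), torch.zeros(8, 8, 3), steps=10)
```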