Symmetria: A Synthetic Dataset for Learning in Point Clouds

Ivan Sipiran, Gustavo Santelices, Lucas Oyarzún, Andrea Ranieri, Chiara Romanengo, Silvia Biasotti, Bianca Falcidieno

TL;DR

Symmetria addresses the data scarcity and privacy concerns of 3D point-cloud learning by introducing a scalable, ground-truth-rich synthetic dataset generated from parametric planar curves with known symmetries. It provides a comprehensive benchmark including curve families, surface generation via extrusion and revolution, controlled perturbations, and explicit ground-truth symmetry annotations, enabling SSL pre-training and symmetry-detection evaluation. Empirically, SSL pre-training on Symmetria yields competitive downstream performance across classification, segmentation, and few-shot tasks compared to ShapeNet, with strong data efficiency at 10K–50K samples; a dedicated symmetry-detection benchmark further evaluates the capacity to recover geometric symmetries. Ablation studies reveal the importance of curve diversity and perturbations for robust representations, while results from scaling the dataset highlight potential capacity limits and guide future architectural and data-design choices. Overall, Symmetria offers a practical, privacy-friendly platform for advancing 3D representation learning and symmetry understanding with broad applicability to real-world tasks.

Abstract

Unlike image or text domains that benefit from an abundance of large-scale datasets, point cloud learning techniques frequently encounter limitations due to the scarcity of extensive datasets. To overcome this limitation, we present Symmetria, a formula-driven dataset that can be generated at any arbitrary scale. By construction, it ensures the absolute availability of precise ground truth, promotes data-efficient experimentation by requiring fewer samples, enables broad generalization across diverse geometric settings, and offers easy extensibility to new tasks and modalities. Using the concept of symmetry, we create shapes with known structure and high variability, enabling neural networks to learn point cloud features effectively. Our results demonstrate that this dataset is highly effective for point cloud self-supervised pre-training, yielding models with strong performance in downstream tasks such as classification and segmentation, which also show good few-shot learning capabilities. Additionally, our dataset can support fine-tuning models to classify real-world objects, highlighting our approach's practical utility and application. We also introduce a challenging task for symmetry detection and provide a benchmark for baseline comparisons. A significant advantage of our approach is the public availability of the dataset, the accompanying code, and the ability to generate very large collections, promoting further research and innovation in point cloud learning.
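
To make the formula-driven idea concrete, the following is a minimal sketch of how a symmetric shape with exact ground truth might be produced from a planar parametric curve by extrusion or revolution, as the TL;DR describes. It is not the authors' generator: the curve family r(t) = 1 + a·cos(k·t), the function names (`polar_curve`, `extrude`, `revolve`, `symmetry_error`), and all parameter values are assumptions chosen for illustration. Because the shape is built from a formula, its mirror planes are known in advance and can be checked numerically.

```python
import numpy as np

def polar_curve(k, a=0.3, n=120):
    """Sample the closed planar curve r(t) = 1 + a*cos(k*t) (hypothetical family).

    The curve is mirror-symmetric about the x axis because r(-t) = r(t).
    """
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    r = 1.0 + a * np.cos(k * t)
    return np.stack([r * np.cos(t), r * np.sin(t)], axis=1)  # shape (n, 2)

def extrude(curve, height=1.0, layers=8):
    """Sweep a planar curve along z, giving a point cloud of shape (layers*n, 3)."""
    z = np.linspace(-height / 2.0, height / 2.0, layers)
    xy = np.repeat(curve[None, :, :], layers, axis=0)
    zz = np.broadcast_to(z[:, None, None], (layers, curve.shape[0], 1))
    return np.concatenate([xy, zz], axis=2).reshape(-1, 3)

def revolve(profile, steps=32):
    """Rotate a (radius, height) profile about the z axis, giving (steps*m, 3) points."""
    phi = np.linspace(0.0, 2.0 * np.pi, steps, endpoint=False)
    rad, h = profile[:, 0], profile[:, 1]
    x = rad[None, :] * np.cos(phi)[:, None]
    y = rad[None, :] * np.sin(phi)[:, None]
    z = np.broadcast_to(h[None, :], x.shape)
    return np.stack([x, y, z], axis=2).reshape(-1, 3)

def reflect(points, normal):
    """Reflect points across the plane through the origin with the given normal."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return points - 2.0 * (points @ n)[:, None] * n[None, :]

def symmetry_error(points, normal):
    """Largest nearest-neighbor distance from the reflected cloud to the original.

    A value near zero means the plane is a mirror symmetry of the sampled cloud.
    """
    mirrored = reflect(points, normal)
    dists = np.linalg.norm(mirrored[:, None, :] - points[None, :, :], axis=2)
    return dists.min(axis=1).max()

# Extrude a 3-fold curve; the planes y = 0 and z = 0 are symmetries by construction.
cloud = extrude(polar_curve(k=3))
err_true_plane = symmetry_error(cloud, [0.0, 1.0, 0.0])   # known symmetry plane
err_wrong_plane = symmetry_error(cloud, [1.0, 0.0, 0.0])  # not a symmetry for k = 3
```

In this sketch the ground-truth annotation is simply the list of plane normals known from the formula, which is what lets a symmetry-detection method be scored without manual labeling.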

Paper Structure

This paper contains 43 sections, 3 equations, 11 figures, 12 tables, 1 algorithm.

Figures (11)

  • Figure 1: A glimpse into the Symmetria dataset: a formula-driven synthetic dataset composed of symmetric shapes generated from planar parametric curves.
  • Figure 2: Extrusion examples.
  • Figure 3: Revolution example.
  • Figure 4: Class-level F1 measure on the ModelNet test set. The base model is Point-MAE, pre-trained on either ShapeNet (blue) or SymSSL-10K (red).
  • Figure 5: Per-class relationship between the level of asymmetry and the difference in F1 measure (Symmetria vs. ShapeNet). The F1 measure is computed on the ModelNet test set.
  • ...and 6 more figures