OpenLex3D: A Tiered Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Christina Kassab, Sacha Morin, Martin Büchner, Matías Mattamala, Kumaraditya Gupta, Abhinav Valada, Liam Paull, Maurice Fallon
TL;DR
OpenLex3D addresses the gap in evaluating open-vocabulary 3D scene representations by introducing a four-category label taxonomy (synonyms, depictions, visually similar, clutter) and relabeling scenes from Replica, ScanNet++, and HM3D, covering 3812 objects with up to 13x more labels per scene than the original datasets. It defines two evaluation tasks, tiered open-set semantic segmentation and open-set object retrieval, with large per-dataset prompt lists and per-scene retrieval queries to probe language-grounded 3D perception. Two metrics, Top-N Frequency and Set Ranking, quantify per-point category accuracy and the distribution of predictions across precision tiers, revealing distinct failure modes across methods. Experiments show that no single method excels across both tasks, highlighting the need for improved feature fusion and segmentation strategies. The benchmark is publicly available.
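To make the tiered idea concrete, the following is a minimal sketch of how a Top-N Frequency style score could be computed. It is an illustration under stated assumptions, not the benchmark's implementation: the function names (`tier_of_point`, `top_n_frequency`), the data layout, the extra "incorrect" bucket, and the rule that a point is credited to the most precise tier matched by any of its top-N predicted labels are all hypothetical. The exact definitions are given in the paper and the released code.

```python
from collections import Counter

# Tiers ordered from most to least precise; the names follow the taxonomy
# in the summary above. The "incorrect" bucket is an assumption of this sketch.
TIERS = ("synonyms", "depictions", "visually_similar", "clutter")


def tier_of_point(top_n_labels, gt_labels_by_tier):
    """Return the most precise tier matched by any of the top-N labels."""
    predicted = set(top_n_labels)
    for tier in TIERS:
        if predicted & gt_labels_by_tier.get(tier, set()):
            return tier
    return "incorrect"


def top_n_frequency(predictions, ground_truth, n):
    """Fraction of points assigned to each tier.

    predictions: one ranked label list per point (best match first).
    ground_truth: one dict per point mapping tier name -> set of labels.
    """
    buckets = (*TIERS, "incorrect")
    total = len(predictions)
    if total == 0:
        return {tier: 0.0 for tier in buckets}
    counts = Counter(
        tier_of_point(ranked[:n], gt)
        for ranked, gt in zip(predictions, ground_truth)
    )
    return {tier: counts[tier] / total for tier in buckets}


# Toy example with two points on a sofa (labels are made up for illustration).
sofa = {"synonyms": {"sofa", "couch"}, "clutter": {"pillow"}}
example_predictions = [["couch", "chair"], ["pillow", "sofa"]]
example_ground_truth = [sofa, sofa]
example_scores = top_n_frequency(example_predictions, example_ground_truth, n=1)
```

With `n=1` the second point is credited to the clutter tier because its best label is "pillow"; raising `n` to 2 lets "sofa" count and moves the point into the synonyms tier. This is one way a per-tier breakdown can expose different failure modes that a single accuracy number would hide.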
Abstract
3D scene understanding has been transformed by open-vocabulary language models that enable interaction via natural language. However, at present the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymous object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io/.
