Viewpoint Recommendation for Point Cloud Labeling through Interaction Cost Modeling

Yu Zhang; Xinyi Zhao; Chongke Bi; Siming Chen

TL;DR

This work addresses the time burden of annotating 3D point clouds with a viewpoint recommendation framework that minimizes lasso selection time. The approach adapts Fitts' law into a time-cost model for lasso selection in 2D projections and uses grid search to find the viewpoint with the lowest estimated cost. In an ablation study, an integrated labeling system reduced labeling time and improved user satisfaction, and qualitative comparisons show advantages over previous viewpoint selection strategies. The results suggest that model-based evaluation can guide the design of interactive data-labeling tools and that viewpoint optimization can accelerate routine labeling tasks in 3D vision applications.

Abstract

Semantic segmentation of 3D point clouds is important for many applications, such as autonomous driving. Training semantic segmentation models requires labeled point cloud segmentation datasets. However, point cloud labeling is time-consuming for annotators: it typically involves tuning the camera viewpoint and selecting points by lasso. To reduce this time cost, we propose a viewpoint recommendation approach. We adapt Fitts' law to model the time cost of lasso selection in point clouds. Using the modeled time cost, we recommend to the annotator the viewpoint that minimizes the lasso selection time cost. We build a data labeling system for semantic segmentation of 3D point clouds that integrates our viewpoint recommendation approach. The system enables users to navigate to recommended viewpoints for efficient annotation. In an ablation study, we observed that our approach effectively reduced the data labeling time cost. We also qualitatively compare our approach with previous viewpoint selection approaches on different datasets.
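
The recommendation step described in the abstract can be illustrated with a minimal sketch of grid search over camera viewpoints. Everything named here is an assumption made for illustration: the azimuth/elevation parameterization, the grid resolution, the function names, and the toy cost function are not taken from the paper, whose actual cost is the Fitts' law-based lasso selection time estimate.

```python
import math
from typing import Callable, Tuple


def grid_search_viewpoint(
    cost_fn: Callable[[float, float], float],
    n_azimuth: int = 36,
    n_elevation: int = 9,
) -> Tuple[Tuple[float, float], float]:
    """Return the (azimuth, elevation) pair with the lowest estimated cost.

    cost_fn is a hypothetical stand-in for an estimated lasso selection
    time cost evaluated at a given viewpoint.
    """
    best_view = (0.0, 0.0)
    best_cost = math.inf
    for i in range(n_azimuth):
        # Azimuth samples cover the full circle [0, 2*pi).
        azimuth = 2.0 * math.pi * i / n_azimuth
        for j in range(n_elevation):
            # Elevation samples lie strictly between the two poles.
            elevation = -math.pi / 2 + math.pi * (j + 1) / (n_elevation + 1)
            cost = cost_fn(azimuth, elevation)
            if cost < best_cost:
                best_cost = cost
                best_view = (azimuth, elevation)
    return best_view, best_cost


def toy_cost(azimuth: float, elevation: float) -> float:
    """Toy cost (assumption): cheapest when looking from azimuth pi, elevation 0."""
    return (azimuth - math.pi) ** 2 + elevation ** 2


best_view, best_cost = grid_search_viewpoint(toy_cost)
```

With the toy cost, the search returns the grid point closest to azimuth pi and elevation 0. In a real system the cost function would be replaced by the modeled lasso selection time for the 2D projection at each candidate viewpoint.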

Paper Structure (37 sections, 12 equations, 13 figures, 1 table, 3 algorithms)

Introduction
Related Work
Point Cloud Labeling
Viewpoint Selection
Background on Point Cloud Labeling
Labeling Interactions
Labeling Process with Lasso Selection
Viewpoint Recommendation Approach
Background of Fitts' Law
Modeling Lasso Selection Time Cost
Dotted Fixed-Width Tunnel Passing Task
Lasso Selection Task
Lasso Selection Time Cost Estimation Algorithm
Time Complexity Analysis
Grid Searching Optimal Viewpoint
...and 22 more sections

Figures (13)

Figure 1: Steps of using lasso selection for semantic segmentation: (1) Identify object(s) to label in the 3D scene. (2) Adjust the camera viewpoint to focus on the objects to be labeled and avoid overlapping the objects with points of other categories. (3) Draw lasso polygon(s) to select points and assign labels.
Figure 2: Fitts' law for mouse movement tasks: (A) For pointing, $T = a_p + b_p \log_2 \left(\frac{D}{W} + 1\right)$. (B) For goal passing, $T = a_g + b_g \log_2 \left(\frac{D}{W} + 1\right)$. (C) For fixed-width tunnel passing, $T = a_s + b_s \frac{D}{W}$. (D) For general tunnel passing, $T = a_s + b_s \int_c \frac{ds}{W(s)}$.
Figure 3: Dotted fixed-width tunnel passing: (A) A dotted fixed-width tunnel passing task has parameters $D$, $W$, $r$, and $d$. We treat a dotted fixed-width tunnel passing task as decomposable into two types of subtasks: curved tunnel passing subtasks and goal passing subtasks. (B) A curved tunnel passing subtask has parameters $W$ and $r$. The tunnel width at $t \in [0, 2r]$ is $w(t) = W + 2r - 2\sqrt{r^2 - (t-r)^2}$. (C) A goal passing subtask has parameters $W$, $r$, and $d$. (D) Examples of valid and invalid mouse movement paths for dotted fixed-width tunnel passing. The dotted fixed-width tunnel passing task imposes fewer restrictions than the fixed-width tunnel passing task in Fig. 2(C). The mouse movement path is allowed to exceed the tunnel but not allowed to enclose any point.
Figure 4: Upper and lower bounds of the dotted fixed-width tunnel passing: (A) The original dotted fixed-width tunnel passing task has parameters $D \in \mathbb{R}^+$, $W \in \mathbb{R}^+$, $r \in \mathbb{R}^+$, and $d \in \mathbb{R}_{\geq 0}$. Let $ID(D, W, r, d)$ denote the index of difficulty of this task, which is a function of $D$, $W$, $r$, and $d$. (B) The upper bound of $ID(D, W, r, d)$ is $\frac{D}{W}$. The upper bound is achieved when $d = 0$ and $r \rightarrow 0^+$. In this case, geometrically, the dotted fixed-width tunnel passing task is equivalent to a fixed-width tunnel passing task (see Fig. 2(C)). (C) The lower bound of $ID(D, W, r, d)$ is $mk \log_2\left(\frac{D}{kW} + 1\right)$. The lower bound is achieved when $d \neq 0$ and $r \rightarrow 0^+$. In this case, geometrically, the dotted fixed-width tunnel passing task is equivalent to a series of $k$ goal passing tasks (see Fig. 2(B)). (D) Compared to the dotted fixed-width tunnel passing task, the dotted general tunnel passing task no longer requires the points to be evenly distributed on two parallel line segments. (E) The dotted general tunnel passing task is geometrically equivalent to a general tunnel passing task (see Fig. 2(D)) when $d = 0$ and $r \rightarrow 0^+$. (F) The dotted general tunnel passing task is geometrically equivalent to a series of goal passing tasks (see Fig. 2(B)) when $d \neq 0$ and $r \rightarrow 0^+$.
Figure 5: There can be multiple tunnels to estimate the lasso selection time cost for a given point cloud.
...and 8 more figures
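
The formulas in the captions of Figures 2 and 3 can be evaluated numerically. The following is a minimal sketch, not the paper's implementation: the function names are hypothetical, the constants `a` and `b` are empirical parameters that would have to be fitted from user data, and integrating $1/w(t)$ over $t \in [0, 2r]$ for the curved tunnel subtask is an assumption based on the general tunnel passing form in Figure 2(D).

```python
import math
from typing import Callable


def pointing_time(a: float, b: float, D: float, W: float) -> float:
    """Fig. 2(A)/(B): T = a + b * log2(D / W + 1), for pointing or goal passing."""
    return a + b * math.log2(D / W + 1)


def fixed_tunnel_time(a: float, b: float, D: float, W: float) -> float:
    """Fig. 2(C): T = a + b * D / W for a tunnel of constant width W."""
    return a + b * D / W


def general_tunnel_id(
    width_fn: Callable[[float], float], length: float, n: int = 10000
) -> float:
    """Fig. 2(D): approximate the integral of ds / W(s) with the midpoint rule."""
    h = length / n
    return sum(h / width_fn((i + 0.5) * h) for i in range(n))


def curved_tunnel_width(t: float, W: float, r: float) -> float:
    """Fig. 3(B): w(t) = W + 2r - 2 * sqrt(r^2 - (t - r)^2) for t in [0, 2r]."""
    # max() guards against tiny negative values from floating-point rounding.
    return W + 2 * r - 2 * math.sqrt(max(0.0, r * r - (t - r) ** 2))


def curved_tunnel_id(W: float, r: float, n: int = 10000) -> float:
    """Index of difficulty of one curved tunnel subtask (assumed integration over t)."""
    return general_tunnel_id(lambda t: curved_tunnel_width(t, W, r), 2 * r, n)


# A constant-width tunnel reduces the integral form to D / W.
straight_id = general_tunnel_id(lambda s: 2.0, 10.0)
# The curved subtask is at least as wide as W everywhere, so it is easier
# than a straight tunnel of width W and the same length 2r.
curved_id = curved_tunnel_id(W=1.0, r=0.5)
```

Because $w(t)$ ranges from $W$ (at $t = r$) to $W + 2r$ (at the endpoints), the curved subtask's index of difficulty falls between $\frac{2r}{W + 2r}$ and $\frac{2r}{W}$, which gives a quick sanity check on the numerical integration.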
