FlowTrack: Point-level Flow Network for 3D Single Object Tracking

Shuo Li, Yubo Cui, Zhiheng Li, Zheng Fang

TL;DR

FlowTrack reframes 3D single object tracking as a multi-frame point-level flow estimation problem. It combines a Historical Information Fusion Module to inject history via a learnable target feature, a Point-level Motion Module to generate multi-scale point-level flow, and an Instance Flow Head to convert per-point motion into a global, instance-level target motion for rigid-body transformation. The approach yields strong gains on KITTI and NuScenes, maintains real-time speed, and demonstrates robustness in sparse and occluded scenarios. This work highlights the value of integrating dense point-level motion cues with historical context for improved 3D tracking performance.

Abstract

3D single object tracking (SOT) is a crucial task in mobile robotics and autonomous driving. Traditional motion-based approaches track the target by estimating its relative movement between two consecutive frames. However, they usually overlook the target's local motion information and fail to exploit historical frame information effectively. To overcome these limitations, we propose FlowTrack, a point-level flow method that uses multi-frame information for the 3D SOT task. Specifically, by estimating the flow for each point in the target, our method captures the target's local motion details, thereby improving tracking performance. At the same time, to handle scenes with sparse points, we present a learnable target feature as a bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head to transform dense point-level flow into instance-level motion, effectively aggregating local motion information to obtain the global target motion. Finally, our method achieves competitive performance, with improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes. The code will be made publicly available soon.
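The step from dense point-level flow to a single instance-level motion can be illustrated with a small sketch. This is not the paper's Instance Flow Head: the softmax weighting, the (dx, dy, dz, dtheta) flow layout, the box layout, and the function names `aggregate_flow` and `apply_motion` are all assumptions chosen for illustration. It only shows the general idea of weighting per-point motion, summing it into one global motion, and applying that motion to the previous box as a rigid-body update.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_flow(point_flows, weight_logits):
    """Collapse per-point flow into one instance-level motion.

    point_flows: list of (dx, dy, dz, dtheta) tuples, one per point (assumed layout).
    weight_logits: one unnormalised score per point (hypothetical flow weights).
    """
    weights = softmax(weight_logits)
    dims = len(point_flows[0])
    return tuple(
        sum(w * flow[d] for w, flow in zip(weights, point_flows))
        for d in range(dims)
    )

def apply_motion(box, motion):
    """Rigid-body update of a box (x, y, z, w, l, h, theta) by (dx, dy, dz, dtheta).

    Translation is assumed to be expressed in the same frame as the box centre.
    """
    x, y, z, w, l, h, theta = box
    dx, dy, dz, dtheta = motion
    return (x + dx, y + dy, z + dz, w, l, h, theta + dtheta)

# Two points with equal weights: the instance motion is the plain average.
flows = [(1.0, 0.0, 0.0, 0.1), (3.0, 2.0, 0.0, 0.3)]
motion = aggregate_flow(flows, [0.0, 0.0])
new_box = apply_motion((0.0, 0.0, 0.0, 2.0, 4.0, 1.5, 0.0), motion)
```

With unequal logits, points that receive higher scores dominate the aggregated motion, which is one plausible way a learned weight map could suppress background or noisy points.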

Paper Structure (16 sections, 15 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Comparison of current 3D single object tracking frameworks. (a) Instance-level motion estimation typically takes two consecutive frames of point clouds as input and estimates the overall motion of the target. (b) Multi-frame point-level flow estimation takes multiple historical frames of point clouds as input and estimates point-level flow within the target. Here, PC denotes the input point clouds.
  • Figure 2: The overall framework of FlowTrack. The network comprises a 3D Feature Extraction Backbone, Historical Information Fusion Module, Point-level Motion Module and Instance Flow Head. In each frame, the point cloud is initially processed by the 3D Feature Extraction Backbone to extract voxel features and transform them into BEV. Subsequently, the Historical Information Fusion Module effectively integrates information of the target from historical frames into the template frame. Then, the Point-level Motion Module is used to obtain multi-scale point-level flow features of the target. Finally, the Instance Flow Head adaptively transforms point-level flow into instance-level target motion to obtain the final results for target tracking.
  • Figure 3: Details of Historical Information Fusion. We use a learnable target feature as a bridge to facilitate information interaction between historical frames and ultimately supplement the target feature in the template frame.
  • Figure 4: Details of the Point-level Motion Module. We use multiple convolutional layers to extract multi-scale point-level flow features for the tracked target. These features are then concatenated along the channel dimension to obtain the final point-level flow feature.
  • Figure 5: The transformation from point-level flow to instance-level motion. (a) The point-level flow map, where grey squares represent the flow weight map; (b) The instance-level motion of the target.
  • ...and 3 more figures
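
The Figure 4 caption describes stacking convolutional layers and concatenating their outputs along the channel dimension. The sketch below illustrates only that data flow. It substitutes a fixed 3 x 3 mean filter with zero padding for the learned convolutions, and the names `mean_filter_3x3` and `multi_scale_features` are hypothetical, so it should be read as a toy stand-in rather than the module itself.

```python
def mean_filter_3x3(grid):
    """Zero-padded 3 x 3 mean filter over a 2D grid given as a list of rows.

    Stand-in for a learned convolution; the real module would use trained kernels.
    """
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += grid[ii][jj]
            out[i][j] = total / 9.0
    return out

def multi_scale_features(channels, num_layers=2):
    """Stack filters and concatenate every layer's output along the channel axis.

    channels: list of 2D grids, i.e. a (C, H, W) feature map.
    Returns a list of C * num_layers grids; deeper layers see a larger
    receptive field, which is what makes the result multi-scale.
    """
    fused = []
    current = channels
    for _ in range(num_layers):
        current = [mean_filter_3x3(c) for c in current]
        fused.extend(current)  # channel-wise concatenation
    return fused

# One 3 x 3 channel of ones, two stacked layers -> two output channels.
bev = [[[1.0] * 3 for _ in range(3)]]
features = multi_scale_features(bev, num_layers=2)
```

Each additional layer widens the receptive field by one cell in every direction, so concatenating the per-layer outputs gives each BEV cell a descriptor that mixes fine and coarse context.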