Enhancing In-vehicle Multiple Object Tracking Systems with Embeddable Ising Machines

Kosuke Tatsumura, Yohei Hamakawa, Masaya Yamasaki, Koji Oya, Hiroshi Fujimoto

TL;DR

The paper addresses the challenge of solving NP-hard assignment problems in multi-object tracking within in-vehicle systems. It introduces a flexible assignment framework solved by a simulated-bifurcation Ising machine implemented on vehicle-grade FPGAs, enabling robust tracking through long-term occlusions. The approach yields real-time performance around 23 FPS and improvements in association accuracy (HOTA) over a baseline, demonstrated on a dual-FPGA platform with YOLOv2 detections. By solving two QUBO instances per frame with different constraint penalties, it detects occlusion events via potentially-matching states, advancing MOT capabilities in constrained automotive environments. The work also highlights potential extensions to richer feature-based similarities and other NP-hard tasks like SLAM and scheduling using embeddable Ising machines.

Abstract

Tracking multiple objects, a cognitive function needed in autonomous mobile vehicles, comprises object detection and the temporal association of the detected objects. Machine learning has recently brought great progress in constructing the similarity matrix between the objects already recognized and the objects detected in the current video frame, but less progress has been made on the assignment problem that finally determines the temporal association, which is a combinatorial optimization problem. Here we show an in-vehicle multiple object tracking system with a flexible assignment function for tracking through multiple long-term occlusion events. To solve the flexible assignment problem, formulated as a nondeterministic polynomial time (NP)-hard problem, the system relies on an embeddable Ising machine based on a quantum-inspired algorithm called simulated bifurcation. Using a vehicle-mountable computing platform, we demonstrate real-time system-wide throughput (23 frames per second on average) with the enhanced functionality.
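The abstract names simulated bifurcation as the quantum-inspired algorithm behind the Ising machine. As a rough illustration only, the Python sketch below runs a software version of the ballistic variant of simulated bifurcation on a tiny Ising problem. The choice of variant, the names `ising_energy` and `ballistic_sb`, the energy convention, the step count, the time step, and the coupling-scale heuristic are all assumptions made for this sketch. The system described in the paper uses a custom FPGA circuit, and its details are not reproduced here.

```python
import math
import random


def ising_energy(J, s):
    """Ising energy E = -1/2 * sum_ij J[i][j] * s[i] * s[j] (assumed convention)."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def ballistic_sb(J, steps=1000, dt=0.5, a0=1.0, seed=0):
    """Illustrative ballistic simulated bifurcation; returns spins in {-1, +1}.

    All hyperparameters here are assumptions chosen for a small demo.
    """
    n = len(J)
    rng = random.Random(seed)
    # Heuristic coupling scale based on the norm of J (an assumption).
    norm = math.sqrt(sum(J[i][j] ** 2 for i in range(n) for j in range(n)))
    c0 = 0.5 * math.sqrt(n - 1) / norm if norm > 0 else 0.0
    # Small random initial positions x and momenta y.
    x = [rng.uniform(-0.01, 0.01) for _ in range(n)]
    y = [rng.uniform(-0.01, 0.01) for _ in range(n)]
    for step in range(steps):
        a = a0 * step / steps  # bifurcation parameter ramps from 0 towards a0
        force = [sum(J[i][j] * x[j] for j in range(n)) for i in range(n)]
        for i in range(n):
            y[i] += dt * (-(a0 - a) * x[i] + c0 * force[i])
            x[i] += dt * a0 * y[i]
            # Perfectly inelastic walls at |x| = 1.
            if abs(x[i]) > 1.0:
                x[i] = 1.0 if x[i] > 0 else -1.0
                y[i] = 0.0
    return [1 if xi >= 0 else -1 for xi in x]


# Tiny ferromagnetic example: every pair of spins prefers to align.
J = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
spins = ballistic_sb(J)
print(spins, ising_energy(J, spins))
```

For this toy coupling matrix, the ground states are the two configurations in which all four spins agree, so the sketch only shows the mechanics of the update rule and says nothing about solution quality on real tracking instances.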

Paper Structure

This paper contains 7 sections, 12 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: In-vehicle multiple object tracking (MOT) system using an embeddable Ising machine to solve a flexible assignment problem formulated as NP-hard combinatorial optimization.
  • Figure 2: Flexible assignment function in the MOT system. (a) Submodules in the assignor. The assignment result between trackers and detections, which allows for many-to-one correspondence, is determined from two assignment tables. The Ising machine produces these tables by solving the similarity-matrix-based assignment problem twice per frame, each time with a different weight coefficient for the penalty function that enforces one-to-one correspondence. (b) A scene without occlusion and the resultant assignment tables (c and d). (f) A scene with occlusion and the resultant assignment tables (g and h). (e) Solution space of the decision variables, including constraint-violating and constraint-satisfying solutions.
  • Figure 3: Implementation of the MOT system with a vehicle-mountable computing platform. (a) A photograph of the system showing a camera and two computing boards, each with a monolithic MPU-FPGA chip. (b) Hardware configuration and block diagram of the modules implemented on it. (c) Placement of the modules in the monolithic MPU-FPGA chip for the assignor. The custom circuit for simulated bifurcation, which serves as the embedded Ising machine, is highlighted in red in the FPGA fabric.
  • Figure 4: Processing speed and functionality of the MOT system with the flexible assignment function. (a) Histogram of the calculation times of the modules in the MOT system when processing a MOT benchmark problem (a movie of 600 frames) named "MOT17-02-FRCNN" from the MOT17 dataset. The inset illustrates the timing chart of the overlapped operation of the detector and the other tracking modules. (b) A scene extracted from "MOT17-02-FRCNN" with trackers indicated as boxes. (c, d, e) Five-object tracking through a complex occlusion event (simultaneous occurrence of a three-object crossing and a two-object crossing). Panels (c), (d), and (e) show frames #50, #80, and #110, respectively, extracted from a movie (142 frames) provided as Supplementary Information 3. The boxes indicate trackers. (f, g, h) Similarity matrices, with red and blue boxes indicating the assignment results between trackers and detections. Panels (f), (g), and (h) correspond to (c), (d), and (e), respectively.
  • Figure 5: A scene extracted from Supplementary Information 2.
  • ...and 2 more figures
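
The Figure 2 caption describes solving the same similarity-matrix-based assignment problem twice per frame with different penalty weights for the one-to-one constraint. The Python sketch below illustrates that idea on a toy instance. It is not the paper's formulation: the pairwise penalty form, the names `qubo_energy` and `solve_assignment`, the similarity values, and the penalty weights are all hypothetical, and a brute-force search stands in for the Ising machine.

```python
from itertools import product


def qubo_energy(bits, sim, p_row, p_col):
    """Energy of one candidate assignment table (illustrative formulation).

    bits[i * n_det + j] == 1 means tracker i is matched to detection j.
    Similarity is rewarded; each pair of 1s sharing a row (tracker) or a
    column (detection) is penalized, which is quadratic in the binary bits.
    """
    n_trk, n_det = len(sim), len(sim[0])
    b = [bits[i * n_det:(i + 1) * n_det] for i in range(n_trk)]
    energy = -sum(sim[i][j] * b[i][j] for i in range(n_trk) for j in range(n_det))
    for i in range(n_trk):
        k = sum(b[i])
        energy += p_row * k * (k - 1) / 2  # number of pairs in row i
    for j in range(n_det):
        k = sum(b[i][j] for i in range(n_trk))
        energy += p_col * k * (k - 1) / 2  # number of pairs in column j
    return energy


def solve_assignment(sim, p_row, p_col):
    """Exhaustive minimization; a stand-in for the Ising machine on tiny inputs."""
    n_trk, n_det = len(sim), len(sim[0])
    best = min(
        product((0, 1), repeat=n_trk * n_det),
        key=lambda bits: qubo_energy(bits, sim, p_row, p_col),
    )
    return {(i, j) for i in range(n_trk) for j in range(n_det) if best[i * n_det + j]}


# Hypothetical similarities: rows are trackers, columns are detections.
# Trackers 0 and 1 both resemble detection 0, as if one object hides the other.
sim = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]

strict = solve_assignment(sim, p_row=2.0, p_col=2.0)   # one-to-one enforced
relaxed = solve_assignment(sim, p_row=2.0, p_col=0.5)  # many-to-one tolerated
potential = relaxed - strict  # pairs that appear only in the relaxed table
print(sorted(strict), sorted(relaxed), sorted(potential))
```

In this toy case the pair (tracker 1, detection 0) appears only in the relaxed table, which is the kind of difference between the two tables that could be read as a potentially matching, occluded object. How the paper's assignor actually combines its two tables is not specified in this summary.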