ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
Shivendra Agrawal, Jake Brawer, Ashutosh Naik, Alessandro Roncone, Bradley Hayes
TL;DR
Quasi-static indoor spaces pose severe localization challenges due to repetitive geometry and drifting local semantics. ShelfAware couples a depth-based geometry model with a distributional semantic representation of object categories and leverages an offline/online inverse semantic model to propose high-quality pose hypotheses, enabling rapid global localization on low-cost vision hardware. The approach is implemented as a semantic particle filter that fuses depth likelihoods with a semantic similarity score and uses a precomputed semantic-view bank for fast localization. It is validated in a mock grocery store with wearable and cart-mounted configurations. Results show 96% global-localization success, fast convergence (mean of about 1.91 s), and robust tracking across dynamic occlusions and sparse semantics, outperforming MCL and AMCL while running in real time on a laptop.
Abstract
Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside Monte Carlo Localization (MCL), yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. Across 100 global-localization trials spanning four conditions (cart-mounted, wearable, dynamic obstacles, and sparse semantics) in a semantically dense retail environment, ShelfAware achieves a 96% success rate (vs. 22% for MCL and 10% for Adaptive MCL, AMCL) with a mean time-to-convergence of 1.91 s, attains the lowest translational RMSE in all conditions, and maintains stable tracking in 80% of tested sequences, all while running in real time on a consumer laptop-class platform. By modeling semantics distributionally at the category level and leveraging inverse proposals, ShelfAware resolves geometric aliasing and semantic drift common to quasi-static domains. Because the method requires only vision sensors and visual-inertial odometry (VIO), it integrates as an infrastructure-free building block for mobile robots in warehouses, labs, and retail settings; as a representative application, it also supports the creation of assistive devices that provide start-anytime, shared-control navigation for people with visual impairments.
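To make the two mechanisms named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a Gaussian depth model, category histograms compared by cosine similarity, a hypothetical mixing weight `alpha`, and a top-k lookup over a small view bank; none of these choices, nor the function names, are taken from the paper.

```python
import numpy as np

def depth_log_likelihood(observed, expected, sigma=0.2):
    """Gaussian log-likelihood of observed depth rays (assumed sensor model)."""
    diff = np.asarray(observed, dtype=float) - np.asarray(expected, dtype=float)
    return float(-0.5 * np.sum((diff / sigma) ** 2))

def semantic_similarity(observed_hist, expected_hist, eps=1e-9):
    """Cosine similarity between two category-count histograms (assumed metric)."""
    a = np.asarray(observed_hist, dtype=float)
    b = np.asarray(expected_hist, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def fused_weights(depth_ll, sem_sim, alpha=5.0):
    """Fuse per-particle depth log-likelihoods and semantic scores into weights.

    alpha is a hypothetical knob trading off semantics against geometry.
    """
    log_w = np.asarray(depth_ll, dtype=float) + alpha * np.asarray(sem_sim, dtype=float)
    log_w -= log_w.max()          # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()            # normalize so the weights sum to 1

def inverse_proposals(observed_hist, bank_hists, bank_poses, k=3):
    """Return the k bank poses whose stored histograms best match the observation."""
    scores = np.array([semantic_similarity(observed_hist, h) for h in bank_hists])
    top = np.argsort(scores)[::-1][:k]   # indices of the highest scores first
    return np.asarray(bank_poses)[top], scores[top]

# Toy view bank: one category histogram and one (x, y, yaw) pose per stored view.
bank_hists = np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5], [4, 1, 0]])
bank_poses = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                       [2.0, 0.0, 0.0], [3.0, 0.0, 0.0]])

poses, scores = inverse_proposals([5, 1, 0], bank_hists, bank_poses, k=2)
weights = fused_weights(depth_ll=[-1.0, -1.0], sem_sim=[0.9, 0.1])
```

In this toy setup, two particles with identical depth likelihoods are separated only by their semantic scores, which mirrors the abstract's point that semantics can disambiguate geometrically aliased poses. The proposed poses would typically be injected as new particles during resampling; how and when that happens here is left open because the abstract does not specify it.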
