Privacy-Preserving Visual Localization with Event Cameras

Junho Kim, Young Min Kim, Ramzi Zahreddine, Weston A. Welge, Gurunandan Krishnan, Sizhuo Ma, Jian Wang

TL;DR

This work addresses privacy-aware visual localization on resource-constrained edge devices by leveraging event cameras and a two-tier privacy framework. It combines event-to-image conversion with mature image-based localization to achieve robust 6-DoF pose estimation under challenging conditions, while protecting privacy through network-level split inference and sensor-level filtering. The authors introduce the EvRooms and EvHumans datasets and conduct a user study. They show that the privacy protections substantially reduce users' perceived insecurity at the cost of only modest localization degradation. Overall, the approach offers a practical, privacy-preserving building block for location-based services in mobile AR/VR contexts.

Abstract

We consider the problem of client-server localization, where users of edge devices send visual data to a service provider to localize themselves against a pre-built 3D map. This localization paradigm is a crucial component of location-based services in AR/VR and mobile applications, since it is difficult to store large-scale 3D maps and perform fast localization on resource-limited edge devices. Nevertheless, conventional client-server localization systems face numerous challenges in computational efficiency, robustness, and privacy preservation during data transmission. Our work aims to jointly solve these challenges with a localization pipeline based on event cameras. By using event cameras, our system consumes little energy and requires only a small memory bandwidth. During localization, we apply event-to-image conversion and leverage mature image-based localization, which remains robust even in low-light or fast-moving scenes. To further enhance privacy, we introduce protection techniques at two levels. Network-level protection aims to hide the user's entire view in private scenes using a novel split inference approach. Sensor-level protection aims to hide sensitive user details such as faces with lightweight filtering. Both methods require little client-side computation and incur only a small loss in localization performance. At the same time, they significantly mitigate users' feeling of insecurity, as revealed in our user study. We thus envision our method serving as a building block for practical location-based services using event cameras. The project page, including the code, is available at: https://82magnolia.github.io/event_localization/.
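The data flow of the split inference approach can be illustrated with a minimal sketch. It assumes a toy three-stage network in which the light-weight `frontal` and `lateral` stages stay on the device and only the costly `intermediate` stage (the part Figure 3 calls $F_\Theta^2$) runs on the server. The layer shapes, weights, and the names `dense`, `chain`, `Client`, and `Server` are hypothetical. The paper's per-scene retraining and adversarial defenses are not modeled.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weight: List[List[float]], bias: Vector) -> Layer:
    """Toy fully connected layer y = Wx + b, standing in for a convolutional block."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    return apply


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def chain(*layers: Layer) -> Layer:
    """Compose layers left to right."""
    def apply(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x
    return apply


# Hypothetical toy weights; a real event-to-image network is far larger.
frontal = chain(dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.1]), relu)      # client only
intermediate = chain(dense([[2.0, 0.0], [1.0, 1.0]], [0.0, 0.0]), relu)  # shared with server
lateral = dense([[1.0, 1.0]], [0.0])                                     # client only


class Server:
    """Service provider: holds only the costly intermediate stage."""

    def __init__(self, intermediate: Layer) -> None:
        self.intermediate = intermediate
        self.received: List[Vector] = []  # every activation the server observes

    def run(self, activation: Vector) -> Vector:
        self.received.append(activation)
        return self.intermediate(activation)


class Client:
    """Edge device: keeps the light-weight frontal and lateral stages private."""

    def __init__(self, frontal: Layer, lateral: Layer) -> None:
        self.frontal = frontal
        self.lateral = lateral

    def reconstruct(self, voxel: Vector, server: Server) -> Vector:
        h1 = self.frontal(voxel)  # the raw event voxel never leaves the device
        h2 = server.run(h1)       # heavy computation is offloaded
        return self.lateral(h2)   # final decoding happens locally


# Usage: split execution gives the same output as running the full network.
server = Server(intermediate)
client = Client(frontal, lateral)
voxel = [1.0, 2.0]
image = client.reconstruct(voxel, server)
```

Only the intermediate activation crosses to the server, and `Server.received` makes that boundary explicit. This is the same boundary that the attacks in Figure 4 (for example, frontal layer inversion) try to exploit.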

Paper Structure

This paper contains 61 sections, 9 equations, 9 figures, and 6 tables.

Figures (9)

  • Figure 1: Overview of our approach. (a) Client-server localization introduces privacy concerns. (b) Event cameras have numerous hardware benefits for localization. (c) We achieve privacy-preserving localization by applying protection techniques tailored to events during event pre-processing (sensor level) and event-to-image conversion (network level) (top), and the results are then used for localization (bottom).
  • Figure 2: User study results on our privacy protection method. Insecurity scores range from 1 to 5, where a higher score indicates greater insecurity. (a) We first measure how users feel about being captured by normal cameras in various scenarios. (b) We then ask about event cameras by sequentially showing raw events, event-to-image reconstructions, and privacy protection results.
  • Figure 3: Split inference setup for network-level privacy protection. To save compute while hiding sensitive visual information, the original network weights are split (left) and only the costly intermediate part ($F_\Theta^2$) is shared with the service provider. The compute is then distributed between the user and the service provider: the user runs inference for the lightweight frontal and lateral parts of the network, and the service provider runs the heavy part (right).
  • Figure 4: Network-level privacy protection targeting users in private scenes. (a) We identify three possible attacks from the service provider. Case 1: Frontal layer inversion attempts to decode the intermediate activations with a learned network $G_\Phi$. Case 2: Swapping weight attack combines the shared network weights with the publicly available weights (gray) to obtain image reconstructions. Case 3: Reverse engineering operates similarly, but uses network weights trained on the server side to reverse-engineer the unshared network weights. For defense, we propose per-scene re-training (top) using adversarial losses (middle) and noise-infused event voxels (bottom). (b) With the resulting network-level protection, users deploy a privately trained reconstruction network $F_{\Theta^\prime}$ and share only its intermediate part with the server during inference.
  • Figure 5: Sensor-level privacy protection. (a) We attenuate temporally inconsistent regions via median filtering and curvy regions via maximum reflection filtering. (b) The filtering operations preserve regions with consistent motion or linear structure, but they deliberately scramble the values in other regions, leading to blurry reconstructions. (c) To reduce artifacts, the averaged voxels $E_\text{avg}{=}(E_\text{med}{+}E_\text{ref})/2$ are selectively blended with the original voxels. Here we use a binary mask $U$ that selects the averaged voxels $E_\text{avg}$ only where voxel values exceed a threshold.
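The blending step in Figure 5(c) can be sketched as follows. Several parts are assumptions. Voxels are stored as a `[time_bin][pixel]` list. The mask $U$ selects a voxel when the absolute value of the original voxel exceeds the threshold. `temporal_median` is a hypothetical stand-in for the paper's median filter. The maximum reflection filter is not reimplemented, so its output $E_\text{ref}$ is passed in directly.

```python
import statistics
from typing import List

Grid = List[List[float]]  # event voxel grid indexed as [time_bin][pixel]


def temporal_median(voxels: Grid, radius: int = 1) -> Grid:
    """Hypothetical stand-in for E_med: per-pixel median over a window of
    2 * radius + 1 neighbouring time bins (truncated at the sequence ends)."""
    num_bins = len(voxels)
    filtered: Grid = []
    for t in range(num_bins):
        lo, hi = max(0, t - radius), min(num_bins, t + radius + 1)
        filtered.append([
            statistics.median(voxels[s][p] for s in range(lo, hi))
            for p in range(len(voxels[t]))
        ])
    return filtered


def blend_filtered_voxels(original: Grid, e_med: Grid, e_ref: Grid, threshold: float) -> Grid:
    """Return U * E_avg + (1 - U) * E, where E_avg = (E_med + E_ref) / 2.

    Assumption: U = 1 where |E| > threshold on the original voxels, else 0.
    """
    if not (len(original) == len(e_med) == len(e_ref)):
        raise ValueError("voxel grids must have the same number of time bins")
    blended: Grid = []
    for row, row_med, row_ref in zip(original, e_med, e_ref):
        if not (len(row) == len(row_med) == len(row_ref)):
            raise ValueError("voxel grids must have the same number of pixels")
        blended.append([
            0.5 * (m + r) if abs(e) > threshold else e  # mask U picks E_avg or E
            for e, m, r in zip(row, row_med, row_ref)
        ])
    return blended


# Usage with toy values; e_ref is made up here rather than produced by a real filter.
voxels = [[0.0, 4.0], [6.0, 0.0], [0.0, 4.0]]
e_med = temporal_median(voxels)
e_ref = [[1.0, 2.0], [2.0, 2.0], [1.0, 2.0]]
protected = blend_filtered_voxels(voxels, e_med, e_ref, threshold=1.0)
```

Voxels at or below the threshold pass through unchanged. This confines the blurring described in Figure 5(b) to the regions the mask selects.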