Privacy-Preserving Semantic Segmentation from Ultra-Low-Resolution RGB Inputs

Xuying Huang, Sicong Pan, Olga Zatsarynna, Juergen Gall, Maren Bennewitz

Abstract

RGB-based semantic segmentation has become a mainstream approach for visual perception and is widely applied in a variety of downstream tasks. However, existing methods typically rely on high-resolution RGB inputs, which may expose sensitive visual content in privacy-critical environments. Ultra-low-resolution RGB sensing suppresses sensitive information directly during image acquisition, making it an attractive privacy-preserving alternative. Nevertheless, recovering semantic segmentation from ultra-low-resolution RGB inputs remains highly challenging due to severe visual degradation. In this work, we introduce a novel fully joint-learning framework that mitigates the optimization conflicts exacerbated by visual degradation in ultra-low-resolution semantic segmentation. Experiments demonstrate that our method outperforms representative baselines in semantic segmentation and that our ultra-low-resolution RGB input achieves a favorable trade-off between privacy preservation and semantic segmentation performance. We deploy our privacy-preserving semantic segmentation method in a real-world robotic object-goal navigation task, demonstrating successful downstream task execution even under severe visual degradation.
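
The abstract notes that ultra-low-resolution sensing removes sensitive detail at acquisition time. When such inputs have to be simulated from existing high-resolution data, one plausible approach (an assumption; the source does not say how its ULR images are produced) is area averaging over non-overlapping blocks, e.g. reducing a $384\times384$ image to $16\times16$ by averaging each $24\times24$ block. The pure-Python sketch below illustrates this; the function name `block_average_downsample` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) values
Image = List[List[Pixel]]           # row-major square image


def block_average_downsample(image: Image, out_size: int) -> Image:
    """Hypothetical helper: downsample a square RGB image to out_size x out_size
    by averaging non-overlapping blocks (area interpolation). Assumes the input
    side length is an integer multiple of out_size, e.g. 384 -> 16 (factor 24)."""
    in_size = len(image)
    if out_size <= 0 or in_size % out_size != 0 or any(len(r) != in_size for r in image):
        raise ValueError("expected a square image whose side is a multiple of out_size")
    f = in_size // out_size  # block edge length, e.g. 24 for 384 -> 16
    n = f * f                # number of input pixels per output pixel
    out: Image = []
    for by in range(out_size):
        row: List[Pixel] = []
        for bx in range(out_size):
            acc = [0.0, 0.0, 0.0]
            # Sum every channel over the f x f block that maps to this output pixel.
            for y in range(by * f, (by + 1) * f):
                for x in range(bx * f, (bx + 1) * f):
                    for c in range(3):
                        acc[c] += image[y][x][c]
            row.append((acc[0] / n, acc[1] / n, acc[2] / n))
        out.append(row)
    return out


# Usage: a one-pixel checkerboard at 384 x 384 collapses to uniform mid-gray.
checker = [[(float((x + y) % 2),) * 3 for x in range(384)] for y in range(384)]
small = block_average_downsample(checker, 16)
print(len(small), len(small[0]), small[0][0])
```

The checkerboard example shows the intuition behind the privacy argument: fine structure such as printed text cannot survive the averaging, while coarse color layout does.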

Paper Structure

This paper contains 24 sections, 11 equations, 6 figures, 11 tables, 1 algorithm.

Figures (6)

  • Figure 1: Illustration of a real-world application scenario of our privacy-preserving ultra-low-resolution semantic segmentation framework deployed for object-goal navigation. The center panel shows a top-down view of the robot's trajectory (green line) toward the target (red box) in an environment containing privacy-sensitive content (cyan arrows). The High-Res Views show full-resolution RGB images for visualization only, while the Robot Views display the robot's actual input: ultra-low-resolution ($16 \times 16$) monocular RGB images. On the left, before the target has been found, the robot performs floor-based navigation by following waypoints (blue dots) along the floor centerline (blue dashed line), turning at branch points that may lead to new rooms. On the right, once the target has been found, the robot switches to goal navigation and gradually approaches the target. As can be seen, our method produces plausible semantic segmentation from ultra-low-resolution RGB images while enabling privacy-preserving robotic task execution.
  • Figure 2: Overview of our proposed joint-learning framework for ultra-low-resolution semantic segmentation. Given an ultra-low-resolution (ULR) RGB image, a GAN-based super-resolution (SR) branch generates a high-resolution (HR) RGB image, which is then fed into the semantic segmentation (SS) branch (any semantic segmentation network) to predict a semantic map. We integrate an AFE module to extract high-level semantic information and a SAD module to assess the realism of the concatenation of a super-resolved RGB image and its predicted segmentation map. The AFE module takes the SR output and the ground-truth (GT) HR image as input to compute the feature loss $\mathcal{L}_\mathrm{fea}$. The SAD module receives the concatenated pair to compute the segmentation-aware discrimination loss $\mathcal{L}_\mathrm{D}$ and the adversarial loss $\mathcal{L}_\mathrm{adv}$; the SAD module itself is trained on $\mathcal{L}_\mathrm{D}$. The entire network, comprising the SR and SS branches, is jointly trained using the pixel-wise loss $\mathcal{L}_\mathrm{2}$, the feature loss $\mathcal{L}_\mathrm{fea}$, the adversarial loss $\mathcal{L}_\mathrm{adv}$, and the cross-entropy loss $\mathcal{L}_\mathrm{ce}$.
  • Figure 3: Example of a credit card at (a) high resolution ($384\times384$) and (b) ULR ($16\times16$). With the original prompt, the model flags a privacy risk in both cases; in (b), this decision is driven by shape cues even though the sensitive text is not readable.
  • Figure 4: Visual comparison of super-resolved RGB images and semantic segmentation maps. We compare the super-resolved outputs and predicted semantic segmentation maps of the five best-performing baselines and our proposed method against the GT, for $16 \times 16$ inputs. As the visualization shows, our method predicts the most accurate semantic segmentation masks of all compared methods. Our method amplifies object-boundary contrast to favor region delineation, achieving the best semantic segmentation performance at the expense of boundary segmentation accuracy.
  • Figure 5: Trade-off between privacy non-leakage and semantic segmentation performance, measured by mIoU (A) and mAcc (B), across input resolutions. As can be seen, $16\times16$ (Ours-16) provides the most favorable trade-off between privacy preservation and segmentation performance.
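
The Figure 2 caption lists the terms that jointly train the SR and SS branches. The pure-Python sketch below shows one way they could be combined into a single objective. It rests on several assumptions not stated in the source: the weights `lam_fea`, `lam_adv`, `lam_ce` and their defaults are hypothetical; $\mathcal{L}_\mathrm{fea}$ is taken as a mean squared error between AFE features; and $\mathcal{L}_\mathrm{adv}$ and $\mathcal{L}_\mathrm{D}$ use the standard non-saturating binary GAN formulation, whereas the paper's actual definitions may differ. All function names are illustrative.

```python
import math
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error; used for the pixel-wise loss L_2 and (assumed) for L_fea."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits: List[List[float]], labels: List[int]) -> float:
    """Mean per-pixel softmax cross-entropy L_ce; logits[i] holds the class scores of pixel i."""
    if not labels or len(logits) != len(labels):
        raise ValueError("need one logit vector per labeled pixel")
    total = 0.0
    for scores, y in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[y]
    return total / len(labels)


def adversarial_loss(d_fake: float, eps: float = 1e-12) -> float:
    """Assumed non-saturating generator loss -log D(fake), where d_fake is the
    discriminator's probability that the (SR image, predicted map) pair is real."""
    return -math.log(max(d_fake, eps))


def discriminator_loss(d_real: float, d_fake: float, eps: float = 1e-12) -> float:
    """Assumed binary GAN loss standing in for L_D; the 'real' pair is presumed to
    be the GT HR image with its GT segmentation map."""
    return -math.log(max(d_real, eps)) - math.log(max(1.0 - d_fake, eps))


def joint_loss(sr: Sequence[float], hr: Sequence[float],
               feat_sr: Sequence[float], feat_hr: Sequence[float],
               seg_logits: List[List[float]], seg_labels: List[int],
               d_fake: float, lam_fea: float = 1.0, lam_adv: float = 1e-3,
               lam_ce: float = 1.0) -> float:
    """Hypothetical weighted sum L_2 + lam_fea*L_fea + lam_adv*L_adv + lam_ce*L_ce."""
    return (mse(sr, hr)
            + lam_fea * mse(feat_sr, feat_hr)
            + lam_adv * adversarial_loss(d_fake)
            + lam_ce * cross_entropy(seg_logits, seg_labels))


# Usage: perfect reconstruction and a fooled discriminator leave only the
# segmentation term, here log(2) for a single pixel with two equal class scores.
print(joint_loss([0.1, 0.2], [0.1, 0.2], [1.0], [1.0], [[0.0, 0.0]], [0], d_fake=1.0))
```

In a training loop, `joint_loss` would be backpropagated through the SR and SS branches while `discriminator_loss` updates only the SAD module, mirroring the split described in the Figure 2 caption.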