Learning Privacy from Visual Entities

Alessio Xompero; Andrea Cavallaro

TL;DR

The paper investigates how to predict image privacy and questions whether large graph-based models are necessary: a transfer-learning pipeline that pairs a pre-trained CNN with a single trainable fully connected (FC) layer can match the performance of graph-based methods. It analyzes the components of graph-based privacy classifiers and shows that fine-tuning the CNN largely drives performance, while the graph component contributes little despite its large parameter cost. Comparisons against GIP, GPA, MLP, and GA-MLP on the IPD and PrivacyAlert datasets show that the lightweight approach (S2P) achieves comparable or superior accuracy with orders of magnitude fewer trainable parameters. The findings imply practical benefits in efficiency and interpretability, and suggest that future work should focus on human-interpretable visual-entity features and more efficient graph designs to improve privacy recognition at scale.

Abstract

Subjective interpretation and content diversity make predicting whether an image is private or public a challenging task. Graph neural networks combined with convolutional neural networks (CNNs), which consist of 14,000 to 500 million parameters, generate features for visual entities (e.g., scene and object types) and identify the entities that contribute to the decision. In this paper, we show that a simpler combination of transfer learning and a CNN to relate privacy with scene types optimises only 732 parameters while achieving performance comparable to that of graph-based methods. In contrast, end-to-end training of graph-based methods can mask the contribution of individual components to the classification performance. Furthermore, we show that a high-dimensional feature vector, extracted with CNNs for each visual entity, is unnecessary and complicates the model. The graph component also has negligible impact on performance, which is driven by fine-tuning the CNN to optimise image features for privacy nodes.
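To make the 732-parameter figure concrete, the following is a minimal sketch of a single trainable FC layer placed on top of a frozen scene classifier. It is not the authors' implementation. The sizes are assumptions chosen because they reproduce the stated count: 365 scene classes and 2 output classes (private, public) give 365 × 2 weights plus 2 biases, i.e. 732 parameters. The names `LinearHead` and `fake_scene_scores` are hypothetical, and the random scene scores merely stand in for the output of a pre-trained CNN.

```python
import math
import random

NUM_SCENES = 365   # assumption: number of scene classes from the frozen CNN
NUM_CLASSES = 2    # private vs. public


class LinearHead:
    """Single trainable fully connected layer: scene scores -> privacy logits."""

    def __init__(self, in_features, out_features, seed=0):
        rng = random.Random(seed)
        bound = 1.0 / math.sqrt(in_features)
        # One weight row per output class, one bias per output class.
        self.weight = [[rng.uniform(-bound, bound) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features

    def num_parameters(self):
        return sum(len(row) for row in self.weight) + len(self.bias)

    def forward(self, scene_scores):
        return [sum(w * x for w, x in zip(row, scene_scores)) + b
                for row, b in zip(self.weight, self.bias)]


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_scene_scores(seed=1):
    """Hypothetical stand-in for the output of a frozen, pre-trained scene CNN."""
    rng = random.Random(seed)
    return softmax([rng.gauss(0.0, 1.0) for _ in range(NUM_SCENES)])


head = LinearHead(NUM_SCENES, NUM_CLASSES)
probs = softmax(head.forward(fake_scene_scores()))
print(head.num_parameters())  # 365 * 2 + 2 = 732
```

Because the CNN is frozen in this sketch, only the weights and biases of `LinearHead` would be updated during training, which is what keeps the trainable parameter count so small compared with the graph-based models.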
