When Graph Convolution Meets Double Attention: Online Privacy Disclosure Detection with Multi-Label Text Classification

Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

TL;DR

This work tackles online privacy disclosure detection by casting it as a multi-label text classification task. It introduces a novel architecture that unifies the input text, label-to-text correlations via a double-attention module, and label-to-label correlations via a two-layer GCN-guided feature fusion with compensation coefficients. Empirical results on a Twitter privacy dataset with 32 categories show consistent improvements over state-of-the-art MLTC methods, validating the effectiveness of incorporating label interactions and graph-based label relations. The approach offers a principled pathway toward finer-grained privacy-protection tools in social media, though it is evaluated on a single dataset and focuses on text-only data, pointing to multi-modal extensions as future work.
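
To make the label-to-text side of this architecture more concrete, here is a minimal NumPy sketch of one plausible double-attention layer followed by a multi-label output head. It is not the authors' implementation. It assumes two views: a per-label self-attention view over the tokens, and a label-to-text view in which label embeddings attend to tokens. The two views are simply averaged. The function names, the averaging fusion (the paper instead uses a GCN-guided fusion) and the 0.5 decision threshold are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention_view(tokens, w_proj, label_queries):
    """Text-only view: every label learns its own attention over the tokens.

    tokens: (T, d) token features; w_proj: (d, k); label_queries: (k, L).
    Returns (L, d): one attended text vector per label.
    """
    scores = np.tanh(tokens @ w_proj) @ label_queries  # (T, L)
    alpha = softmax(scores, axis=0)                    # normalise over tokens
    return alpha.T @ tokens


def label_attention_view(tokens, label_emb):
    """Label-to-text view: label embeddings attend to tokens by dot product.

    label_emb: (L, d). Returns (L, d).
    """
    scores = label_emb @ tokens.T      # (L, T)
    beta = softmax(scores, axis=1)     # normalise over tokens
    return beta @ tokens


def predict_labels(tokens, w_proj, label_queries, label_emb, w_out, threshold=0.5):
    """Fuse both views (plain average, an assumption) and score each label independently."""
    fused = 0.5 * (self_attention_view(tokens, w_proj, label_queries)
                   + label_attention_view(tokens, label_emb))   # (L, d)
    logits = np.sum(fused * w_out, axis=1)                      # (L,) one logit per label
    probs = 1.0 / (1.0 + np.exp(-logits))                       # sigmoid per label, not softmax
    return probs, probs > threshold


rng = np.random.default_rng(0)
T, d, k, L = 12, 16, 8, 32   # 12 tokens, 32 privacy categories
tokens = rng.standard_normal((T, d))
probs, predicted = predict_labels(
    tokens,
    w_proj=rng.standard_normal((d, k)),
    label_queries=rng.standard_normal((k, L)),
    label_emb=rng.standard_normal((L, d)),
    w_out=rng.standard_normal((L, d)),
)
print(probs.shape, predicted.dtype)  # (32,) bool
```

Using an independent sigmoid per label, rather than a softmax across labels, is what allows a single post to be flagged for several privacy categories at once.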

Abstract

With the rise of Web 2.0 platforms such as online social media, people's private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. It is therefore important to detect such unwanted privacy disclosures so that both the affected people and the online platform can be alerted. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed representation method combines three different sources of information: the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network (GCN) is employed to extract the third source of information, which is then used to help fuse the features extracted from the first two. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.
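
The label-to-label correlation is extracted with a GCN over a label graph. As a hedged illustration, the sketch below implements the widely used two-layer propagation rule H' = σ(Â H W) with the symmetric normalisation Â = D^-1/2 (A + I) D^-1/2. This normalisation, the LeakyReLU activation, the toy random graph and all function names are assumptions. The paper's compensation coefficients and its fusion step are not reproduced here.

```python
import numpy as np


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 with added self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def leaky_relu(x: np.ndarray, slope: float = 0.2) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def two_layer_gcn(adj, label_feats, w1, w2):
    """Two rounds of neighbourhood propagation over the label graph.

    adj: (L, L) non-negative label graph; label_feats: (L, d_in) initial label
    embeddings; w1: (d_in, d_hidden); w2: (d_hidden, d_out).
    Returns (L, d_out) label representations.
    """
    a_norm = normalize_adjacency(adj)
    hidden = leaky_relu(a_norm @ label_feats @ w1)  # layer 1
    return a_norm @ hidden @ w2                     # layer 2 (no activation)


rng = np.random.default_rng(1)
num_labels, d_in, d_hidden, d_out = 32, 16, 8, 4   # 32 privacy categories
adj = rng.random((num_labels, num_labels))
adj = (adj + adj.T) / 2                             # toy symmetric graph
label_repr = two_layer_gcn(
    adj,
    rng.standard_normal((num_labels, d_in)),
    rng.standard_normal((d_in, d_hidden)),
    rng.standard_normal((d_hidden, d_out)),
)
print(label_repr.shape)  # (32, 4)
```

Each layer mixes a label's features with those of its neighbours. Stacking two layers therefore lets every privacy category draw on labels up to two hops away in the label graph.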

Paper Structure

This paper contains 31 sections, 7 equations, 6 figures and 6 tables.

Figures (6)

  • Figure 1: Several illustrative examples of possible privacy disclosures on online social media platforms.
  • Figure 2: The architecture of the proposed network.
  • Figure 3: Construction of the initial weighted label graph.
  • Figure 4: Illustration of 32 categories of privacy used in our experiments.
  • Figure 5: The visualization of label attention weights.
  • ...and 1 more figure
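
Figure 3 concerns the construction of the initial weighted label graph. This summary does not give the exact recipe, so the sketch below assumes a common co-occurrence approach. In this approach, the edge weight A[i, j] is the conditional probability that a post carries label j given that it carries label i, estimated from multi-hot training labels. The function name `weighted_label_graph` and the optional `min_prob` cut-off are hypothetical.

```python
import numpy as np


def weighted_label_graph(label_matrix, min_prob: float = 0.0) -> np.ndarray:
    """Build a directed, weighted label graph from multi-hot training labels.

    label_matrix: (N, L) with 1 where post n carries label l.
    Returns an (L, L) matrix with A[i, j] = P(label j | label i) (assumed recipe).
    """
    y = np.asarray(label_matrix, dtype=float)
    co = y.T @ y                    # co[i, j] = number of posts with both labels
    counts = np.diag(co).copy()     # copy: np.diag returns a view that fill_diagonal would zero
    np.fill_diagonal(co, 0.0)       # drop self-edges; a GCN typically adds its own
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = np.where(counts[:, None] > 0, co / counts[:, None], 0.0)
    cond[cond < min_prob] = 0.0     # optionally prune weak, noisy edges
    return cond


posts = [
    [1, 1, 0],   # post 0: labels 0 and 1
    [1, 0, 0],   # post 1: label 0 only
    [1, 1, 1],   # post 2: all three labels
    [0, 0, 1],   # post 3: label 2 only
]
A = weighted_label_graph(posts)
# A[1, 0] == 1.0 (every post with label 1 also has label 0), but A[0, 1] == 2/3
```

The resulting graph is asymmetric by design. Knowing that label 1 is present says more about label 0 than the reverse, and conditional probabilities capture that direction.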