Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection

Junbiao Pang, Anjing Hu, Qingming Huang

TL;DR

A bundling-refining approach that mines more complete hot topics from fragments by leveraging submodular optimization and outperforms traditional ranking methods, which involve careful design and complex steps.

Abstract

Organizing interesting webpages into hot topics is one of the key steps in understanding the trends of multimodal web data. A state-of-the-art solution first organizes webpages into a large volume of multi-granularity topic candidates and then identifies hot topics by estimating the interestingness of each candidate. However, these topic candidates contain a large number of fragments of hot topics, owing to both inefficient feature representations and unsupervised topic generation. This paper proposes a bundling-refining approach to mine more complete hot topics from these fragments. Concretely, the bundling step organizes the fragment topics into coarse topics; the refining step then applies a submodular-based method to refine the coarse topics in a scalable way. The proposed unconventional method is simple yet powerful: by leveraging submodular optimization, it outperforms traditional ranking methods, which involve careful design and complex steps. Extensive experiments demonstrate that the proposed approach surpasses the state-of-the-art method (i.e., latent Poisson deconvolution, Pang et al. (2016)) by 20% and 10% in accuracy on two public data sets, respectively.
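
The refining step can be pictured as choosing, inside a coarse topic, the subset of webpages that maximizes a submodular "goodness" score under a size budget. The paper's own goodness function is not reproduced in this section, so the following is only a minimal sketch under stated assumptions: it swaps in a facility-location function over hypothetical pairwise similarities (the names facility_location, greedy_select and SIM are illustrative) and runs the standard greedy algorithm, which for a monotone submodular function under a cardinality constraint is guaranteed to reach at least (1 - 1/e) of the optimal value.

```python
from typing import List, Optional, Sequence


def facility_location(selected: Sequence[int], sim: Sequence[Sequence[float]]) -> float:
    """Hypothetical goodness function (facility location), NOT the paper's g(P).

    g(S) = sum over all pages i of max_{j in S} sim[i][j], with g(empty set) = 0.
    It is monotone and submodular when all similarities are non-negative.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Greedily pick up to `budget` pages, each time taking the largest marginal gain."""
    selected: List[int] = []
    current = 0.0
    for _ in range(min(budget, len(sim))):
        best_j: Optional[int] = None
        best_gain = 0.0
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = facility_location(selected + [j], sim) - current
            if best_j is None or gain > best_gain:  # ties keep the lower index
                best_j, best_gain = j, gain
        if best_j is None or best_gain <= 0.0:  # nothing left that adds value
            break
        selected.append(best_j)
        current += best_gain
    return selected


# Toy similarities (hypothetical): pages {0, 1} and {2, 3} form two groups.
SIM = [
    [1.0, 0.75, 0.25, 0.0],
    [0.75, 1.0, 0.0, 0.25],
    [0.25, 0.0, 1.0, 0.5],
    [0.0, 0.25, 0.5, 1.0],
]
print(greedy_select(SIM, 2))  # → [0, 2]
```

With a budget of 2 the greedy pass takes one page from each group, because once page 0 is chosen its near-duplicate page 1 adds little; that shrinking marginal gain is exactly what submodularity formalizes.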

Paper Structure (21 sections, 1 theorem, 20 equations, 8 figures, 2 tables, 2 algorithms)

Key Result

Proposition 1

[Diminishing returns of $g(\mathcal{P})$] The goodness function $g(\mathcal{P})$ defined in the goodness-function equation has the following properties:
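
The property list of Proposition 1 is cut off here. Diminishing returns is, presumably, the submodularity condition the refining step relies on: for $\mathcal{A} \subseteq \mathcal{B}$ and $v \notin \mathcal{B}$, $g(\mathcal{A} \cup \{v\}) - g(\mathcal{A}) \ge g(\mathcal{B} \cup \{v\}) - g(\mathcal{B})$. The brute-force check below tests that inequality for any set function on a tiny ground set. The keyword-coverage function and the names has_diminishing_returns, KEYWORDS and keyword_coverage are hypothetical stand-ins, not the paper's goodness function.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Iterator

SetFunction = Callable[[FrozenSet[str]], float]


def all_subsets(items: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Yield every subset of `items`, from the empty set up to the full set."""
    pool = list(items)
    for r in range(len(pool) + 1):
        for combo in combinations(pool, r):
            yield frozenset(combo)


def has_diminishing_returns(g: SetFunction, ground: Iterable[str], tol: float = 1e-12) -> bool:
    """Check g(A + v) - g(A) >= g(B + v) - g(B) for every A ⊆ B and v ∉ B."""
    universe = frozenset(ground)
    for big in all_subsets(universe):
        for small in all_subsets(big):
            for v in universe - big:
                gain_small = g(small | {v}) - g(small)
                gain_big = g(big | {v}) - g(big)
                if gain_small < gain_big - tol:
                    return False
    return True


# Hypothetical stand-in for g(P): number of distinct keywords covered by a page set.
KEYWORDS = {
    "p1": {"quake", "rescue"},
    "p2": {"rescue", "aid"},
    "p3": {"election"},
}


def keyword_coverage(pages: FrozenSet[str]) -> float:
    return float(len(set().union(*(KEYWORDS[p] for p in pages))))


print(has_diminishing_returns(keyword_coverage, KEYWORDS))  # True
print(has_diminishing_returns(lambda s: float(len(s) ** 2), KEYWORDS))  # False
```

Coverage passes because a page's new keywords can only shrink as more pages are already selected; the squared-size function fails because its gains grow with the set. The exhaustive loop is exponential and only meant for sanity-checking toy cases.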

Figures (8)

  • Figure 1: Bundling fragment topics into a coarse one. In this example, the size of the bundling window is 3.
  • Figure 2: A toy example illustrating the algorithm that optimizes the goodness function.
  • Figure 3: Effectiveness of each component of BR on MCG-WEBV (best viewed in color).
  • Figure 4: The accuracy versus FPPT curves for the combination of MCPD and BR on MCG-WEBV (best viewed in color).
  • Figure 5: Comparisons between the state-of-the-art methods and our method by Top-10 $F_1$ versus NDT on MCG-WEBV (best viewed in color).
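
Figure 1 states only that fragment topics are bundled into a coarse topic using a bundling window of size 3; the exact bundling rule is not given in this section. The sketch below assumes one plausible reading: each fragment (a set of webpage ids) is merged with up to window - 1 of its most similar fragments by Jaccard overlap, ignoring fragments that do not overlap at all. The functions jaccard and bundle, the min_sim threshold and the toy fragments are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Topic = FrozenSet[int]  # a topic represented as a set of webpage ids


def jaccard(a: Topic, b: Topic) -> float:
    """Overlap between two topics; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bundle(fragments: Sequence[Topic], window: int = 3, min_sim: float = 0.0) -> List[Topic]:
    """Hypothetical bundling rule: merge each fragment with up to (window - 1)
    of its most similar fragments, keeping only neighbours with similarity > min_sim."""
    coarse: List[Topic] = []
    for i, frag in enumerate(fragments):
        scored = [(jaccard(frag, other), j) for j, other in enumerate(fragments) if j != i]
        # Most similar first; sorted() is stable, so ties keep input order.
        neighbours = [j for s, j in sorted(scored, key=lambda t: -t[0]) if s > min_sim]
        members = [i] + neighbours[: window - 1]
        merged = frozenset().union(*(fragments[j] for j in members))
        if merged not in coarse:  # the same coarse topic can arise from several seeds
            coarse.append(merged)
    return coarse


fragments = [
    frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({1, 3, 4}),  # pieces of one topic
    frozenset({10, 11}), frozenset({11, 12}),                          # pieces of another
]
# Two coarse topics: {1, 2, 3, 4} and {10, 11, 12}.
print(bundle(fragments, window=3))
```

Bundling deliberately over-merges; in the approach described above, the refining step is what then prunes each coarse topic back to its most interesting webpages.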

Theorems & Definitions (5)

  • Definition 1
  • Proposition 1
  • Proof
  • Remark 1
  • Proof