T-curator: a trust based curation tool for LOD logs
Dihia Lanasri
TL;DR
This work tackles the risk of using SPARQL query logs from Linked Open Data by introducing T-Curator, a trust-based curation tool. It profiles logs and applies an ETL-like pipeline of trust-aware operators, organized within a three-tier MVC architecture (Presentation, Business, Data) and implemented with Java/Scala and Jena. The approach yields measurable improvements in log trust, demonstrated on ScholarlyData and DBpedia logs, with the rate of trust rising from $79\%$ to $95.16\%$ and the curated set shrinking from $139{,}932$ to $6{,}756$ trusted queries. The tool provides an interactive GUI for analysts to compose pipelines, offers detailed statistics after each operation, and aligns with prior trust-focused work to enable safer reuse of LOD logs for decision making.
Abstract
Nowadays, companies are racing towards Linked Open Data (LOD) to improve their added value, but they are ignoring their SPARQL query logs. If well curated, these logs can be an asset for decision makers. Using them naively is too risky, because their provenance and quality are highly questionable. Users who want to exploit these logs in a trusted way need assistance, in the form of in-depth knowledge of the whole LOD environment and tools to curate the logs. In this paper, we propose an interactive and intuitive trust-based tool for curating LOD logs before exploiting them. The tool supports the approach introduced in our previous work, Lanasri et al. [2020].
