A Document-based Knowledge Discovery with Microservices Architecture
Habtom Kahsay Gidey, Mario Kesseler, Patrick Stangl, Peter Hillmann, Andreas Karcher
TL;DR
Facing a deluge of unstructured data in knowledge-intensive settings such as patent offices, this paper presents a document-based knowledge discovery (KD) solution built on a microservices architecture (MSA). It defines four domain microservices (Document Processing, Querying, Ontology Learning, and Ontology Management), supported by a two-tier data model (internal/external) and a mix of synchronous and asynchronous communication to enable scalable, resilient KD. The approach is evaluated with a demonstrator in a patent-office scenario, which shows keyword extraction, document similarity calculation, and ontology visualization, with practical RESTful access. The work demonstrates the viability and extensibility of MSA for KD tasks and outlines future directions, including a refined Ontology Learning Layer Cake and the decomposition of NLP tasks into dedicated microservices.
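The following is a minimal, single-process sketch of the mix of synchronous and asynchronous communication. The service names follow the TL;DR. The event type, the method names, and the in-memory `queue.Queue` standing in for a message broker are hypothetical assumptions, and the keyword "extraction" is a placeholder. Document Processing publishes an event and returns immediately. Ontology Learning consumes such events in a background worker. Querying answers a request synchronously by calling Document Processing directly, much as a blocking REST call would.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class DocumentProcessed:
    """Hypothetical event published by Document Processing after analyzing a document."""
    doc_id: str
    keywords: list[str]


class DocumentProcessingService:
    def __init__(self, bus: queue.Queue) -> None:
        self._bus = bus
        self._keywords: dict[str, list[str]] = {}  # service-internal data

    def process(self, doc_id: str, text: str) -> None:
        # Placeholder extraction (assumption): first three distinct words in sorted order.
        keywords = sorted(set(text.lower().split()))[:3]
        self._keywords[doc_id] = keywords
        # Asynchronous: publish the event and return without waiting for consumers.
        self._bus.put(DocumentProcessed(doc_id, keywords))

    def get_keywords(self, doc_id: str) -> list[str]:
        # Synchronous endpoint (stands in for a REST GET on this service).
        return list(self._keywords.get(doc_id, []))


class OntologyLearningService:
    def __init__(self, bus: queue.Queue) -> None:
        self._bus = bus
        self.concepts: set[str] = set()

    def run(self) -> None:
        # Consume events until a None sentinel arrives, collecting candidate concepts.
        while (event := self._bus.get()) is not None:
            self.concepts.update(event.keywords)


class QueryingService:
    def __init__(self, documents: DocumentProcessingService) -> None:
        self._documents = documents

    def keywords_of(self, doc_id: str) -> list[str]:
        # Synchronous call: the caller blocks until the answer is available.
        return self._documents.get_keywords(doc_id)


if __name__ == "__main__":
    bus: queue.Queue = queue.Queue()
    documents = DocumentProcessingService(bus)
    learner = OntologyLearningService(bus)
    worker = threading.Thread(target=learner.run)
    worker.start()
    documents.process("doc-1", "solid electrolyte battery cell")
    bus.put(None)  # stop the consumer after the pending event
    worker.join()
    print(QueryingService(documents).keywords_of("doc-1"))  # ['battery', 'cell', 'electrolyte']
    print(sorted(learner.concepts))  # ['battery', 'cell', 'electrolyte']
```

Decoupling Ontology Learning through events means a slow or failed learner does not block document ingestion. This is one way an MSA can serve the resilience goal. The synchronous path is kept for queries, where the caller needs an immediate answer.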
Abstract
The first step towards digitalization within organizations is digitization: the conversion of analog data into digitally stored data. This basic step is the prerequisite for all subsequent activities, such as the digitalization of processes or the servitization of products or offerings. However, digitization itself often yields material that is 'data-rich' but 'knowledge-poor'. Knowledge discovery and knowledge extraction aim to increase the usefulness of digitized data. In this paper, we identify the key challenges of knowledge discovery and present an approach that addresses them using a microservices architecture. The result is a conceptual design focusing on keyword extraction, similarity calculation of documents, database queries in natural language, and programming-language-independent provision of the extracted information. In addition, the conceptual design provides reference design guidelines for integrating processes and applications for the semi-automatic learning, editing, and visualization of ontologies. The concept also uses a microservices architecture to address non-functional requirements such as scalability and resilience. The specified requirements are evaluated using a demonstrator that implements the concept. Furthermore, an extended version of this approach is in use at the German patent office.
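The abstract names keyword extraction and document similarity as core capabilities but does not say which algorithms the demonstrator uses. One common choice, assumed here purely for illustration, weights terms with TF-IDF, takes the highest-weighted terms as keywords, and compares documents by the cosine similarity of their TF-IDF vectors. The stop-word list, the smoothed IDF formula, and all function names are hypothetical. A production service would more likely delegate this work to an NLP library.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Smoothed IDF, so terms shared by every document keep a positive weight.
        vectors.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def top_keywords(vector: dict[str, float], k: int = 3) -> list[str]:
    """Return the k highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "A battery cell with a solid electrolyte layer",
        "Solid electrolyte for a lithium battery cell",
        "A method for brewing coffee with a pressurised filter",
    ]
    vectors = tfidf_vectors(docs)
    print(top_keywords(vectors[0]))  # ['layer', 'battery', 'cell']
    print(cosine_similarity(vectors[0], vectors[1]) > cosine_similarity(vectors[0], vectors[2]))  # True
```

The two battery documents share four weighted terms and score well above zero, while the coffee document shares none with the first and scores exactly 0.0.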
