Standardizing Knowledge Engineering Practices with a Reference Architecture

Bradley P. Allen, Filip Ilievski

TL;DR

The paper addresses the fragmentation of knowledge engineering (KE) across evolving paradigms by proposing a requirement-driven reference architecture (RA) approach grounded in boxology. It outlines a six-step roadmap to design, synthesize, evaluate, and instantiate RAs for KE, leveraging neurosymbolic patterns from the SWeMLS literature to align user needs with architectural components. Through architectural analysis and a lightweight ATAM, the work demonstrates how to map quality attributes to design patterns and illustrates instantiation with concrete toolkits and an example scenario. The proposed RA framework aims to standardize KE practices, facilitate cross-domain collaboration, and connect KE with software engineering and data science communities, while inviting further community-driven refinement.

Abstract

Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used, given the importance of high-quality knowledge for reliable intelligent agents. Meanwhile, the scope of knowledge engineering, as apparent from its target tasks and use cases, has been shifting together with its paradigms, such as expert systems, semantic web, and language modeling. The intended use cases and supported user requirements have not been analyzed globally across these paradigms, as new paradigms often resolve prior pain points while possibly introducing new ones. The recent abstraction of systemic patterns into a boxology provides an opening for aligning the requirements and use cases of knowledge engineering with the systems, components, and software that can best satisfy them. This paper proposes a vision of harmonizing best practices in knowledge engineering by leveraging the software engineering methodology of creating reference architectures. We describe how a reference architecture can be iteratively designed and implemented to associate user needs with recurring systemic patterns, building on existing knowledge engineering workflows and boxologies. We provide a six-step roadmap for developing such an architecture, together with an initial design and the outcomes of its first three steps: definition of architectural scope, selection of information sources, and analysis. We expect that following through on this vision will lead to well-grounded reference architectures for knowledge engineering, advance ongoing initiatives to organize the neurosymbolic knowledge engineering space, and build new links to the software architecture and data science communities.

Paper Structure

This paper contains 18 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Pipeline for devising an RA for KE. First, we identify the scope by defining stakeholders and use cases, ultimately resulting in a set of quality attributes and functional requirements [allen2023identifying]. Second, we select and investigate information sources, drawing on the SWeMLS corpus of neurosymbolic systems and patterns for KE [ekaputra2023describing, sabouknowledge]. Third, we connect these components through architectural analysis, yielding information about how well various patterns fit the requirements and use cases. Fourth, based on these insights, we synthesize an RA from these patterns. Fifth, the RA is evaluated through instantiation and use, following a standard software architecture methodology. Finally, the RA is instantiated into software.
  • Figure 2: Simple neurosymbolic system design patterns from the SWeMLS KG, as shown in [waltersdorfer2023semantic]. The F2 design pattern, appearing on the right of the figure, is a simple fusion that takes both symbolic (s) and unstructured data (d) as inputs and produces symbolic data (s) as output using a model M.
  • Figure 3: Preliminary analysis of the relationships between the quality attributes for KE identified in [allen2023identifying] and the KE design patterns from [sabouknowledge] that are associated with knowledge graph creation and extension. The number in each cell counts how often the zero-shot text classifier assigned the given quality attribute to papers that describe systems with the given pattern.
  • Figure 4: An example of RA synthesis. Stakeholders have identified a set of QAs (scalability, domain-specificity, and extensibility) and a specific use case (graph extension of an enterprise KG). The team of architects has taken this as input, selected an adequate pattern F2 (fusion 2) based on its support for the indicated QAs and use case, and synthesized a proposed RA that uses a trained subject classifier to perform graph extension based on the KG and data from a content repository. After a process of iterative refinement, choices are made about specific technologies to use, and a concrete RA is proposed.
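
The counting behind Figure 3 and the pattern selection in Figure 4 can be illustrated with a short sketch. It is a hypothetical illustration rather than the authors' implementation: the `Paper` record, the function names, the ranking rule, and the sample data are all assumptions made here, and the pattern names other than F2 are placeholders. Only the idea of tallying classifier-assigned quality attributes per pattern and then choosing a pattern that supports the requested QAs comes from the captions above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Paper:
    """Hypothetical record: a paper's design pattern and its classifier-assigned QAs."""
    pattern: str
    quality_attributes: Tuple[str, ...]


def count_qa_by_pattern(papers: Iterable[Paper]) -> Counter:
    """Tally (pattern, quality attribute) co-occurrences, as in a Figure 3 style table."""
    counts: Counter = Counter()
    for paper in papers:
        for qa in paper.quality_attributes:
            counts[(paper.pattern, qa)] += 1
    return counts


def rank_patterns(counts: Counter, required_qas: List[str]) -> List[str]:
    """Rank patterns by how many required QAs they cover, then by total count.

    This ranking rule is an assumption for illustration; the source does not
    prescribe a specific selection procedure.
    """
    patterns = sorted({pattern for pattern, _ in counts})

    def score(pattern: str) -> Tuple[int, int]:
        covered = sum(1 for qa in required_qas if counts[(pattern, qa)] > 0)
        total = sum(counts[(pattern, qa)] for qa in required_qas)
        return (covered, total)

    return sorted(patterns, key=score, reverse=True)


# Made-up sample data, not taken from the paper.
papers = [
    Paper("F2", ("scalability", "extensibility")),
    Paper("F2", ("domain-specificity", "scalability")),
    Paper("F1", ("scalability",)),
    Paper("F3", ("extensibility",)),
]

counts = count_qa_by_pattern(papers)
ranking = rank_patterns(counts, ["scalability", "domain-specificity", "extensibility"])
print(ranking[0])
```

With this sample data, F2 is the only pattern with support for all three requested QAs, so it ranks first. In practice, architects would treat such a ranking as one input to iterative refinement, not as an automatic decision.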