Language and Knowledge Representation: A Stratified Approach
Mayukh Bagchi
TL;DR
The work reframes semantic heterogeneity as representation heterogeneity across the concept, language, knowledge, and data layers, and proposes a top-down, stratified framework to address it. Central to the approach are the Universal Knowledge Core (UKC) for language representation, language teleontology and knowledge teleontology for knowledge representation, the LiveKnowledge catalog for iterative reuse, and the kTelos methodology, which integrates these components to produce reusable language and knowledge representations. The framework is demonstrated through proof-of-concept case studies for the DataScientia and JIDEP projects, illustrating how multilingual data catalogs and materials modelling benefit from structured, domain-grounded representations. The framework aims to enable more interoperable, reusable, and scalable knowledge graphs by systematically managing unity and diversity at each representation level, with potential implications for data integration, semantic search, and knowledge-based AI tooling.
Abstract
The thesis introduces the problem of representation heterogeneity to emphasize that heterogeneity is an intrinsic property of any representation: different observers encode different representations of the same target reality, in a stratified manner, using different concepts, language, and knowledge (as well as data). The thesis then advances a top-down approach to this stratified problem, built from the following solution components: (i) a representation formalism stratified into concept, language, knowledge, and data levels to accommodate representation heterogeneity; (ii) a top-down language representation using the Universal Knowledge Core (UKC), UKC namespaces, and domain languages to tackle concept-level and language-level heterogeneity; (iii) a top-down knowledge representation using the notions of language teleontology and knowledge teleontology to tackle knowledge-level heterogeneity; (iv) the use and further development of the existing LiveKnowledge catalog to enforce iterative reuse and sharing of language and knowledge representations; and (v) the kTelos methodology, which integrates the preceding components to iteratively generate language and knowledge representations that resolve representation heterogeneity. The thesis also includes proofs of concept of the language and knowledge representations developed for two international research projects: DataScientia (data catalogs) and JIDEP (materials modelling). Finally, the thesis concludes with future lines of research.
