ShennongAlpha: an AI-driven sharing and collaboration platform for intelligent curation, acquisition, and translation of natural medicinal material knowledge

Zijie Yang, Yongjing Yin, Chaojun Kong, Tiange Chi, Wufan Tao, Yue Zhang, Tian Xu

TL;DR

ShennongAlpha presents an AI-driven platform to tackle the lack of standardized nomenclature, curation, and translation for Natural Medicinal Materials (NMMs) by introducing Systematic Nomenclature (NMMSN) and a bilingual, collaborative knowledge base. The system integrates an open naming tool (ShennongName), multilingual knowledge management (MLMD), a retrieval-augmented chat interface (ShennongChat), and a standardized translation pipeline (NMT-CPT) to enable accurate, interpretable cross-language access to >$14{,}000$ NMM entries. Key innovations include the NMMSN encoding with a 4-digit base-36 ID ($36^4 - 1 = 1{,}679{,}615$ max entries), a five-layer ShennongAlpha architecture, and a CGS-driven coreference graph search for consistent term mapping. The approach demonstrates how standardized nomenclature, robust search, and retrieval-augmented generation can transform domain-specific knowledge sharing, reduce mistranslations, and broaden global access to NMM knowledge for researchers, clinicians, and patients. This work offers a scalable model for AI-assisted knowledge sharing in specialized biomedical domains and provides tools and data to support future LLM training and cross-cultural dissemination of medicinal knowledge.
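The ID arithmetic above can be checked with a minimal sketch. It assumes the four base-36 digits use the alphabet 0-9 followed by A-Z and that IDs are zero-padded to four characters; neither detail is stated here, and the function names `encode_nmm_id` and `decode_nmm_id` are hypothetical. Under those assumptions, $36^4 - 1 = 1{,}679{,}615$ is the largest value a four-digit ID can hold.

```python
import string

# Assumed alphabet: digits 0-9 followed by uppercase A-Z (36 symbols).
ALPHABET = string.digits + string.ascii_uppercase
ID_WIDTH = 4
MAX_ID = 36 ** ID_WIDTH - 1  # 1,679,615, the largest 4-digit base-36 value


def encode_nmm_id(n: int) -> str:
    """Encode an integer as a zero-padded 4-character base-36 string."""
    if not 0 <= n <= MAX_ID:
        raise ValueError(f"ID must be in 0..{MAX_ID}, got {n}")
    chars = []
    for _ in range(ID_WIDTH):
        n, remainder = divmod(n, 36)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_nmm_id(code: str) -> int:
    """Decode a 4-character base-36 string back to an integer."""
    code = code.upper()
    if len(code) != ID_WIDTH or any(c not in ALPHABET for c in code):
        raise ValueError(f"not a valid {ID_WIDTH}-character base-36 ID: {code!r}")
    return int(code, 36)
```

For example, `encode_nmm_id(MAX_ID)` yields `"ZZZZ"`, and decoding it returns 1,679,615. Whether the value `0000` is itself assigned to an entry is not specified in this summary.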

Abstract

Natural Medicinal Materials (NMMs) have a long history of global clinical applications and a wealth of records and knowledge. Although NMMs are a major source for drug discovery and clinical application, the utilization and sharing of NMM knowledge face crucial challenges, including the standardized description of critical information, efficient curation and acquisition, and language barriers. To address these, we developed ShennongAlpha, an AI-driven sharing and collaboration platform for intelligent knowledge curation, acquisition, and translation. For standardized knowledge curation, the platform introduced a Systematic Nomenclature to enable accurate differentiation and identification of NMMs. More than fourteen thousand Chinese NMMs have been curated into the platform along with their knowledge. Furthermore, the platform pioneered chat-based knowledge acquisition, standardized machine translation, and collaborative knowledge updating. Together, our study represents the first major advance in leveraging AI to empower NMM knowledge sharing, which not only marks a novel application of AI for Science, but also will significantly benefit the global biomedical, pharmaceutical, physician, and patient communities.

Paper Structure (47 sections, 6 equations, 35 figures, 2 tables, 2 algorithms)

Figures (35)

  • Figure 1: ShennongAlpha: an AI-driven sharing and collaboration platform for intelligent curation, acquisition, and translation of NMM knowledge. a. The challenges in utilizing and sharing NMM knowledge and the corresponding ShennongAlpha approaches. b. Architecture of ShennongAlpha. ShennongAlpha applies the Systematic Nomenclature for NMMs (\ref{fig:snnmm}) and integrates ShennongName (\ref{fig:snn}) with a hexa-domain modular system to form its structure. The hexa-domain system is outlined in the light teal block. ShennongAlpha is structured into five layers, from shallow to deep: Layer 1: Web and user interaction layer. In this layer, users can access the system via the ShennongAlpha Web (\ref{fig:sna}). Layer 2: Algorithm and application layer. This layer contains three applications developed specifically for NMMs: ShennongName, ShennongChat (\ref{fig:snc}), and ShennongTranslate (\ref{fig:snt}). Users can access these applications on the corresponding pages of the ShennongAlpha Web. Layer 3: Artificial intelligence layer. This layer integrates the ShennongAlpha Large Language Model system, allowing ShennongAlpha to process and respond intelligently to data from different layers. Layer 4: Search engine layer. This layer integrates the ShennongAlpha Search Engine, customized for NMM-related data. Layer 5: Knowledge base layer. This layer integrates the ShennongAlpha Knowledge Base to curate NMM knowledge efficiently. Arrows represent the allowed data interactions between different layers. c. Cross-platform and user-friendly design of ShennongAlpha. d. The English homepage of the ShennongAlpha Web. e. The Chinese homepage of the ShennongAlpha Web.
  • Figure 2: Systematic Nomenclature for Natural Medicinal Materials. The Systematic Nomenclature assigns each NMM a unique Systematic Name, Generic Name, and NMM ID. a. Components of the Systematic Name. It consists of four components: I. Species origin, including species names in Latin; II. Medicinal part; III. Special description for initial preparations or specific characteristics; and IV. Processing method. b. NMM Types. Raw NMMs are initially prepared at the production sites to produce Agricultural NMMs; Agricultural NMMs are often further processed to produce Processed NMMs. c. Examples of traditional Chinese NMMs in Systematic Nomenclature. Conventional names often lead to confusion by collectively referring to multiple NMMs that are not identical, due to missing or incorrect information about species origin, medicinal part, special description, and processing method. For example, the illustration shows three Agricultural NMMs from the Ephedra genus with the herbaceous stem as the medicinal part, conventionally named "Ephedrae Herba" ("麻黄"), leading to ambiguity. Similarly, nine Processed NMMs from the Curcuma genus, with different medicinal parts, initial preparations and processing methods, are collectively referred to by four names: "Wenyujin Rhizoma Concisum" ("片姜黄"), "Curcumae Rhizoma" ("莪术"), "Curcumae Radix" ("郁金"), and "Curcumae Longae Rhizoma" ("姜黄"). In contrast, our Systematic Nomenclature accurately assigns distinct Systematic Names, Generic Names, and NMM IDs to these twelve different Agricultural and Processed NMMs, eliminating ambiguity. The dashed lines connect the conventional names to the different NMMs they collectively represent.
  • Figure 3: Using ShennongName to automatically construct NMM Systematic Names. Users select the NMM type and provide information for the four name components in area *X. By clicking on hyperlinks like *X, users can view entries for the four name components already cataloged in the ShennongAlpha Knowledge Base. For each name component, users can add additional information by clicking on plus buttons like *X (*X'). When users begin entering name component information in text boxes like *X, ShennongName performs a real-time search in the Knowledge Base for relevant matching entries to enable auto-completion (*X'). After users have entered the necessary naming information for the NMM, they can click on the "Construct NMM Systematic Name" button (*X), allowing ShennongName to automatically construct the Systematic Name using the algorithm. If the construction is successful, the generated information is displayed with a green background (*X). If any issues arise during construction, the relevant information is displayed with an orange background (*X'). For successfully constructed Systematic Names, ShennongName will also automatically perform a search for it in the Knowledge Base; if a matching NMM is found, users will be informed that the NMM is already recorded in the Knowledge Base, eliminating the need for redundant construction (*X). After successfully constructing a new Systematic Name, if users wish to add it to the Knowledge Base, they can provide relevant details about the NMM in the textbox in area *X and submit it. Once reviewed and approved by ShennongAlpha, the entry will be incorporated into the Knowledge Base.
  • Figure 4: Browsing NMM knowledge on the ShennongAlpha Web.
  • Figure 5: Browsing NMM knowledge on the ShennongAlpha Web (legend continued from Figure 4). Users can begin exploring NMM knowledge from the search bar on the homepage (*X) or at the top of other pages (*X). After searching, users are directed to the search page, where they can scan the title (*X) and summarized information (*X) of each NMM entry to judge its relevance. Clicking the title of an entry opens a detailed knowledge page dedicated to that NMM. The header of this knowledge page (*X) displays the Systematic Name of the NMM, while the main content area (*X) is organized section by section. The "Table of Contents" sidebar (*X) enables quick navigation between sections. To support cross-language access for global users, the Web offers four display modes (*X): Bilingual (Chinese-English), Bilingual (English-Chinese), Chinese only, and English only. The "Save" button (*X) bookmarks the knowledge page to the user dashboard (*X). To encourage academic referencing, the "Cite" button (*X) offers citation formats in styles such as APA, MLA, GB/T 7714-2015, and BibTeX. The "Download" button (*X) downloads the knowledge page's content in JSON format. With the "Leave your name and knowledge!" button (*X), users can propose new or revised NMM-related knowledge. Contributions can also be made directly via the "Edit Content" button (*X), which lets users modify the content of each section. The "Edit History" button (*X) provides access to all historical changes. Approved user contributions are integrated into the ShennongAlpha Knowledge Base, and contributors are acknowledged in the "Contributors" area (*X), where their usernames and avatars are displayed. Users can navigate to the ShennongChat (*X, \ref{fig:snc}), ShennongTranslate (*X, \ref{fig:snt}), and ShennongName (*X, \ref{fig:snn}) applications, as well as the detailed rules of the Systematic Nomenclature for NMMs (*X), directly from the homepage or through the header navigation bar on any page of the Web.
  • ...and 30 more figures
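
The Figure 2 and Figure 3 legends describe a Systematic Name built from four components (species origin, medicinal part, special description, processing method) and a construction step that reports success, problems, or an already-recorded entry. The sketch below illustrates that flow only. The class and function names, the `" | "` separator, and the rule that a Processed NMM needs a processing method are all assumptions made for illustration; they are not the actual NMMSN formatting rules or the ShennongName algorithm.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class NMMType(Enum):
    RAW = "Raw"
    AGRICULTURAL = "Agricultural"
    PROCESSED = "Processed"


@dataclass(frozen=True)
class NameComponents:
    """The four name components listed in the Figure 2 legend."""
    nmm_type: NMMType
    species_origin: Tuple[str, ...]             # I. Latin species names
    medicinal_part: str                         # II. Medicinal part
    special_description: Optional[str] = None   # III. Special description
    processing_method: Optional[str] = None     # IV. Processing method

    def validate(self) -> list:
        """Return a list of problems; an empty list means the input is usable."""
        problems = []
        if not self.species_origin:
            problems.append("species origin is required")
        if not self.medicinal_part:
            problems.append("medicinal part is required")
        # Assumed rule (not from the source): Processed NMMs name their method.
        if self.nmm_type is NMMType.PROCESSED and not self.processing_method:
            problems.append("processed NMMs need a processing method")
        return problems

    def render(self) -> str:
        """Join the non-empty components; the separator is purely illustrative."""
        parts = [
            " or ".join(self.species_origin),
            self.medicinal_part,
            self.special_description,
            self.processing_method,
        ]
        return " | ".join(p for p in parts if p)


def construct(components: NameComponents, known_names: set) -> dict:
    """Mimic the three outcomes in the Figure 3 legend: error, exists, new."""
    problems = components.validate()
    if problems:
        return {"status": "error", "problems": problems}
    name = components.render()
    status = "exists" if name in known_names else "new"
    return {"status": status, "name": name}
```

A call such as `construct(NameComponents(NMMType.AGRICULTURAL, ("Ephedra sinica",), "herbaceous stem"), set())` returns a `"new"` result, while passing a set that already contains the rendered name returns `"exists"`, mirroring the duplicate check described for ShennongName.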