LLM-Enhanced Semantic Data Integration of Electronic Component Qualifications in the Aerospace Domain

Antonio De Santis, Marco Balduini, Matteo Belcao, Andrea Proia, Marco Brambilla, Emanuele Della Valle

Abstract

Large manufacturing companies face challenges in information retrieval because different departments maintain separate data silos, which leads to inconsistencies and misalignment across databases. This paper reports on our experience integrating and retrieving qualification data for electronic components used in satellite board design. Because of data silos, designers cannot immediately determine the qualification status of individual components. Yet determining that status is critical during the planning phase, when assembly drawings are issued before production, in order to optimize new qualifications and avoid redundant efforts. To address this, we propose a pipeline that uses Virtual Knowledge Graphs (VKGs) to provide a unified view over heterogeneous data sources and LLMs to enhance retrieval and reduce manual effort in data cleansing. Qualifications are then retrieved through an Ontology-based Data Access approach for structured queries and a vector search mechanism for finding qualifications with similar textual properties. We perform a comparative cost-benefit analysis, demonstrating that the proposed pipeline also outperforms approaches relying solely on LLMs, such as Retrieval-Augmented Generation (RAG), in terms of long-term efficiency.

Paper Structure

This paper contains 7 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of the proposed LLM-enhanced semantic integration pipeline. The approach integrates two heterogeneous data silos (PLM-DB and QC) by combining LLMs and VKGs. LLMs assist in semi-automated data cleaning, including normalization and extraction of key fields, while knowledge engineering is performed once to create a semantic layer over the data. Direct and by-similarity qualifications are retrieved via exact symbolic queries over the VKG. Alternative qualifications are retrieved through vector search, refined using rules based on the component type, and finally validated by a domain expert.
  • Figure 2: Overview of the pipeline for semi-automated extraction of Part Numbers (PN) from textual descriptions using LLMs. Textual entries, potentially containing Part Numbers, are first processed by an LLM that automatically extracts candidate PNs. These candidates are subsequently cross-checked against the PLM-DB to validate correctness. Entries that cannot be verified through this step are flagged and sent to human domain experts for manual review, including those for which a candidate PN could not be extracted (indicated as NA). Successfully validated PNs are then added to QC.
  • Figure 3: High-level overview of the VKG semantic model for qualification retrieval. It shows the two main entities (PLMDB_COMPONENT and QUALIFICATION_CARD) and the core join attributes (PN, manufacturer, package and subpackage) used for direct and by-similarity matching. Additional domain-specific properties are omitted for readability.
  • Figure 4: Cumulative effort in person-days required for each approach as a function of the number of electronic components, considering system setup time and human-in-the-loop validation. The top panel shows the absolute effort of the three pipelines, while the bottom panel shows the effort relative to the manual (AS-IS) baseline. The VKG+LLM pipeline requires higher setup time but becomes significantly more efficient at scale, reducing effort by over 70% beyond 5000 components.
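
The triage flow described in the Figure 2 caption (extract a candidate PN, cross-check it against PLM-DB, route anything unverified to a human) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `extract_candidate_pn` is a hypothetical regex stand-in for the LLM call, `triage` and `TriageResult` are names invented for this sketch, PLM-DB is modeled as a plain set of known part numbers, and the part numbers in the usage example are made up.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


def extract_candidate_pn(text: str) -> Optional[str]:
    """Hypothetical stand-in for the LLM extraction step.

    Returns the first token that mixes letters and digits, upper-cased,
    or None when no such token exists (the "NA" case in Figure 2).
    A real pipeline would prompt an LLM here instead.
    """
    for token in re.findall(r"[A-Za-z0-9][A-Za-z0-9\-/]*", text):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        if has_digit and has_alpha:
            return token.upper()
    return None


@dataclass
class TriageResult:
    # Entries whose candidate PN was found in PLM-DB: entry text -> PN.
    validated: dict[str, str] = field(default_factory=dict)
    # Entries sent to a domain expert: (entry text, candidate PN or None).
    needs_review: list[tuple[str, Optional[str]]] = field(default_factory=list)


def triage(
    entries: Iterable[str],
    known_pns: set[str],
    extractor: Callable[[str], Optional[str]] = extract_candidate_pn,
) -> TriageResult:
    """Split entries into automatically validated ones and ones needing review."""
    result = TriageResult()
    for entry in entries:
        candidate = extractor(entry)
        if candidate is not None and candidate in known_pns:
            # Cross-check succeeded: the PN can be added to QC.
            result.validated[entry] = candidate
        else:
            # Either no candidate (NA) or a candidate unknown to PLM-DB.
            result.needs_review.append((entry, candidate))
    return result


if __name__ == "__main__":
    plm_db = {"ABC123-XY", "ZZ9-000"}  # made-up part numbers
    entries = [
        "Ceramic capacitor ABC123-XY",
        "Resistor thick film, no code",
        "Diode QQ77Z axial",
    ]
    outcome = triage(entries, plm_db)
    print(outcome.validated)
    print(outcome.needs_review)
```

Passing the extractor as a parameter keeps the validation logic independent of how candidates are produced, so the regex stub could be swapped for an actual LLM call without touching the cross-check or the review queue.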