Combining knowledge graphs and LLMs for hazardous chemical information management and reuse
Marcos Da Silveira, Louis Deladiennee, Kheira Acem, Oona Freudenthal
TL;DR
The paper addresses the poor accessibility of hazardous chemical information by evaluating the FAIRness of major data sources and proposing HazardChat, a knowledge-graph-based platform that combines a visual Neo4J interface with a retrieval-augmented language model chatbot. It demonstrates how a KG linking data from sources such as ECHA, CTD, and NIOSH can enable rapid, query-driven insights for healthcare professionals. Key contributions include a cross-source FAIR analysis, a practical KG design for hazard-disease mapping, and two user interfaces that lower technical barriers to data access. The work highlights the practical impact of integrating FAIR data, knowledge graphs, and LLMs to improve health risk assessment and decision-making, while outlining persistent gaps in data standardization and interoperability that require governance and API-level solutions.
Abstract
Human health is increasingly threatened by exposure to hazardous substances, particularly persistent and toxic chemicals. Scientific studies have demonstrated links between these substances, often encountered in complex mixtures, and various diseases. However, this information is scattered across several sources and is difficult for both humans and machines to access. This paper evaluates current practices for publishing and accessing information on hazardous chemicals and proposes a novel platform designed to facilitate retrieval of critical chemical data in urgent situations. The platform aggregates information from multiple sources and organizes it into a structured knowledge graph. Users can access this information through visual interfaces such as Neo4J Bloom and dashboards, or via natural language queries using a chatbot. Our findings demonstrate a significant reduction in the time and effort required to access vital chemical information when datasets follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Furthermore, we discuss the lessons learned from the development and implementation of this platform and provide recommendations for data owners and publishers to enhance data reuse and interoperability. This work aims to improve the accessibility and usability of chemical information for healthcare professionals, thereby supporting better health outcomes and informed decision-making for patients exposed to chemical intoxication risks.
