SciQu: Accelerating Materials Properties Prediction with Automated Literature Mining for Self-Driving Laboratories
Anand Babu
TL;DR
SciQu addresses the bottleneck in materials discovery caused by the exponential growth of literature by automating data extraction from publications and combining it with ML-driven property prediction. The approach employs text splitting, multiple embeddings, and vector databases to enable scalable retrieval, feeding a Random Forest predictor that, in a pilot, predicts the refractive index with $\mathrm{RMSE}\approx 0.068$ and $R^{2}\approx 0.94$ using descriptors such as space group, volume, and band gap. The workflow integrates automated literature mining with self-driving laboratory concepts to optimize synthesis parameters and generate reproducible experimental outcomes. This work demonstrates a scalable framework that can accelerate autonomous materials discovery and reduce manual review effort, with a publicly available codebase for reproducibility and extension.
Abstract
Assessing material properties to predict specific attributes, such as band gap, resistivity, Young's modulus, work function, and refractive index, is a fundamental requirement for materials science applications. However, the process is time-consuming and often requires extensive literature reviews and numerous experiments. Our study addresses these challenges by leveraging machine learning to analyze material properties with greater precision and efficiency. Our model, SciQu, automates data extraction from the literature and uses the extracted information to train machine learning models that predict and optimize material properties. As a proof of concept, we predicted the refractive index of materials using data extracted from numerous research articles with SciQu, taking space group, volume, and band gap as input descriptors and achieving a root mean square error (RMSE) of 0.068 and an $R^{2}$ of 0.94. Thus, SciQu not only predicts the properties of materials but also plays a key role in self-driving laboratories by optimizing synthesis parameters to achieve the precise shape, size, and phase of a material for a given set of input parameters.
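For readers unfamiliar with the two reported metrics, the following minimal sketch shows how RMSE and $R^{2}$ are conventionally computed from measured and predicted values. It is an illustration only, not the SciQu implementation: the function names `rmse` and `r_squared` are hypothetical, and the refractive-index values are invented placeholders, not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Invented placeholder values (NOT from the paper): "measured" versus
# "predicted" refractive indices for four hypothetical materials.
measured = [1.5, 2.0, 2.5, 3.0]
predicted = [1.6, 1.9, 2.6, 2.9]

print("RMSE:", round(rmse(measured, predicted), 3))
print("R^2: ", round(r_squared(measured, predicted), 3))
```

A lower RMSE means smaller typical prediction errors in the units of the target property, while an $R^{2}$ close to 1 means the model explains most of the variance in the measured values.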
