InSpecLearn4SDL: Interpretable Spectral Features Predict Conductivity in Self-Driving Doped Conjugated Polymer Labs
Ankush Kumar Mishra, Jacob P. Mauthe, Nicholas Luke, Aram Amassian, Baskar Ganapathysubramanian
TL;DR
This work develops an interpretable quantitative structure-property relationship (QSPR) framework that predicts the electrical conductivity of doped conjugated polymers directly from rapidly acquired optical spectra. The authors combine four components: area-under-the-curve (AUC) spectral featurization, a genetic algorithm (GA) that locates informative spectral windows, SHAP-guided feature selection, and domain-knowledge feature expansion. The resulting data-driven surrogate matches the performance of expert-curated descriptors and, when fused with those expert features, delivers the best predictive performance. The approach reaches an $R^2$ of about 0.85 on test data and suggests an estimated reduction in experimental burden of roughly 33%, supporting autonomous decision-making in self-driving labs. Although it is demonstrated on a single polymer–dopant system, the methodology is generalizable to other spectral modalities and materials spaces, offering a scalable path to accelerating materials discovery through spectroscopy-guided design.
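As a minimal sketch of AUC-based spectral featurization, assuming every spectrum is sampled on a shared, increasing wavelength grid and that each window is a hypothetical (start, end) pair in the same units, the snippet below integrates absorbance over each window with the trapezoidal rule and stacks the results into a feature matrix. The names `window_auc` and `featurize` are illustrative, not the authors' code, and the windows here are fixed rather than adaptively chosen.

```python
import numpy as np


def window_auc(wavelengths, absorbance, start, end):
    """Trapezoidal area under one spectrum between start and end (inclusive).

    Assumes `wavelengths` is increasing and shares units with start/end.
    Returns 0.0 if fewer than two samples fall inside the window.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    mask = (wavelengths >= start) & (wavelengths <= end)
    x, y = wavelengths[mask], absorbance[mask]
    if x.size < 2:
        return 0.0
    # Trapezoidal rule written out explicitly to avoid NumPy version differences.
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def featurize(wavelengths, spectra, windows):
    """Return an (n_samples, n_windows) matrix of window AUC features.

    `windows` is a list of hypothetical (start, end) pairs.
    """
    return np.array(
        [[window_auc(wavelengths, s, a, b) for a, b in windows] for s in spectra]
    )
```

For example, a flat spectrum of unit absorbance sampled every 1 nm yields an AUC of 100 over a 500 to 600 nm window; each column of the feature matrix can then be passed, together with processing parameters, to any regression model.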
Abstract
To accelerate materials discovery in self-driving labs (SDLs), we present a machine learning pipeline that predicts the electrical conductivity of doped conjugated polymers from rapid, non-destructive optical spectroscopy. Our approach automates spectral featurization by combining a genetic algorithm with adaptive area-under-the-curve (AUC) computations, yielding a quantitative structure-property relationship (QSPR) that links optical response and processing parameters to conductivity. With SHAP-guided feature selection and domain-knowledge-based feature expansion, the model matches the performance of expert-curated features while, in principle, reducing experimental effort by $\sim 33\%$, because fewer costly direct conductivity measurements are needed. Notably, the model recovers known physical descriptors for pBTTT and identifies informative tail-state regions that correlate with polymer bleaching upon successful doping. This generic, interpretable, and small-data-friendly methodology can be extended to other spectroscopic modalities, such as Raman or FTIR, providing a framework for autonomous decision-making in SDLs.
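The genetic-algorithm step can be illustrated with a toy search for a single informative spectral window. This is a sketch under stated assumptions, not the paper's implementation: the fitness is a hypothetical stand-in (the absolute Pearson correlation between window AUC and a target such as log conductivity) rather than downstream model performance, and tournament selection, blend crossover, Gaussian mutation, and elitism are generic GA choices. The function names and hyperparameters (`ga_window_search`, `pop_size`, `min_width`, `mut_scale`) are hypothetical.

```python
import numpy as np


def window_auc(wl, spectra, start, end):
    """Trapezoidal AUC of each row of `spectra` over [start, end]."""
    mask = (wl >= start) & (wl <= end)
    x, y = wl[mask], spectra[:, mask]
    if x.size < 2:
        return np.zeros(spectra.shape[0])
    return np.sum(0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(x), axis=1)


def fitness(window, wl, spectra, target):
    """Hypothetical stand-in fitness: |Pearson r| between window AUC and target."""
    auc = window_auc(wl, spectra, *window)
    if np.std(auc) == 0:
        return 0.0
    return abs(np.corrcoef(auc, target)[0, 1])


def ga_window_search(wl, spectra, target, pop_size=30, generations=40,
                     min_width=10.0, mut_scale=20.0, seed=0):
    """Evolve one (start, end) window; returns (best_window, best_fitness)."""
    rng = np.random.default_rng(seed)
    lo, hi = float(wl.min()), float(wl.max())

    def repair(w):
        # Keep windows ordered, inside the measured range, and >= min_width wide.
        a, b = np.clip(np.sort(w), lo, hi)
        if b - a < min_width:
            b = min(a + min_width, hi)
            a = b - min_width
        return np.array([a, b])

    pop = np.array([repair(rng.uniform(lo, hi, 2)) for _ in range(pop_size)])
    for _ in range(generations):
        scores = np.array([fitness(w, wl, spectra, target) for w in pop])
        elite = pop[np.argmax(scores)].copy()
        # Tournament selection: keep the better of two random individuals.
        i, j = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((scores[i] >= scores[j])[:, None], pop[i], pop[j])
        # Blend crossover with a shifted copy of the parents, then Gaussian mutation.
        partners = np.roll(parents, 1, axis=0)
        alpha = rng.uniform(size=(pop_size, 1))
        children = alpha * parents + (1.0 - alpha) * partners
        children += rng.normal(0.0, mut_scale, children.shape)
        pop = np.array([repair(c) for c in children])
        pop[0] = elite  # Elitism: the best window found so far is never lost.
    scores = np.array([fitness(w, wl, spectra, target) for w in pop])
    return pop[np.argmax(scores)], float(scores.max())
```

On synthetic spectra in which one band's amplitude drives the target and an unrelated band adds nuisance variance, this search tends to settle on a window around the informative band. A realistic pipeline would search several windows jointly and score them by cross-validated model error instead of a single correlation.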
