InSpecLearn4SDL: Interpretable Spectral Features Predict Conductivity in Self-Driving Doped Conjugated Polymer Labs

Ankush Kumar Mishra, Jacob P. Mauthe, Nicholas Luke, Aram Amassian, Baskar Ganapathysubramanian

TL;DR

This work develops an interpretable quantitative structure-property relationship (QSPR) framework that predicts the electrical conductivity of doped conjugated polymers directly from rapid optical spectra. By combining area-under-the-curve (AUC) spectral featurization with a genetic algorithm (GA) that locates informative spectral windows, SHAP-guided feature selection, and domain-knowledge feature expansion, the authors build a data-driven surrogate that matches expert descriptors and, when fused with expert features, delivers the best predictive performance. The approach yields an $R^2$ of about 0.85 on test data and suggests substantial reductions in experimental burden, supporting autonomous decision-making in self-driving labs. Although demonstrated on a single polymer–dopant system, the methodology is designed to extend to other spectral modalities and materials spaces, offering a scalable path to accelerate materials discovery with spectroscopy-guided design.

Abstract

To accelerate materials discovery using self-driving labs (SDLs), we present a machine learning pipeline that predicts the electrical conductivity of doped conjugated polymers using rapid, non-destructive optical spectroscopy. Our approach automates spectral featurization by combining a genetic algorithm with adaptive area-under-the-curve (AUC) computations, creating a quantitative structure-property relationship (QSPR) that links optical response and processing parameters to conductivity. By incorporating SHAP-guided selection and domain-knowledge-based feature expansion, the model matches expert-curated performance while theoretically reducing experimental effort by $\sim 33\%$, because fewer costly direct conductivity measurements are needed. Notably, the model recovers known physical descriptors in pBTTT and identifies informative tail-state regions correlated with polymer bleaching upon successful doping. This generic, interpretable, small-data-friendly methodology can be extended to other spectroscopic modalities, such as Raman or FTIR, providing a framework for autonomous decision-making in SDLs.
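
As a rough illustration of the AUC featurization described in the abstract, the sketch below integrates a spectrum over a set of spectral windows with the trapezoidal rule. The function name `auc_features`, the window boundaries, and the flat synthetic spectrum are assumptions made for this example. In the paper the windows are located by a genetic algorithm, which is not reproduced here.

```python
import numpy as np

def auc_features(wavelength, absorbance, windows):
    """Return one trapezoidal AUC value per (lo, hi) spectral window."""
    wavelength = np.asarray(wavelength, dtype=float)
    absorbance = np.asarray(absorbance, dtype=float)
    feats = []
    for lo, hi in windows:
        # Keep only the samples that fall inside this window.
        mask = (wavelength >= lo) & (wavelength <= hi)
        x, y = wavelength[mask], absorbance[mask]
        if x.size < 2:
            # Not enough points to integrate; report zero area.
            feats.append(0.0)
            continue
        # Trapezoidal rule written out explicitly.
        feats.append(float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))
    return np.array(feats)

# Synthetic flat spectrum sampled every 1 nm (hypothetical data).
wl = np.linspace(400.0, 1000.0, 601)
spec = np.ones_like(wl)

# Hypothetical windows; a search procedure such as a GA would propose these.
windows = [(400.0, 500.0), (500.0, 800.0), (800.0, 1000.0)]
features = auc_features(wl, spec, windows)
print(features)
```

For a flat spectrum of height 1, each feature equals the window width, which gives a quick sanity check. Integrating over windows rather than reading off individual peaks averages out point-wise noise, which is the motivation the Figure 5 caption gives for this style of featurization.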

Paper Structure

This paper contains 23 sections, 3 equations, 20 figures, 8 tables.

Figures (20)

  • Figure 1: Workflow for generating a QSPR model that maps optical spectra and processing conditions to electrical conductivity. Spectral features are extracted using the area under the curve (AUC), and key regions are identified using a genetic algorithm. These features are used to train the initial model, QSPR 1. To enhance performance, mathematical operations are applied to expand the feature set, resulting in QSPR 2. Feature importance is then assessed, and greedy forward selection is employed to identify a compact, high-performing subset, termed data-driven features, yielding QSPR 3. Expert-curated features are subsequently incorporated to develop the final QSPR model. In the absence of expert input, QSPR 3 serves as the final model. The data-driven features are also interpreted and benchmarked against expert-selected features.
  • Figure 2: Materials acceleration platform (MAP) used for preparation of polymer films, highlighting the robotic sample manipulation, multi-sample cassette, computer-controlled spin coater, and heated vial storage.
  • Figure 3: Workflow for processing, doping, and characterizing a batch of doped conjugated polymer films. The steps include solution preparation, film coating, sequential spectroscopic measurements, annealing, doping, and final conductivity characterization. The timeline for each step is shown for a batch of 32 samples, highlighting that conductivity measurements are the most time-consuming stage.
  • Figure 4: Data distribution analysis using clustering, KS test, and KDE plots. (a) Elbow method for selecting the optimal number of clusters. The plot displays the within-cluster sum of squares (WCSS) against the number of clusters. The "elbow" point, where the rate of decrease in WCSS slows down, indicates the optimal number of clusters. (b) Kernel density estimation (KDE) plots comparing the distributions of processing conditions and conductivity in the training and test datasets.
  • Figure 5: Featurization of optical spectra for conductivity prediction in doped conjugated polymers. Peak and valley-based features are sensitive to noise, whereas binning followed by calculating the area under the curve offers a more noise-robust approach.
  • ...and 15 more figures
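
The greedy forward selection step named in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses an ordinary least-squares model and a held-out validation $R^2$ as the selection score, whereas the paper pairs selection with SHAP-based importance. The function names, the stopping tolerance `tol`, and the synthetic data are all hypothetical.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(X_tr, y_tr, X_va):
    """Fit least squares with an intercept; predict on validation rows."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return A_va @ coef

def greedy_forward_selection(X_tr, y_tr, X_va, y_va, tol=1e-3):
    """Add one feature at a time while validation R^2 improves by >= tol."""
    selected, best = [], -np.inf
    remaining = list(range(X_tr.shape[1]))
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            pred = fit_predict(X_tr[:, cols], y_tr, X_va[:, cols])
            scores[j] = r2_score(y_va, pred)
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < tol:
            break  # no candidate gives a meaningful improvement
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic data (assumption): only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5.0 * X[:, 0] - 2.0 * X[:, 3] + 0.01 * rng.normal(size=200)

selected, score = greedy_forward_selection(X[:100], y[:100], X[100:], y[100:])
print(selected, round(score, 4))
```

Scoring on held-out data rather than training data keeps the selected subset compact, since uninformative features do not improve the validation score enough to pass the tolerance.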