General Machine Learning Models for Interpreting and Predicting Efficiency Degradation in Organic Solar Cells

David Valiente, Fernando Rodríguez-Mas, Juan V. Alegre-Requena, David Dalmau, María Flores, Juan C. Ferrer

TL;DR

This work tackles the challenge of modeling the temporal degradation of power conversion efficiency ($PCE$) in polymer organic solar cells with a multilayer ITO/PEDOT:PSS/P3HT:PCBM/Al stack. It builds a 166-entry dataset spanning more than 180 days with up to seven manufacturing and environmental descriptors, and benchmarks models produced by an automated ML framework (ROBERT) against classical linear and nonlinear least-squares (LS) fitting. The study finds that ML models, notably a random forest trained with a 90/10 training/validation split (RF-90-10), achieve $R^2$ values above 0.90 on long-term data and can reliably predict $PCE$ for unseen devices (e.g., $R^2 \approx 0.88$, RMSE ≈ 0.021). Feature analyses highlight the PEDOT:PSS solvent content and the P3HT:PCBM ratio as the primary drivers. The results demonstrate a scalable, reproducible pathway to screen fabrication variables and anticipate device stability, supported by publicly available data and the ROBERT framework for end-to-end ML benchmarking.
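The RF-90-10 label refers to holding out 10% of the data for validation. A minimal sketch of a seeded 90/10 split follows, assuming a plain random shuffle of row indices; `split_90_10` is a hypothetical helper, not part of ROBERT, whose actual splitting strategy may differ. Re-running it with different seeds loosely mirrors the seed randomization the framework performs during benchmarking.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_90_10(rows: Sequence[T], seed: int) -> Tuple[List[T], List[T]]:
    """Randomly split rows into 90 % training and 10 % validation (hypothetical helper)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)   # seeded RNG -> the split is reproducible
    n_train = (9 * len(rows)) // 10        # integer arithmetic avoids float rounding
    train = [rows[i] for i in indices[:n_train]]
    valid = [rows[i] for i in indices[n_train:]]
    return train, valid


# Trying several seeds gives several candidate splits to benchmark against each other.
rows = list(range(100))                    # placeholder row IDs, not the real dataset
for seed in (0, 1, 2):
    train, valid = split_90_10(rows, seed)
    print(seed, len(train), len(valid))    # each line ends with "90 10"
```

A shuffled split keeps every row in exactly one partition; fixing the seed makes a given split repeatable, which is what allows several seeds to be compared fairly.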

Abstract

This work presents a set of optimal machine learning (ML) models to represent the temporal degradation of the power conversion efficiency (PCE) of polymeric organic solar cells (OSCs) with a multilayer ITO/PEDOT:PSS/P3HT:PCBM/Al structure. To that end, we generated a database with 996 entries, which includes up to 7 variables describing both the manufacturing process and the environmental conditions over more than 180 days. We then relied on a software framework that brings together a set of automated ML protocols, executed sequentially against our database through a simple command-line interface. This makes it easy to optimize hyperparameters and randomize the seeds of the ML models through exhaustive benchmarking, so that optimal models are obtained. The coefficient of determination (R2) widely exceeds 0.90, whereas the root mean squared error (RMSE), sum of squared errors (SSE), and mean absolute error (MAE) are >1% of the target value, the PCE. Additionally, we contribute validated models able to screen the behavior of OSCs never seen in the database; in that case, R2~0.96-0.97 and RMSE~1%, confirming the predictive reliability of the proposal. For comparison, classical Bayesian regression fitting based on non-linear least squares (LS) is also presented; it performs adequately only for univariate cases of single OSCs and therefore fails to match the breadth of capabilities shown by the ML models. Finally, thanks to the standardized results offered by the ML framework, we study the dependencies between the variables of the dataset and their implications for the optimal performance and stability of the OSCs. Reproducibility is ensured by a standardized report that, together with the dataset, is publicly available on GitHub.
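The four accuracy metrics quoted above have standard definitions: $SSE = \sum_i (y_i - \hat{y}_i)^2$, $RMSE = \sqrt{SSE/n}$, $MAE = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$ and $R^2 = 1 - SSE/SST$, where $SST$ is the total sum of squares around the mean of the observations. The sketch below computes them in plain Python; it assumes that both the ML framework and the LS fits use these textbook definitions, and `regression_metrics` is a hypothetical helper.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, RMSE, SSE and MAE for paired observations and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(r * r for r in residuals)               # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "R2": 1.0 - sse / sst,                        # coefficient of determination
        "RMSE": math.sqrt(sse / n),                   # root mean squared error
        "SSE": sse,
        "MAE": sum(abs(r) for r in residuals) / n,    # mean absolute error
    }
```

For example, made-up normalized PCE observations `[1.0, 0.95, 0.90, 0.85]` against predictions `[0.98, 0.96, 0.89, 0.85]` give $SSE = 0.0006$, $MAE = 0.01$ and $R^2 = 0.952$ (up to floating-point rounding).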

Paper Structure (16 sections, 5 equations, 11 figures, 9 tables)

Figures (11)

  • Figure 1: Block diagram of the acquisition system used to acquire the OSC database.
  • Figure 2: Evolution of current density (J) vs. voltage (V) over time (days) for the OSC $Cell2$. (a) J-V curve with $V \in [0, 0.7]$ V. (b) Same J-V curve with $V \in [0, 0.25]$ V.
  • Figure 3: Evolution of the normalized PCE over time for three different OSCs.
  • Figure 4: Accuracy metrics of the LS-supported Bayesian regression fitting over time. Five models are evaluated: $exp1$, $exp2$, $gauss1$, $gauss2$, and $poly3$. (a) Coefficient of determination, $R^2$. (b) Root mean squared error, RMSE. (c) Sum of squared errors, SSE. (d) Mean absolute error, MAE.
  • Figure 5: PCE prediction over time with the best LS-supported Bayesian regression fitting model ($gauss2$). Four temporal datasets are used to compute the fittings: 30, 60, 90, and 120 days.
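Figures 2 and 3 rest on extracting one PCE value per J-V sweep and normalizing it over time. A minimal sketch follows, using the standard relation $PCE = P_{max}/P_{in}$ with $P_{max} = \max_k V_k J_k$ taken over the sampled sweep points (no interpolation). It assumes $J$ is reported as positive photocurrent density in mA/cm² and an incident irradiance of 100 mW/cm² (the usual AM1.5G test condition, which this summary does not state); `pce_from_jv` and `normalize_pce` are hypothetical helpers.

```python
from typing import List, Sequence


def pce_from_jv(voltage_v: Sequence[float],
                current_ma_cm2: Sequence[float],
                p_in_mw_cm2: float = 100.0) -> float:
    """Return PCE as a fraction: max(V * J) / P_in over the sampled sweep.

    Assumes J is positive photocurrent density (mA/cm^2) in the
    power-generating quadrant and P_in = 100 mW/cm^2 by default.
    """
    if len(voltage_v) != len(current_ma_cm2) or not voltage_v:
        raise ValueError("voltage and current must be non-empty and equal-length")
    # V [V] * J [mA/cm^2] = P [mW/cm^2]; take the largest sampled value.
    p_max = max(v * j for v, j in zip(voltage_v, current_ma_cm2))
    # A sweep that never produces power (p_max <= 0) is reported as 0 % PCE.
    return max(p_max, 0.0) / p_in_mw_cm2


def normalize_pce(pce_series: Sequence[float]) -> List[float]:
    """Divide each PCE value by the first one (e.g. day 0), as in a normalized-PCE plot."""
    if not pce_series or pce_series[0] <= 0:
        raise ValueError("the reference (first) PCE must be positive")
    reference = pce_series[0]
    return [p / reference for p in pce_series]
```

With a toy sweep (invented values) $V = [0, 0.2, 0.4, 0.5, 0.6]$ V and $J = [10, 9.5, 8, 6, 0]$ mA/cm², the maximum sampled power is 3.2 mW/cm² at 0.4 V, giving a PCE of about 3.2%.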