Validation of Semi-Empirical xTB Methods for High-Throughput Screening of TADF Emitters: A 747-Molecule Benchmark Study

Jean-Pierre Tchapet Njafa, Elvira Vanelle Kameni Tcheuffa, Aissatou Maghame, Serge Guy Nana Engo

TL;DR

This work addresses the need for scalable screening of TADF emitters by validating semi-empirical xTB methods (sTDA-xTB and sTD-DFT-xTB) on a 747-molecule benchmark. By coupling GFN2-xTB ground-state geometries with rapid excited-state calculations and implicit solvent effects, the authors cut computational cost by more than $99\%$ relative to conventional TD-DFT while preserving reliable relative rankings: the two methods agree with a Pearson $r \approx 0.82$ for $\Delta E_{\text{ST}}$, and the mean absolute error against experiment is about 0.17 eV. The study extracts robust design principles, confirming the superior performance of D-A-D architectures and an optimal D-A torsional window of $50^{\circ}$–$90^{\circ}$, and reveals a low-dimensional design space in which the first three principal components capture about $90\%$ of the variance. These findings establish a validated, data-driven HTS framework that accelerates TADF emitter discovery and provides practical guidelines for computational materials science in OLED design.

Abstract

Thermally activated delayed fluorescence (TADF) emitters are essential for next-generation, high-efficiency organic light-emitting diodes (OLEDs), yet their rational design is hampered by the high computational cost of accurate excited-state predictions. Here, we present a comprehensive benchmark study validating semi-empirical extended tight-binding (xTB) methods, specifically sTDA-xTB and sTD-DFT-xTB, for the high-throughput screening of TADF materials. Using an unprecedentedly large dataset of 747 experimentally characterized emitters, our framework demonstrates a computational cost reduction of over $99\%$ compared to conventional TD-DFT, while maintaining strong internal consistency between methods (Pearson $r \approx 0.82$ for $\Delta E_{\text{ST}}$), validating their utility for relative molecular ranking. Validation against 312 experimental $\Delta E_{\text{ST}}$ values reveals a mean absolute error of approximately 0.17 eV, a discrepancy attributed to the vertical approximation inherent to the HTS protocol, underscoring the methods' role in screening rather than quantitative prediction. Through large-scale data analysis, we statistically validate key design principles, confirming the superior performance of Donor-Acceptor-Donor (D-A-D) architectures and identifying an optimal D-A torsional angle range of $50^{\circ}$–$90^{\circ}$ for efficient TADF. Principal Component Analysis reveals that the complex property space is fundamentally low-dimensional, with three components capturing nearly $90\%$ of the variance. This work establishes these semi-empirical methods as powerful, cost-effective tools for accelerating TADF discovery and provides a robust set of data-driven design rules and methodological guidelines for the computational materials science community.
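The two headline statistics in the abstract, the Pearson correlation between methods and the mean absolute error against experiment, can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and chosen only for illustration; they are not taken from the benchmark, and the authors' actual analysis code is not shown in the source.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(pred, ref):
    """Mean absolute deviation of predictions from reference values."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(pred)


# Hypothetical singlet-triplet gaps in eV (invented, not benchmark data).
gap_stda = [0.05, 0.12, 0.30, 0.08, 0.45]    # assumed sTDA-xTB values
gap_stddft = [0.07, 0.10, 0.33, 0.11, 0.40]  # assumed sTD-DFT-xTB values
gap_expt = [0.10, 0.20, 0.25, 0.15, 0.60]    # assumed experimental values

# Internal consistency: how similarly the two methods rank the molecules.
r_internal = pearson_r(gap_stda, gap_stddft)

# Accuracy against experiment for one of the methods.
mae_expt = mean_absolute_error(gap_stda, gap_expt)

print(f"Pearson r (method vs method): {r_internal:.3f}")
print(f"MAE vs experiment: {mae_expt:.3f} eV")
```

A high correlation between the two methods says only that they order molecules similarly; it does not by itself measure agreement with experiment, which is why the abstract reports the mean absolute error separately.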

Paper Structure

This paper contains 23 sections, 1 equation, 16 figures, 11 tables.

Figures (16)

  • Figure 1: Overview of the simulation workflow. Starting with a SMILES string, the code performs conformer search and geometry optimisation via xTB for the singlet ground state $S_0$ and the triplet state $T_1$. It allows the extraction of the relaxed triplet excitation energy. Simplified time-dependent DFT calculation with sTDA/sTD-DFT extracts the vertical singlet-triplet gap, the relaxed triplet-triplet gap, the oscillator strength, the vertical excitation energy and the fluorescence absorption and emission spectra, while incorporating solvent effects to enhance the realism of our simulations. The Stokes shift is evaluated and then allows the relaxed singlet-triplet gap to be estimated.
  • Figure 3: Correlation between predicted and reference emission wavelengths ($\lambda_{\text{PL}}$) for 213 TADF molecules. Predictions from sTDA and sTD-DFT methods in gas and toluene phases are compared against experimental literature values. Black dashed lines represent perfect agreement (identity line); red solid lines show linear regression fits. The strong, positive correlations ($r > 0.56$) demonstrate the methods' capability to reliably predict relative emission wavelength trends, which is pivotal for virtual screening.
  • Figure 4: Error distributions for emission wavelength predictions ($\lambda_{\text{predicted}} - \lambda_{\text{reference}}$). The histograms show approximately Gaussian error patterns centered near zero for all four method/phase combinations. This indicates an absence of strong systematic bias, with the standard deviations of 118–149 nm reflecting the typical prediction uncertainties of the high-throughput protocol.
  • Figure 5: Correlation between predicted and reference singlet–triplet energy gaps ($\Delta E_{\text{ST}}$) for 296 TADF molecules. Despite considerable scatter, the statistically significant positive correlations ($p < 0.02$) confirm that the semi-empirical methods can correctly capture qualitative trends in $\Delta E_{\text{ST}}$ across the dataset. The regression slopes are significantly less than 1, indicating a systematic underestimation of larger gaps.
  • Figure 6: Error distributions for singlet–triplet gap predictions. The residuals are approximately centered at zero, but their large standard deviations (0.36–0.38 eV) reflect the inherent difficulty in quantitatively predicting small energy differences between nearly degenerate electronic states with a high-throughput method.
  • ...and 11 more figures
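
The quantities discussed in the captions of Figures 3 to 6 (regression slope, and the mean and standard deviation of the residuals) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the data are invented, and ordinary least squares is assumed for the fit because the source does not specify the regression procedure.

```python
import math


def fit_line(ref, pred):
    """Ordinary least-squares fit pred = slope * ref + intercept."""
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_pred = sum(pred) / n
    sxx = sum((x - mean_ref) ** 2 for x in ref)
    sxy = sum((x - mean_ref) * (y - mean_pred) for x, y in zip(ref, pred))
    slope = sxy / sxx
    intercept = mean_pred - slope * mean_ref
    return slope, intercept


def residual_stats(ref, pred):
    """Mean and sample standard deviation of (predicted - reference)."""
    residuals = [p - r for p, r in zip(pred, ref)]
    n = len(residuals)
    mean = sum(residuals) / n
    variance = sum((e - mean) ** 2 for e in residuals) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical reference and predicted gaps in eV (invented values).
ref_gap = [0.0, 0.2, 0.4, 0.6]
pred_gap = [0.1, 0.2, 0.3, 0.4]

slope, intercept = fit_line(ref_gap, pred_gap)
bias, spread = residual_stats(ref_gap, pred_gap)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f} eV")
print(f"mean residual = {bias:.3f} eV, std = {spread:.3f} eV")
```

In this toy data the slope is below 1, so predictions are compressed relative to the reference and the largest gap is underestimated the most, which is the pattern the Figure 5 caption describes. The mean residual measures systematic bias, while the standard deviation measures the scatter reported in Figures 4 and 6.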