A machine learning framework integrating seed traits and plasma parameters for predicting germination uplift in crops
Saklain Niam, Tashfiqur Rahman, Md. Amjad Patwary, Mukarram Hossain
TL;DR
The paper tackles the challenge of predicting germination uplift from cold plasma (CP) priming by integrating seed vigor traits with dielectric barrier discharge (DBD) plasma parameters across multiple crops. It introduces a cross-crop dataset and benchmarks baseline and hybrid models, with Extra Trees delivering the best predictive performance (test $R^2 \approx 0.92$, $\mathrm{RMSE} \approx 3.2$, $\mathrm{MAE} \approx 2.6$), which improves slightly to $R^2 = 0.925$ on a reduced, deployment-ready feature subset. The study reveals a hormetic response, in which uplift peaks within $7$--$15$ kV and $200$--$500$ s, and identifies discharge power ($\geq 100$ W) as a dominant factor, while exposure time plays a smaller role. LOCO validation and external datasets show limited cross-genotype generalization, underscoring the need for cultivar-aware inputs and larger, multi-environment CP datasets for scalable, field-ready deployment. The authors implement an MLflow-based pipeline to enable reproducible, deployment-ready optimization for precision agriculture and sustainable seed technologies.
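The three scores quoted above ($R^2$, RMSE, MAE) follow the standard regression definitions. The sketch below shows one way to compute them in plain Python; the function name `regression_metrics` and the sample values are illustrative assumptions, not data or code from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observed/predicted values.

    Hypothetical helper for illustration; the paper's own evaluation
    code is not shown in the source.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the residuals.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large residuals more heavily.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Coefficient of determination: 1 - residual variance / total variance.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"r2": r2, "rmse": rmse, "mae": mae}


# Made-up uplift values, used only to exercise the function.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
scores = regression_metrics(observed, predicted)
```

RMSE and MAE are expressed in the same units as the uplift target, so they can be read directly as typical prediction errors, whereas $R^2$ is unitless.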
Abstract
Cold plasma (CP) is an eco-friendly method to enhance seed germination, yet outcomes remain difficult to predict due to complex seed--plasma--environment interactions. This study introduces the first machine learning framework to forecast germination uplift in soybean, barley, sunflower, radish, and tomato under dielectric barrier discharge (DBD) plasma. Among the models tested (GB, XGB, ET, and hybrids), Extra Trees (ET) performed best (R\textsuperscript{2} = 0.919; RMSE = 3.21; MAE = 2.62), improving to R\textsuperscript{2} = 0.925 after feature reduction. Engineering analysis revealed a hormetic response: negligible effects at $<$7 kV or $<$200 s, maximum germination at 7--15 kV for 200--500 s, and reduced germination beyond 20 kV or under prolonged exposure. Discharge power was also a dominant factor, with germination rate peaking at $\geq$100 W combined with short exposure times. Species- and cultivar-level predictions showed that radish (MAE = 1.46) and soybean (MAE = 2.05) were modeled with high consistency, while sunflower was somewhat more variable (MAE = 3.80). Among cultivars, Williams (MAE = 1.23) and Sari (MAE = 1.33) were well predicted, while Arian (MAE = 2.86) and Nyírségi fekete (MAE = 3.74) were comparatively poorly captured. The framework was also integrated with MLflow, providing a decision-support tool for optimizing CP seed germination in precision agriculture.
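The voltage and exposure thresholds reported above can be read as a simple lookup. The following sketch encodes only the boundaries stated in the abstract; the function name `plasma_regime`, the regime labels, and the precedence between overlapping conditions are assumptions made for illustration. Settings the abstract does not characterize (for example 15--20 kV, or exposures above 500 s, since "prolonged" is given no numeric cutoff) are returned as `"unspecified"` rather than guessed.

```python
def plasma_regime(voltage_kv, exposure_s):
    """Map a DBD setting to the response regime described in the abstract.

    Hypothetical helper: thresholds come from the abstract, but the
    labels and the order of the checks are illustrative assumptions.
    """
    if voltage_kv < 0 or exposure_s < 0:
        raise ValueError("voltage and exposure time must be non-negative")

    # Abstract: reduced germination beyond 20 kV.
    # Assumption: this check takes precedence over the short-exposure case.
    if voltage_kv > 20:
        return "reduced"

    # Abstract: negligible effects below 7 kV or below 200 s.
    if voltage_kv < 7 or exposure_s < 200:
        return "negligible"

    # Abstract: maximum germination at 7-15 kV for 200-500 s.
    if voltage_kv <= 15 and exposure_s <= 500:
        return "optimal"

    # Everything else (15-20 kV, or > 500 s) is not quantified in the abstract.
    return "unspecified"


regimes = [
    plasma_regime(10, 300),  # inside the reported optimum window
    plasma_regime(5, 300),   # voltage below 7 kV
    plasma_regime(25, 300),  # voltage above 20 kV
    plasma_regime(17, 300),  # gap between 15 and 20 kV
]
```

A rule table like this is only a summary of the reported trend; the fitted model, not fixed thresholds, is what the framework uses to predict uplift for a specific seed lot.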
