Representation Meets Optimization: Training PINNs and PIKANs for Gray-Box Discovery in Systems Pharmacology
Nazanin Ahmadi Daryakenari, Khemraj Shukla, George Em Karniadakis
TL;DR
This work introduces tanh-cPIKAN, a Chebyshev-based PIKAN variant that stabilizes training, and systematically benchmarks it against PINNs for gray-box discovery in systems pharmacology. The study evaluates a spectrum of optimizers, learning-rate schedulers, and numerical precisions on pharmacokinetic (PK) and pharmacodynamic (PD) inverse problems. It shows that no single method is universally best. However, hybrid optimization (a RAdam warm-up followed by BFGS) and double-precision training consistently yield strong accuracy, especially for tanh-cPIKANs. The results offer practical guidance on architecture choice, optimization strategy, and precision settings for robustly recovering missing dynamics from sparse data in ill-posed biomedical problems. The study also demonstrates the value of detailed loss-landscape analyses and provides public code to reproduce the findings and apply them to related gray-box modeling tasks.
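The hybrid strategy described in the TL;DR (a first-order RAdam warm-up that hands its iterate to BFGS) can be sketched on a toy problem. This is a minimal sketch under loud assumptions: the loss is a plain least-squares fit of a hypothetical one-compartment PK curve C(t) = C0 exp(-k t) to synthetic data, standing in for a full PINN/PIKAN loss; RAdam is hand-written in NumPy following Liu et al. (2019), and BFGS is SciPy's `minimize(method="BFGS")` rather than the authors' JAX/Optax setup. NumPy computes in float64 by default, which mirrors the double-precision setting; in JAX this would typically require `jax.config.update("jax_enable_x64", True)`. All function names and data below are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical sparse observations from a one-compartment PK model,
# C(t) = C0 * exp(-k t); the unknowns are C0 and k (true values 10 and 0.3).
T_OBS = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
C_OBS = 10.0 * np.exp(-0.3 * T_OBS)


def loss_and_grad(theta):
    """Mean-squared misfit and its analytic gradient w.r.t. theta = (log C0, k)."""
    a, k = theta
    pred = np.exp(a - k * T_OBS)
    r = pred - C_OBS
    loss = np.mean(r**2)
    grad = np.array([np.mean(2 * r * pred), np.mean(-2 * r * pred * T_OBS)])
    return loss, grad


def radam(theta, steps=2000, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8):
    """Rectified Adam (Liu et al., 2019), used here only as a warm-up phase."""
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    rho_inf = 2.0 / (1.0 - b2) - 1.0
    for t in range(1, steps + 1):
        _, g = loss_and_grad(theta)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g**2
        m_hat = m / (1 - b1**t)
        rho_t = rho_inf - 2 * t * b2**t / (1 - b2**t)
        if rho_t > 4:  # variance estimate is tractable: rectified adaptive step
            v_hat = np.sqrt(v / (1 - b2**t))
            r = np.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf
                        / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            theta = theta - lr * r * m_hat / (v_hat + eps)
        else:  # first few steps: un-adapted momentum step
            theta = theta - lr * m_hat
    return theta


def hybrid_fit(theta0):
    """RAdam warm-up, then BFGS refinement starting from the warm-up iterate."""
    theta_warm = radam(np.asarray(theta0, dtype=np.float64))
    res = minimize(loss_and_grad, theta_warm, jac=True, method="BFGS")
    return theta_warm, res.x


warm, final = hybrid_fit([0.0, 0.0])
print("loss after RAdam:", loss_and_grad(warm)[0])
print("loss after BFGS :", loss_and_grad(final)[0])
print("C0 =", np.exp(final[0]), "k =", final[1])
```

The design choice being illustrated: the first-order phase moves the parameters from an arbitrary initialization into a well-behaved basin, where a quasi-Newton method can then exploit curvature information to converge quickly and precisely.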
Abstract
Physics-Informed Kolmogorov-Arnold Networks (PIKANs) are gaining attention as an effective counterpart to the original multilayer perceptron-based Physics-Informed Neural Networks (PINNs). Both representation models can address inverse problems and facilitate gray-box system identification. However, a comprehensive comparison of their accuracy and speed is still lacking. Here, we introduce a modified PIKAN architecture, tanh-cPIKAN, which parametrizes the univariate functions with Chebyshev polynomials and adds an extra nonlinearity for enhanced performance. We then present a systematic investigation of how the choice of optimizer, representation, and training configuration influences the performance of PINNs and PIKANs in the context of systems pharmacology modeling. We benchmark a wide range of first-order, second-order, and hybrid optimizers, including various learning rate schedulers, and use the Optax library to identify the most effective combinations for learning gray-box models under ill-posed, non-unique, and data-sparse conditions. We examine the influence of model architecture (MLP vs. KAN), numerical precision (single vs. double), the need for warm-up phases before second-order methods, and sensitivity to the initial learning rate. We also assess optimizer scalability for larger models and analyze the trade-offs introduced by JAX in terms of computational efficiency and numerical accuracy. Using two representative systems pharmacology case studies (a pharmacokinetics model and a chemotherapy drug-response model), we offer practical guidance on selecting optimizers and representation models and architectures for robust and efficient gray-box discovery. Our findings provide actionable insights for improving the training of physics-informed networks in biomedical applications and beyond.
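To make the tanh-cPIKAN building block concrete, here is a single-layer forward pass. This is a minimal NumPy sketch, not the authors' implementation, and it rests on stated assumptions. First, each layer input is squashed with tanh into [-1, 1], the domain of the Chebyshev polynomials. Second, every input-output edge carries its own learnable Chebyshev expansion. Third, the "extra nonlinearity" is assumed to be a tanh applied to the layer output; the paper's JAX code may place or combine these nonlinearities differently. The names `chebyshev_basis` and `tanh_cheb_kan_layer` are hypothetical.

```python
import numpy as np


def chebyshev_basis(x, degree):
    """Evaluate T_0..T_degree at x (values assumed to lie in [-1, 1]).

    Uses the three-term recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x).
    Returns an array of shape x.shape + (degree + 1,).
    """
    T = [np.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return np.stack(T[: degree + 1], axis=-1)


def tanh_cheb_kan_layer(x, coeffs):
    """One hypothetical tanh-Chebyshev KAN layer.

    x:      (batch, d_in) layer inputs
    coeffs: (d_in, d_out, degree + 1) learnable Chebyshev coefficients
    Edge (i, j) carries phi_ij(u) = sum_k coeffs[i, j, k] * T_k(u);
    output node j sums phi_ij over all inputs i.
    """
    degree = coeffs.shape[-1] - 1
    u = np.tanh(x)                      # map inputs into the Chebyshev domain [-1, 1]
    basis = chebyshev_basis(u, degree)  # (batch, d_in, degree + 1)
    out = np.einsum("bik,iok->bo", basis, coeffs)
    return np.tanh(out)                 # assumed placement of the extra tanh nonlinearity


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 2))                     # 4 collocation points, 2 inputs
coeffs = rng.normal(scale=0.1, size=(2, 3, 6))  # 2 -> 3 features, degree 5
print(tanh_cheb_kan_layer(x, coeffs).shape)     # (4, 3)
```

The recurrence avoids evaluating cos(k arccos u) directly. Because the tanh keeps every argument inside [-1, 1], each basis function stays bounded by 1 in magnitude regardless of how large the raw inputs become.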
