AQVolt26: High-Temperature r$^2$SCAN Halide Dataset for Universal ML Potentials and Solid-State Batteries

Jiyoon Kim, Chuhong Wang, Aayush R. Singh, Tyler Sours, Shivang Agarwal, AJ Nish, Paul Abruzzo, Ang Xiao, Omar Allam

Abstract

The demand for safe, high-energy-density batteries has spotlighted halide solid-state electrolytes, which offer the potential for enhanced ionic mobility, electrochemical stability, and interfacial deformability. Accelerating their discovery requires extensive molecular dynamics, which has been increasingly enabled by universal machine learning interatomic potentials trained on foundational datasets. However, the dynamic softness of halides poses a stringent test of whether general-purpose models can reliably replace first-principles calculations under the highly distorted, elevated-temperature regimes necessary to probe ion transport. Here, we present AQVolt26, a dataset of 322,656 r$^2$SCAN single-point calculations for lithium halides, generated via high-temperature configurational sampling across $\sim$5,000 structures. We demonstrate that foundational datasets provide a strong baseline for stable halide chemistries and transfer local forces well; however, absolute energy predictions degrade in distorted higher-temperature regimes. Co-training with AQVolt26 resolves this blind spot. Furthermore, incorporating Materials Project relaxation data improves near-equilibrium performance but degrades extreme-strain robustness without enhancing high-temperature force accuracy. These results demonstrate that domain-specific configurational sampling is essential for the reliable dynamic screening of halide electrolytes. Furthermore, our findings suggest that while foundational models provide a robust base, they are most effective for dynamically soft solid-state chemistries when augmented with targeted, high-temperature data. Finally, we show that near-equilibrium relaxation data serves as a task-specific complement rather than a universally beneficial addition.

Paper Structure

This paper contains 14 sections, 12 figures, 5 tables.

Figures (12)

  • Figure 1: A summary of the AQVolt26 dataset and models. A configurational landscape of 200 million Li halide structures was selectively sampled and labeled with 322,656 r$^2$SCAN single-point calculations, generating the largest off-equilibrium dataset for solid-state electrolyte materials. Universal models were co-trained with foundational meta-GGA datasets using state-of-the-art architectures, demonstrating superior performance for materials with applied strain, stability during molecular dynamics simulations, and more accurate ionic conductivity values relative to experimentally validated results.
  • Figure 2: Overview of the AQVolt26 data generation and training approach (top). A dataset of 322,656 r$^2$SCAN single-point calculations was created through configurational generation with surrogate-driven phase space exploration (bottom left, in red) and dimensionality reduction with the 2DIRECT method [Qi2024RobustSampling] (bottom left, in teal), covering a larger feature space compared to halides in the Materials Project (bottom right).
  • Figure 3: Comparison of the distributions of cohesive energies, interatomic force magnitudes, and pressures across four r$^2$SCAN datasets: AQVolt26 (blue), MatPES (green), MP-ALOE (grey), and Materials Project (pink) with 322,656, 2,581, 5,640, and 2,166 Li halide structures, respectively. AQVolt26 and MatPES configurations are primarily derived from single-point calculations at temperatures $\geq$ 300 K, while MP-ALOE and the Materials Project consist of structure optimizations at 0 K.
  • Figure 4: Benchmarking of energy, force, and stress predictions across trained models, evaluated on four r$^2$SCAN datasets and binned by maximum DFT force magnitude. The Materials Project [Huang2025Cross-functionalPotentials], MatPES [Kaplan2025AMaterials], and MP-ALOE [Kuner2025MP-ALOE:Potentials] sets explicitly exclude Li-halide systems. The "Li Halides" set aggregates all halide configurations from these sources, AQVolt26, and an r$^2$SCAN-recomputed subset of GNoME [Merchant2023ScalingDiscovery], sampled from MLMD trajectories. Test splits consist of in-distribution configurations strictly unseen by the eSEN models; however, publicly available baseline checkpoints may have previously encountered this data as exact training holdouts are unavailable. Further granular analysis in the Supplementary Information reveals a minor trade-off: adapting to the highly perturbed AQVolt26 configurational space slightly degrades prediction precision on the small, highly stable GNoME subset.
  • Figure 5: Distributions of structural similarity (top) and formation energy per atom (bottom) when comparing structures relaxed with ML interatomic potentials against r$^2$SCAN DFT relaxations. 1,000 out-of-domain structures were randomly selected from the GNoME [Merchant2023ScalingDiscovery] database. Before performing geometry optimizations, every atomic site was subjected to a random perturbation of 0.1 Å, and fingerprint distances were calculated using the CrystalNN algorithm [Zimmermann2020LocalSimilarity].
  • ...and 7 more figures
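
The Figure 4 caption describes errors "binned by maximum DFT force magnitude". As a rough illustration of what such a binning could look like, the sketch below groups configurations by the largest per-atom DFT force norm and reports the mean absolute force error in each bin. The function names, the bin edges, and the choice of a component-wise MAE averaged per configuration are all assumptions made for illustration; the paper's actual binning scheme and error metric are not specified in this excerpt.

```python
import math
from bisect import bisect_right


def max_force_norm(forces):
    """Largest per-atom force norm in one configuration (forces: list of (fx, fy, fz))."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def force_mae(forces_ref, forces_pred):
    """Mean absolute error over all force components of one configuration."""
    diffs = [
        abs(p - r)
        for atom_ref, atom_pred in zip(forces_ref, forces_pred)
        for r, p in zip(atom_ref, atom_pred)
    ]
    return sum(diffs) / len(diffs)


def force_binned_mae(configs, bin_edges):
    """Average per-configuration force MAE within bins of maximum DFT force.

    configs:   list of (dft_forces, ml_forces) pairs, each a list of (fx, fy, fz).
    bin_edges: increasing edges in eV/Å (hypothetical values, not from the paper).
    Bins are half-open [low, high); configurations outside the edges are skipped.
    """
    totals, counts = {}, {}
    for ref, pred in configs:
        i = bisect_right(bin_edges, max_force_norm(ref))
        if i == 0 or i == len(bin_edges):
            continue  # below the first edge or at/above the last edge
        key = (bin_edges[i - 1], bin_edges[i])
        totals[key] = totals.get(key, 0.0) + force_mae(ref, pred)
        counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


# Toy data: one near-equilibrium and one strongly distorted configuration.
near_eq = ([(0.5, 0.0, 0.0)], [(0.6, 0.0, 0.0)])
distorted = (
    [(3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    [(3.3, 0.3, 0.3), (0.3, 4.3, 0.3)],
)
binned = force_binned_mae([near_eq, distorted], [0.0, 1.0, 5.0])
```

Binning on the reference (DFT) force rather than the predicted force keeps the bin assignment identical across models, so per-bin errors from different checkpoints can be compared directly.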