AI for Science: January 2026 Week 4

Jan 19 – Jan 25, 2026 · 45 papers analyzed · 3 breakthroughs

Summary

Analyzed 45 unique papers from Jan 19-25, 2026 across AI4Math, AI4Physics, and Scientific ML. 3 breakthroughs: (1) 2601.15737 PhysProver pioneers formal theorem proving for physics with GRPO, achieving 36.4% on physics proofs with positive transfer to MiniF2F; (2) 2601.14027 Numina-Lean-Agent delivers an open agentic reasoning system achieving 12/12 on Putnam 2025; (3) 2601.17137 introduces on-the-fly MLFFs enabling first-principles polymer Tg prediction across 12 diverse polymers. Key trends: formal physics verification emerging as frontier, agentic systems dominating formal math, ML potentials achieving ab initio accuracy for thermophysical properties.

Key Takeaway

Week 4 of January 2026 marks a point of convergence for AI4Math and AI4Physics: PhysProver extends formal verification to physics, Numina-Lean-Agent makes Putnam-level theorem proving openly available, and ML potentials move from energy prediction to thermophysical property calculation. The common thread is domain-specific data combined with modular, verifiable architectures.

Breakthroughs (3)

1. PhysProver: Advancing Automatic Theorem Proving for Physics

Why Novel: First systematic effort to extend formal theorem proving to physics domains, demonstrating that physics-focused training data and reinforcement learning with verifiable rewards (here, GRPO) can transfer to and improve mathematical reasoning capabilities.

Key Innovations:

  • Constructs PhysLeanData: 5,541 physics statements from PhysLean and Claude-generated synthetic lemmas covering Classical Mechanics, Particle Physics, Relativity, and QFT
  • Applies GRPO (Group Relative Policy Optimization) with Lean proof correctness as reward signal, outperforming SFT which causes accuracy drops
  • Achieves 36.4% pass@16 on physics proofs (+2.4% over DeepSeek-Prover-V2), with gains across all physics subdomains
  • Demonstrates positive transfer: training on physics data improves MiniF2F-Test from 68.4% to 69.7%, especially on algebra (+2.9%) and number theory (+3.3%)
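
The GRPO step in the list above can be illustrated with a minimal sketch of the group-relative advantage, assuming a binary reward (1 if Lean accepts the proof, 0 otherwise) and the usual mean/standard-deviation normalization within a group of samples for one statement. The function names and the example outcomes are hypothetical; the source does not specify PhysProver's exact reward shaping or group size.

```python
from statistics import mean, pstdev


def lean_reward(proof_checks: bool) -> float:
    """Binary verifiable reward: 1.0 if the proof compiles in Lean, else 0.0 (assumed)."""
    return 1.0 if proof_checks else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical verifier outcomes for 8 proofs sampled for one statement
outcomes = [True, False, False, True, False, False, False, False]
rewards = [lean_reward(ok) for ok in outcomes]
advantages = group_relative_advantages(rewards)
```

Proofs that verify receive a positive advantage and failed proofs a negative one, so no separate value network is needed; a group in which every sample succeeds or every sample fails contributes zero advantage.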

Evidence:

  • Main results showing PhysProver achieves 36.4% overall, outperforming GPT-5 (26.4%), Claude-4.5-Sonnet (34.4%), and specialized provers
  • Out-of-distribution generalization on MiniF2F showing 69.7% with PhysLeanData vs 68.4% baseline
  • Ablation showing SFT degrades performance (-6.4%) while RAFT improves (+1.6%)
  • Framework overview: data generation with Claude-4.5 + Lean filtering, then GRPO self-evolving stage
  • Example proofs showing PhysProver's superior in-context lemma usage for QFT time contraction
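
For readers unfamiliar with the pass@16 metric behind the 36.4% figure, the sketch below shows the widely used unbiased pass@k estimator, computed per problem from n sampled proofs of which c verify. Whether PhysProver uses this estimator or simply counts a problem as solved when any of 16 samples verifies is an assumption; the two coincide when n = k = 16.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without replacement
    from n attempts containing c correct ones, is correct."""
    if n - c < k:
        # Fewer than k failures exist, so any draw of k includes a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n = k = 16, a problem counts as solved iff at least one sample verifies
solved = pass_at_k(16, 1, 16)
unsolved = pass_at_k(16, 0, 16)
```

The benchmark score is then the average of this per-problem value over all statements.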

Impact: Opens formal verification to physics, establishing a new benchmark and training paradigm that combines domain-specific data with RL. The positive transfer to math suggests physics reasoning may enhance general formal reasoning capabilities.

2. Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics

Why Novel: First fully open agentic system to achieve a perfect 12/12 on the Putnam 2025 competition, matching proprietary systems (Axiom) while providing complete transparency through a Claude Code + MCP architecture.

Key Innovations:

  • Introduces Numina-Lean-MCP: modular tooling including LeanDex (Lean API), Informal Prover (natural language reasoning), and Discussion Partner (subagent for complex problems)
  • Achieves 12/12 on Putnam 2025, matching Axiom and exceeding Aristotle (10/12) and Seed-Prover 1.5 (11/12)
  • Demonstrates component importance: without informal prover drops to 5/12, without subagent drops to 11/12
  • Open architecture enables reproducibility and community extension, unlike proprietary competitors

Evidence:

  • Performance comparison showing Numina-Lean-Agent achieves 12/12 alongside Axiom, exceeding other methods
  • Ablation study: 5/12 without the informal prover, 11/12 with the informal prover, 12/12 with the subagent added
  • Time comparison: competitive with other methods; A5, the hardest problem, took 2040 min
  • Architecture overview showing Claude Code orchestrating Lean-MCP tools for autonomous formal reasoning

Impact: Democratizes state-of-the-art formal math proving by providing a fully open, extensible agentic framework. The modular MCP design enables rapid iteration and community contributions to formal reasoning capabilities.

3. On-the-Fly Machine-Learned Force Fields for High-Fidelity Polymer Glass Transition Simulations

Why Novel: First method to predict polymer glass transition temperatures with first-principles (DFT) fidelity for large, disordered systems by combining adaptive on-the-fly MLFF learning with Bayesian uncertainty-driven sampling.

Key Innovations:

  • Develops adaptive OTF-MLFF framework building robust force fields from ~1000 AIMD configurations, using Bayesian uncertainty to trigger additional DFT calculations only when needed
  • Achieves Tg predictions in excellent agreement with experiments across 12 diverse polymers including PE, PS, PMMA, and novel systems without classical FF parameters
  • Demonstrates scalability from 206 to 5562 atoms while maintaining DFT accuracy, with linear scaling beyond 500 atoms
  • Provides density and thermal expansion coefficient predictions matching experimental values, validating physical consistency
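
The uncertainty-triggered loop described in the first bullet can be sketched on a toy one-dimensional problem. Everything here is a stand-in and not the paper's implementation: `reference_force` plays the role of a DFT call, and the nearest-neighbor model uses distance to the closest training point as a crude substitute for a Bayesian error estimate.

```python
import math


def reference_force(x: float) -> float:
    """Hypothetical stand-in for an expensive first-principles force call."""
    return -math.sin(x)


class NearestNeighborFF:
    """Toy force field that predicts from the closest stored configuration.
    Its uncertainty is the distance to that configuration (assumed proxy)."""

    def __init__(self):
        self.data = []  # list of (configuration, force) pairs

    def add(self, x, f):
        self.data.append((x, f))

    def predict(self, x):
        if not self.data:
            return 0.0, math.inf  # nothing learned yet: maximal uncertainty
        xn, fn = min(self.data, key=lambda p: abs(p[0] - x))
        return fn, abs(xn - x)


def run_on_the_fly(trajectory, threshold=0.2):
    """Use the model when it is confident; otherwise call the reference and retrain."""
    ff = NearestNeighborFF()
    reference_calls = 0
    forces = []
    for x in trajectory:
        f, sigma = ff.predict(x)
        if sigma > threshold:
            f = reference_force(x)  # expensive call only when needed
            ff.add(x, f)            # expand the training set on the fly
            reference_calls += 1
        forces.append(f)
    return forces, reference_calls


trajectory = [0.05 * i for i in range(100)]
forces, reference_calls = run_on_the_fly(trajectory)
```

The reference method is called on only a fraction of the steps, and the threshold sets the trade-off between cost and accuracy.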

Evidence:

  • OTF-MLFF framework: AIMD initialization, Bayesian error estimation, automated dataset expansion, real-time FF refinement
  • Volume-temperature curves for 12 polymers with calculated Tg values overlaid with experimental ranges
  • Comparison with experiment and classical FFs (PCFF, GAFF2) showing OTF-MLFF matches experimental Tg and density
  • Scaling analysis showing linear scaling beyond 500 atoms, with OTF-MLFF ~1000x faster than AIMD
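
A common way to read Tg off a volume-temperature curve is to fit one straight line to the glassy branch and one to the rubbery branch, then take their intersection. The source does not state the paper's exact fitting procedure, so the sketch below is an assumption; it uses synthetic, noise-free data and hypothetical function names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def sse(line, xs, ys):
    """Sum of squared residuals of a fitted line."""
    a, b = line
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def estimate_tg(temps, volumes, min_points=3):
    """Try every split into low-T and high-T branches, keep the best two-line fit,
    and return the temperature where the two lines cross."""
    best = None
    for k in range(min_points, len(temps) - min_points + 1):
        lo = fit_line(temps[:k], volumes[:k])
        hi = fit_line(temps[k:], volumes[k:])
        err = sse(lo, temps[:k], volumes[:k]) + sse(hi, temps[k:], volumes[k:])
        if best is None or err < best[0]:
            best = (err, lo, hi)
    _, (a1, b1), (a2, b2) = best
    return (b2 - b1) / (a1 - a2)


# Synthetic curve with a kink at 380 K: thermal expansion triples above Tg
temps = [float(t) for t in range(200, 580, 20)]
volumes = [1.0 + (2e-4 if t <= 380 else 6e-4) * (t - 380) for t in temps]
tg = estimate_tg(temps, volumes)
```

On real simulation data the branches are noisy and the result depends on cooling rate and on which temperature windows are fitted, so the choice of split deserves more care than this exhaustive search.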

Impact: Enables predictive polymer thermophysical property calculations at DFT accuracy without relying on classical force field parameterization. Particularly valuable for novel polymers lacking experimental FF parameters.

Trends

  • Formal physics verification emerging: PhysProver pioneers theorem proving for physics with domain-specific training data and RL, demonstrating that physics reasoning can transfer to and improve mathematical proof capabilities.

  • Agentic systems dominating formal math: Numina-Lean-Agent's 12/12 Putnam 2025 score via a modular MCP architecture signals a shift toward agent-orchestrated formal reasoning over monolithic model approaches.

  • ML potentials achieving thermophysical prediction: On-the-fly MLFFs enable first-principles glass transition temperature prediction for polymers, expanding ML potential applications beyond static energy/force prediction.

  • Neural operators advancing PDE solutions: SFO's universal spectral basis and LANO's partial observation handling push neural operators toward practical scientific computing with incomplete data.

  • Materials discovery integrating experiment: AI-enhanced phosphosulfide work demonstrates a closed loop of HT-DFT screening, ML prediction, and combinatorial synthesis, yielding 4 experimentally synthesized new semiconductors.

Notable Papers (6)

1. Learning to Discover at Test Time

TTT-Discover achieves new SOTA on Erdos Minimum Overlap (0.380876 vs prior 0.380924), TriMul kernel engineering (2x faster than best human), and AtCoder competitions via adaptive entropic objective with PUCT state reuse during test-time training.

2. Equivariant Interatomic Potentials without Tensor Products

Geodite-MP removes Clebsch-Gordan tensor products from equivariant potentials using inner-product interactions with physically motivated priors, achieving competitive Matbench Discovery accuracy while running 3-5x faster than tensor-product baselines.

3. SFO: Learning PDE Operators via Spectral Filtering

Introduces Universal Spectral Basis from Hilbert matrix eigenvectors for neural operators, achieving best-in-class results on 5 of 6 PDE benchmarks including Allen-Cahn (0.05% L2 error) and Shallow Water (0.38% L2 error) with compact coefficient learning.

4. Structured Hints for Sample-Efficient Lean Theorem Proving

Demonstrates that lightweight Lean-aware IR with fixed tactic skeletons improves theorem proving from 16.4% to 21.7% pass rate on MiniF2F under constrained compute, without requiring expensive RL training.

5. AI-enhanced discovery and accelerated synthesis of metal phosphosulfides

Combines HT-DFT screening of 909 ternary phosphosulfides with multi-fidelity ML band gap prediction (0.14 eV MAE) and DADMARS thin-film synthesis, discovering 19 stable compounds and experimentally synthesizing 4 (Cu3PS4, Cu7PS6, Ag3PS4, Ag7PS6).

6. Learning Neural Operators from Partial Observations via Latent Autoregressive Modeling

LANO addresses incomplete observational data in neural operators via Mask-to-Predict supervision and Physics-Aware Latent Propagator, achieving state-of-the-art on POBench-PDE with partial observations from boundary information.

Honorable Mentions

  • Hint-Based SMT Proof Reconstruction
  • Learning PDE Solvers with Physics and Data: A Unifying View of PINNs and Neural Operators
  • Efficient Dilated Squeeze and Excitation Neural Operator for Differential Equations
  • Enhanced Representation-Based Sampling for ML Interatomic Potentials
  • GPUTB-2: E(3) network for learning orthogonal Hamiltonian
  • Anharmonic thermodynamics redefines metastability in ferroelectric HfO2
  • DDCCNet: Physics-enhanced Neural Networks for Coupled-cluster
  • Verified polynomial-time reductions in Lean 4