Sovereign WDM Simulation on Consumer GPU
Plasma Physics x GPU Compute — warm dense matter on consumer GPU, guideStone v0.7.0 certified. hotSpring. 59/59 checks.
Date: March 14, 2026 (updated)
Status: Validated + Live Kokkos Parity + Precision Stability + Precision Brain + VFIO PBDMA Context Load — hotSpring v0.6.31, 848 lib tests, 115 binaries, 85 WGSL shaders. All plasma MD, lattice QCD, and nuclear HFB reproduction complete. GPU promotion: Papers 43 (gradient flow, 38.5× speedup) and 44 (BGK dielectric, 12/12 physics checks). Full multi-tier precision stability analysis (Exp 046): 9 cancellation families audited across f32/DF64/f64/CKKS FHE. Stable BCS v² and plasma W(z) algorithms enable DF64 throughput (16× on consumer GPUs) without precision loss. Precision brain (Exp 049): self-routing hardware calibration, NVVM device poisoning discovered and gated, dual-GPU cooperative patterns (Split BCS 2.2×, Split HMC, Redundant, PCIe 1.2 GB/s). Live Kokkos benchmark (Exp 053): 9/9 Yukawa cases, 12.4× gap (barraCuda 212 steps/s vs Kokkos-CUDA 2,630 steps/s at N=2000) — gap dominated by native f64 fallback (1:32 on Ampere), DF64 safe-path fix expected to close to ~2×. DF64 transcendental poisoning bug discovered and fixed. VFIO PBDMA context load (Exp 058): 3 critical Volta register discoveries (preempt 0x002638, ACK 0x002A00, SIGNATURE validation), PBDMA2 loads RAMFC with zero errors; USERD DMA read remaining. coralReef P10 Iter 52+. Dual Titan V mmiotrace planned. Zero clippy warnings (lib+bins), zero unsafe, all AGPL-3.0-only. 60/60 Sarkas observable checks (N=10k, 80k steps, $0.044). Deconfinement phase transition at β_c=5.69 on RTX 3090 (32⁴, 13.6h, $0.58). DF64 core streaming delivers 9.9× native f64 throughput. Verlet neighbor list achieves 992 steps/s (κ=3). Transport coefficients D*, η*, λ* via GPU Green-Kubo.
Domain: Plasma physics, computational science, distributed computing
Novelty: No prior work demonstrates full WDM transport coefficient reproduction on consumer GPU via vendor-agnostic shaders; no prior work frames distributed consumer GPU networks as alternatives to institutional HPC for plasma physics
Cross-Spring: hotSpring (MD + transport + lattice QCD) × neuralSpring (surrogate learning + LSTM) × groundSpring V113 (uncertainty propagation — GemmF64 transpose (Tikhonov KᵀK/KᵀG), RetryPolicy + CircuitBreaker, 4-format capability parsing, exit_code constants, 102 barracuda delegations)
Abstract
We demonstrate that warm dense matter (WDM) — the plasma regime central to inertial confinement fusion (ICF), planetary interiors, and stellar evolution — can be simulated on consumer GPU hardware using the BarraCuda compute stack. The 70-author “Roadmap for warm dense matter physics” (Murillo et al., arXiv:2505.02494, revised Feb 13, 2026) identifies computational accessibility as a critical bottleneck: state-of-the-art codes require institutional HPC allocations, creating artificial scarcity in who can do WDM science.
We propose that hotSpring’s validated pipeline — Yukawa MD (9/9 GPU), Green-Kubo transport (13/13), nuclear EOS (195/195), screened Coulomb (23/23), and lattice QCD (full GPU pipeline) — already contains the primitives needed for WDM simulation at modest system sizes. The gap is not capability but scale, and scale is a distribution problem, not an algorithm problem.
We frame this as the GPU-as-hot-water-heater thesis: every consumer GPU running WGSL shaders through open Vulkan drivers can contribute to WDM computation while its waste heat serves domestic purposes. A network of 1,000 idle consumer GPUs, each contributing 1 GPU-hour/day, provides 365,000 GPU-hours/year — equivalent to a mid-tier institutional HPC allocation — at zero marginal compute cost and with waste heat as a useful byproduct.
1. The Problem: WDM Computation Is Artificially Scarce
1.1 What Is Warm Dense Matter?
WDM occupies the regime between cold condensed matter and hot classical plasma: temperatures of 10⁴–10⁸ K, densities of 0.1–100 g/cm³. At these conditions, neither cold-matter approximations (band theory, perturbation theory) nor hot-plasma approximations (Debye-Hückel, ideal gas) apply. The electrons are partially degenerate, ions are strongly coupled, and quantum effects coexist with classical dynamics.
WDM matters because it describes:
- ICF fuel: the deuterium-tritium capsule during NIF implosion
- Planetary cores: Jupiter, Saturn, super-Earths
- Stellar interiors: white dwarf envelopes, brown dwarfs
- Astrophysical shocks: supernovae, neutron star mergers
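The regime described in this section is often located with two dimensionless numbers: the electron degeneracy parameter θ = k_B T / E_F and the ion coupling parameter Γ. The following is a minimal sketch, assuming a fully ionized plasma and the standard textbook definitions of both quantities; the function names are hypothetical and are not part of hotSpring or BarraCuda. Values of θ and Γ near 1 indicate that neither the degenerate nor the ideal-plasma limit applies.

```python
import math

# CODATA SI constants
K_B = 1.380649e-23          # Boltzmann constant, J/K
HBAR = 1.054571817e-34      # reduced Planck constant, J*s
M_E = 9.1093837015e-31      # electron mass, kg
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m


def degeneracy_parameter(n_e: float, temp_k: float) -> float:
    """theta = k_B T / E_F for electron density n_e (m^-3) and temperature (K)."""
    e_fermi = HBAR ** 2 * (3.0 * math.pi ** 2 * n_e) ** (2.0 / 3.0) / (2.0 * M_E)
    return K_B * temp_k / e_fermi


def coupling_parameter(n_i: float, temp_k: float, z: float = 1.0) -> float:
    """Gamma = (Z e)^2 / (4 pi eps0 a k_B T), with a the Wigner-Seitz radius."""
    a_ws = (3.0 / (4.0 * math.pi * n_i)) ** (1.0 / 3.0)
    return (z * E_CHARGE) ** 2 / (4.0 * math.pi * EPS0 * a_ws * K_B * temp_k)


# Illustrative point: hydrogen at about 0.17 g/cm^3 (1e29 ions per m^3) and 1e5 K.
theta = degeneracy_parameter(1e29, 1e5)
gamma = coupling_parameter(1e29, 1e5)
print(f"theta = {theta:.2f}, Gamma = {gamma:.2f}")
```

Both numbers come out of order unity for this illustrative point, which is the signature of the intermediate regime the text describes.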
1.2 The Computational Bottleneck
The WDM roadmap (arXiv:2505.02494) identifies several open computational challenges:
- Transport coefficients at WDM conditions (partially ionized, strongly coupled) — extending Stanton-Murillo (2016) beyond the classical regime
- Equation of state for mixtures under compression — beyond SEMF/HFB
- Dynamic structure factor S(q,ω) — the key experimental diagnostic for X-ray Thomson scattering (XRTS) at NIF
- Orbital-free DFT for large-scale WDM simulations
- Wavepacket MD for quantum ion dynamics
Current state: these calculations require institutional HPC (Frontier, Summit, Perlmutter). A typical WDM transport calculation uses 4–8 GPUs for days. Allocation proposals take months. Results are published behind paywalls.
1.3 The Sovereign Alternative
hotSpring has already reproduced Stanton-Murillo transport on a single RTX 4070 at ~$0.02 compute cost. The Yukawa MD pipeline runs at 34.7× CPU speed on GPU. The lattice QCD pipeline achieves 40× CPU speed with streaming GPU HMC. The nuclear EOS pipeline covers 195 nuclei.
The question is not whether consumer GPU can do WDM physics — hotSpring has already proven it can for classical plasma. The question is whether the extensions to WDM conditions (partial ionization, quantum effects, higher temperatures) are tractable on the same hardware.
2. What hotSpring Already Has
2.1 Validated Primitives
| Primitive | hotSpring Paper | Checks | WDM Extension |
|---|---|---|---|
| Yukawa MD (all-pairs + cell-list) | Paper 1 | 9/9 GPU | Extend to screened potentials with partial ionization |
| Green-Kubo transport (D*, η*, λ*) | Paper 5 | 13/13 | Extend to WDM conditions (higher T, Z*) |
| Nuclear EOS (SEMF→HFB) | Paper 4 | 195/195 | Use as cold-curve input for WDM EOS |
| Screened Coulomb eigensolve | Paper 6 | 23/23 | Yukawa screening at WDM parameters |
| FFT (1D + 3D, f64) | ToadStool | 14/14 GPU | Required for S(q,ω) computation |
| Lattice QCD HMC | Papers 8-12 | Full GPU | Monte Carlo sampling methodology |
| Streaming GPU dispatch | Paper 10+ | 9/9 | Zero CPU→GPU transfer architecture |
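
The Yukawa MD primitive in the table above rests on a screened pair interaction. As a rough illustration, the sketch below evaluates the pair energy and force in reduced units, assuming distances are measured in Wigner-Seitz radii and energies in units of k_B T, so that u(r) = Γ exp(-κr)/r. The function names are hypothetical, and the all-pairs loop ignores periodic images and cell lists, which a production code would need.

```python
import math


def yukawa_energy(r: float, kappa: float, gamma: float) -> float:
    """Reduced Yukawa pair energy u(r) = Gamma * exp(-kappa r) / r."""
    return gamma * math.exp(-kappa * r) / r


def yukawa_force(r: float, kappa: float, gamma: float) -> float:
    """Magnitude of the repulsive pair force, F = -du/dr."""
    return gamma * math.exp(-kappa * r) * (1.0 + kappa * r) / (r * r)


def total_energy(positions, kappa: float, gamma: float) -> float:
    """All-pairs O(N^2) potential energy; no periodic boundary conditions."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += yukawa_energy(r, kappa, gamma)
    return energy


# Illustrative parameters only; not one of the validated cases.
print(yukawa_force(1.5, kappa=2.0, gamma=10.0))
```

Setting κ = 0 recovers the bare Coulomb interaction, which gives a quick sanity check on any implementation.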
2.2 The FFT Gap Is Closed
ToadStool commit 1ffe8b1a delivered Fft1DF64 and Fft3DF64 with roundtrip validation to 1e-10 on RTX 3090. This was THE major blocker for S(q,ω) computation. With FFT available, the dynamic structure factor becomes:
S(q,ω) = (1/N) |∫ Σ_j exp(iq·r_j(t)) exp(-iωt) dt|²
which is a spatial Fourier transform (over particle positions) followed by a temporal Fourier transform (over MD trajectory). Both transforms now run on GPU.
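To make the two-step transform concrete, here is a minimal CPU sketch that follows the formula above term by term. It assumes a trajectory stored as a list of frames, each a list of (x, y, z) tuples, and uses a naive O(T²) discrete Fourier transform as a stand-in for the GPU FFT; the function names are hypothetical and do not reflect the ToadStool API.

```python
import cmath
import math


def density_mode(frame, q):
    """Spatial transform: rho(q, t) = sum_j exp(i q . r_j(t)) for one frame."""
    return sum(
        cmath.exp(1j * sum(qc * rc for qc, rc in zip(q, r))) for r in frame
    )


def dynamic_structure_factor(trajectory, q, dt):
    """Return (omegas, S) with S = |integral of rho(q,t) exp(-i w t) dt|^2 / N."""
    n_steps = len(trajectory)
    n_particles = len(trajectory[0])
    rho_t = [density_mode(frame, q) for frame in trajectory]
    omegas, spectrum = [], []
    for k in range(n_steps):
        omega = 2.0 * math.pi * k / (n_steps * dt)
        # Temporal transform: rectangle-rule approximation of the time integral.
        amp = dt * sum(
            rho * cmath.exp(-1j * omega * n * dt) for n, rho in enumerate(rho_t)
        )
        omegas.append(omega)
        spectrum.append(abs(amp) ** 2 / n_particles)
    return omegas, spectrum
```

A single particle drifting at constant velocity v produces one peak at ω = q·v, which makes a convenient correctness check before moving to real MD data.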
2.3 What’s Missing
| Gap | Effort | Priority |
|---|---|---|
| Partial ionization model (Z* from Thomas-Fermi or average-atom) | Medium | P1 |
| Electron-ion coupling (two-temperature model extension) | Medium | P1 |
| Wavepacket evolution (quantum ion dynamics) | High | P3 |
| Orbital-free kinetic energy functional | High | P3 |
| Multi-component mixture EOS | Medium | P2 |
3. The GPU-as-Hot-Water-Heater Thesis
3.1 The Core Argument
A consumer GPU running WDM simulations at 200W produces:
- Useful computation: ~10 TFLOPS f64 sustained
- Waste heat: 200W thermal output
In a residential setting, this waste heat can offset space heating or water heating costs. A GPU mining cryptocurrency spends its energy on proof of meaningless work. A GPU running WDM simulations spends the same energy on meaningful work: the same joules produce both heat and science.
3.2 The Economics
| Resource | Institutional HPC | Consumer GPU Network |
|---|---|---|
| Hardware | $600M (Frontier) | $600/node × N nodes |
| Allocation | Competitive proposal (months) | Volunteer (immediate) |
| Energy | Grid power at industrial rate | Residential, offset by heating |
| Access | Credential-gated | Open (AGPL-3.0) |
| Results | Journal paywall | Public domain |
| Vendor lock | CUDA (NVIDIA-only) | WGSL/Vulkan (any GPU) |
3.3 Why WGSL/Vulkan Matters
BarraCuda’s WGSL shaders run on any GPU exposing Vulkan with SHADER_F64. This includes:
- NVIDIA (RTX 2070 through 5090, Titan V)
- AMD (RX 6000+, MI-series)
- Intel (Arc A-series)
- Qualcomm (Adreno, via Android Vulkan)
- Apple (via MoltenVK translation layer)
CUDA locks WDM computation to NVIDIA. WGSL liberates it. The same physics shader runs identically on a $300 used RTX 2070 and a $2000 RTX 5090 — only the speed changes, not the math.
3.4 Distributed Architecture
The NUCLEUS mesh (biomeOS + BearDog + Songbird) provides:
- Task distribution: BOINC-style work units, but with covalent trust
- Verification: deterministic MD trajectories verify via hash
- Aggregation: independent parameter sweeps combine trivially
- Fault tolerance: any node can fail; work units redistribute
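The hash-based verification in the list above can be sketched as follows. This is an illustration only, assuming each node returns its trajectory as frames of (x, y, z) doubles and that the simulation is bitwise deterministic across nodes (fixed reduction order, identical precision path); the function names are hypothetical and are not the NUCLEUS protocol.

```python
import hashlib
import struct


def trajectory_hash(frames) -> str:
    """SHA-256 digest over every coordinate, packed as little-endian f64."""
    digest = hashlib.sha256()
    for frame in frames:
        for x, y, z in frame:
            digest.update(struct.pack("<3d", x, y, z))
    return digest.hexdigest()


def verify(frames, expected_hex: str) -> bool:
    """Accept a work unit only if its trajectory hashes to the expected value."""
    return trajectory_hash(frames) == expected_hex


reference = [[(0.0, 1.0, 2.0), (3.0, 4.0, 5.0)]]
print(verify(reference, trajectory_hash(reference)))
```

Because the digest covers the raw bit patterns, any deviation in a single coordinate changes the hash, so this check is only meaningful when two honest nodes are expected to agree bit for bit.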
WDM transport calculations are embarrassingly parallel across the (κ, Γ, T) parameter space. Each point is an independent MD run. A network of 100 GPUs can sweep 100 parameter points simultaneously.
4. Experimental Design
4.1 Phase 1: Reproduce FPEOS on Consumer GPU
Target: Militzer’s First-Principles Equation of State database (Berkeley, open C++/Python code). Reproduce EOS tables for hydrogen, helium, and carbon at WDM conditions.
Method: Port average-atom + MD pipeline to BarraCuda. Validate against published FPEOS tables. Measure accuracy vs compute cost.
Success criterion: Agreement with FPEOS tables to within published uncertainties, running on a single RTX 4070.
4.2 Phase 2: WDM Transport Coefficients
Target: Extend Stanton-Murillo (Paper 5) to WDM conditions.
Method: Run Yukawa MD at elevated temperatures (T > 10 eV) with density-dependent screening. Compute Green-Kubo transport. Compare to published DFT-MD values from the roadmap comparison studies.
Success criterion: D*, η*, λ* within 50% of DFT-MD for at least 3 (ρ, T) points in the WDM regime.
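For orientation, the Green-Kubo step for self-diffusion can be written as D = (1/3) ∫ ⟨v(0)·v(t)⟩ dt. The sketch below is a plain CPU version under stated assumptions: velocities are stored as velocities[step][particle] = (vx, vy, vz), the autocorrelation is averaged over all particles and time origins, and the integral uses the trapezoid rule. The names are hypothetical, and the GPU pipeline in hotSpring may organize this differently.

```python
def vacf(velocities, max_lag: int):
    """Velocity autocorrelation <v(0).v(t)> for lags 0..max_lag."""
    n_steps = len(velocities)
    if max_lag >= n_steps:
        raise ValueError("max_lag must be smaller than the number of steps")
    n_particles = len(velocities[0])
    result = []
    for lag in range(max_lag + 1):
        acc = 0.0
        for t0 in range(n_steps - lag):
            for j in range(n_particles):
                a = velocities[t0][j]
                b = velocities[t0 + lag][j]
                acc += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
        result.append(acc / ((n_steps - lag) * n_particles))
    return result


def green_kubo_diffusion(velocities, dt: float, max_lag: int) -> float:
    """D = (1/3) * integral of the VACF, trapezoid rule up to max_lag * dt."""
    c = vacf(velocities, max_lag)
    integral = sum(0.5 * (c[i] + c[i + 1]) * dt for i in range(max_lag))
    return integral / 3.0
```

In practice the cutoff `max_lag` has to be chosen where the autocorrelation has decayed into noise, and that choice is one source of the uncertainty the success criterion allows for.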
4.3 Phase 3: Dynamic Structure Factor
Target: Compute S(q,ω) from MD trajectories for comparison with NIF XRTS experimental data.
Method: Spatial FFT of particle positions → intermediate scattering function F(q,t) → temporal FFT → S(q,ω). All on GPU via validated FFT primitives.
Success criterion: S(q,ω) peak positions and widths match published MD results for hydrogen at WDM conditions.
4.4 Phase 4: Distributed Parameter Sweep
Target: Full (ρ, T) sweep of transport coefficients using the NUCLEUS mesh (2 GPUs initially, scaling to N).
Method: biomeOS distributes MD work units across available GPUs. Each GPU runs an independent (κ, Γ) point. Results aggregate into a transport table.
Success criterion: Linear scaling of throughput with GPU count. Deterministic verification of all results via trajectory hash.
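The distribution step can be pictured with a small scheduling sketch. It assumes each work unit is one independent (κ, Γ) point, assigns units round-robin, and moves a failed node's units to the survivors. The function names and the parameter values are illustrative only and do not describe the biomeOS scheduler.

```python
from itertools import product


def make_work_units(kappas, gammas):
    """One independent MD run per (kappa, gamma) point."""
    return [
        {"id": i, "kappa": k, "gamma": g}
        for i, (k, g) in enumerate(product(kappas, gammas))
    ]


def assign_round_robin(units, n_gpus: int):
    """Spread work units evenly across n_gpus queues."""
    queues = [[] for _ in range(n_gpus)]
    for unit in units:
        queues[unit["id"] % n_gpus].append(unit)
    return queues


def redistribute(queues, failed: int):
    """Hand a failed node's units to the surviving nodes, round-robin."""
    survivors = [i for i in range(len(queues)) if i != failed]
    if not survivors:
        raise ValueError("no surviving nodes to take over the work")
    result = {i: list(queues[i]) for i in survivors}
    for n, unit in enumerate(queues[failed]):
        result[survivors[n % len(survivors)]].append(unit)
    return result


units = make_work_units([1.0, 2.0, 3.0], [10.0, 50.0, 150.0])
queues = assign_round_robin(units, n_gpus=2)
print([len(q) for q in queues])
```

Since the points share no state, throughput should scale with the number of queues until the sweep runs out of points, which is the linear-scaling behavior the success criterion tests.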
5. Connection to NIF and the Ignition Era
5.1 NIF Context
The National Ignition Facility achieved fusion energy gain in December 2022, with 6 subsequent successful shots reaching peak gain of 2.3× at 5.2 MJ (as of the Feb 2026 NIF/JLF User Groups Meeting). This has created unprecedented demand for WDM simulation to design future targets, understand capsule physics, and optimize implosion conditions.
5.2 Murillo’s Role
Michael Murillo (MSU, Computational Mathematics, Science, & Engineering) co-authored the WDM roadmap (arXiv:2505.02494) and has published extensively on transport coefficients in dense plasma. His Stanton-Murillo (2016) transport paper is hotSpring Paper 5 — the first complete reproduction in our pipeline. His screened Coulomb work is Paper 6.
hotSpring was built to validate BarraCuda against Murillo’s published results. Extending to WDM conditions is the natural next step — using the same infrastructure, the same validation methodology, the same consumer hardware.
5.3 What This Proves
If a single graduate student with a $600 GPU can reproduce WDM transport coefficients that currently require institutional HPC allocations, it demonstrates that:
- The computation is not the bottleneck — the algorithm is
- Access to physics is artificially scarce — not technically scarce
- Open-source GPU stacks can do real science — not just benchmarks
- Distributed consumer GPU networks are viable — not just theoretical
6. Connection to Other baseCamp Sub-Theses
| Sub-Thesis | Connection |
|---|---|
| 01 (Anderson QS) | Anderson localization is spectral theory; WDM uses same eigensolve primitives |
| 02 (Frozen Fossil) | Constrained evolution under extreme thermal constraint (WDM is the ultimate thermal constraint) |
| 03 (Bioag) | Distributed sensing → distributed computing; same NUCLEUS infrastructure |
| 04 (Sentinel) | WDM diagnostics (XRTS) are a form of environmental sensing under extreme conditions |
| 05 (Cross-species) | Multi-component plasma mixtures are the physics analog of multi-species communities |
| 06 (No-till) | Both apply physics principles (Anderson, transport) to applied problems using consumer compute |
6.1 neuralSpring Integration
neuralSpring contributes validated ML surrogates, spectral analysis, and df64-validated protein folding primitives directly to WDM science:
- df64 core streaming (Session 88): All 15 helixVision WGSL shaders evolved to the hotSpring/ToadStool three-zone pattern (f64 buffer I/O → df64 compute → f64 output). Validates that df64 generalizes from nuclear physics to ML workloads. Two precision tiers: arithmetic 3.6e-8 to 5.6e-7, transcendental 1.7e-4 to 3.4e-4. 37/37 GPU checks on RTX 4070
- WDM surrogates (nW-01 through nW-05): All 5 complete (Session 88+)
  - nW-01: MLP transport surrogate (D*, η*, λ*) — 30/30 Rust checks
  - nW-02: MLP EOS surrogate P(ρ,T), E(ρ,T) — 36/36 + 15/15 GPU checks
  - nW-03: LSTM reservoir S(q,ω) peak predictor — 27/27 Rust checks, R²=0.98
  - nW-04: MLP classical→WDM transfer learning — 6/6 Rust checks
  - nW-05: ESN regime classifier (Classical/WDM/Degenerate) — 39/39 Rust checks, 96.5% accuracy
- Reservoir computing: nW-03 (LSTM) and nW-05 (ESN) demonstrate that reservoir computing (fixed random weights + ridge regression readout) is effective for WDM sequence analysis and regime classification
- Spectral analysis: neuralSpring’s `eigh_f64` eigendecomposition and `spectral_entropy` (rewired to `barracuda::stats::shannon_from_frequencies` in Session 81) apply directly to plasma eigenmode analysis
- Session 90 status: 669 lib tests, 179 binaries, 179/179 validators, 131+ named tolerances, 42 upstream rewires, 36 Python baselines (all deterministically seeded), 3,162+ total validation checks. nF-02 AlphaFold2 Evoformer block pipeline validated end-to-end. Phase B GPU gaps all closed: ODE batch integration, FST variance decomposition, introgression HMM chain
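
The df64 compute zone mentioned in the first bullet can be illustrated with the error-free addition that double-float arithmetic is usually built on. The sketch below assumes df64 denotes the common double-float representation, a value held as an unevaluated (hi, lo) pair of f32 numbers; that reading, the f32 emulation via `struct`, and all function names are assumptions for illustration, not the hotSpring shader code.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def two_sum(a: float, b: float):
    """Knuth's error-free f32 addition: returns (s, e) with s + e == a + b."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def df_add(x, y):
    """Add two double-float numbers given as (hi, lo) pairs of f32 values."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    hi = f32(s + e)
    lo = f32(e - f32(hi - s))  # fast two-sum renormalization
    return hi, lo


small = f32(1e-8)
print(f32(1.0 + small))                    # plain f32 addition drops the small term
print(df_add((1.0, 0.0), (small, 0.0)))    # the pair keeps it in the lo word
```

The point of the pair representation is visible in the last two lines: a term that plain f32 addition rounds away survives in the low word, which is how f32 hardware can carry more than f32 precision.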
6.2 groundSpring V113 Integration
groundSpring V113 (GemmF64 transpose (Tikhonov KᵀK/KᵀG), RetryPolicy + CircuitBreaker, 4-format capability parsing, exit_code constants; V112: OrExit).
groundSpring’s Bazavov experiments (Exp 019-021) are direct lattice QCD contributions — the same physics that hotSpring simulates on GPU.
Why this matters for WDM: hotSpring generates raw simulation data (trajectories, correlators, transport integrals). Converting that data into physical observables with rigorous uncertainty requires exactly the inverse problem and error estimation machinery that groundSpring validates. The freeze-out curve is a thermodynamic observable extracted from lattice data; the spectral function is a dynamic observable extracted from Euclidean correlators. groundSpring proves the extraction math works at benchmark precision before hotSpring applies it to production WDM data.
Combined pipeline: hotSpring (GPU simulation) → groundSpring (inverse problem + error bars) → neuralSpring (surrogate acceleration). This is the full lattice QCD workflow, validated independently in three springs.
6.3 groundSpring WDM Uncertainty Budget (Exp 025–027)
groundSpring Experiments 025–027 provide the uncertainty budget that validates the numerical claims in this paper’s consumer-GPU WDM pipeline.
Why this matters: Before claiming “$19 WDM on consumer GPU,” the uncertainty budget must prove that (a) f32 alone is insufficient (Exp 025), (b) the system sizes used actually converge (Exp 026), and (c) results are reproducible across GPU vendors (Exp 027). These three experiments convert the cost projection in Section 8 from speculation to validated science.
6.4 VFIO Sovereign Dispatch Breakthrough (Exp 058)
The sovereign VFIO compute path on Volta (Titan V, GV100) achieved a critical milestone: PBDMA context load — the hardware PFIFO scheduler successfully loads our RAMFC channel context into PBDMA2. This is the first successful context load on the sovereign dispatch path without any kernel GPU driver. Three register-level discoveries made this possible:
- Preempt (0x002638): Write BIT(runl_id) to force scheduler re-evaluation. Volta’s per-runlist preempt is at 0x002638, not the older per-channel preempt at 0x002634.
- ACK (0x002A00): After runlist submission, the scheduler fires PFIFO_INTR bit 30. Software MUST read 0x002A00 and write BIT(runl_id) to acknowledge. Without this, the scheduler will not dispatch channels to PBDMAs.
- SIGNATURE validation: RAMFC::SIGNATURE = 0xFACE. Used as a diagnostic (write 0xDEAD → observe error → confirm fresh context load).
Current status: PBDMA2 has our context loaded with zero errors. The GPFIFO address, USERD address, channel ID, and all RAMFC fields are correctly mapped into PBDMA operational registers. One gap remains: the PBDMA does not read GP_PUT from USERD in system memory (suspected IOMMU DMA path issue).
Hardware plan: GTX 1050 (headless display) + 2x Titan V — one on nouveau as an mmiotrace oracle, one on VFIO as the target. The oracle will capture nouveau’s complete PBDMA dispatch sequence for replication on the VFIO target.
coralReef pin: Phase 10, Iteration 52+ (Experiment Q: VramFullDispatch)
See: `hotSpring/experiments/058_VFIO_PBDMA_CONTEXT_LOAD.md`
7. Reproduction Targets
See `hotSpring/specs/PAPER_REVIEW_QUEUE.md` Tier 4 for the full list of WDM reproduction targets (Papers 32-42).
Priority Order
8. Cost Projection
| Phase | Hardware | Time | Cost |
|---|---|---|---|
| Phase 1 (FPEOS reproduction) | 1× RTX 4070 | ~1 week | ~$2 |
| Phase 2 (WDM transport) | 1× RTX 4070 | ~2 weeks | ~$5 |
| Phase 3 (S(q,ω)) | 1× RTX 4070 | ~1 week | ~$2 |
| Phase 4 (distributed sweep) | 2× GPU (NUCLEUS) | ~1 month | ~$10 |
| Total | | ~2 months | ~$19 |

Compare to institutional WDM allocation: ~100,000 GPU-hours at ~$1/GPU-hr = $100,000. Even at 100× less accuracy, the cost ratio is 5,000:1.
9. The Broader Vision
This sub-thesis is not about competing with Frontier. It is about proving that the mathematical workflow is correct, portable, and accessible. If the physics runs correctly on one consumer GPU, it runs correctly on any consumer GPU. If it runs on any consumer GPU, it runs on every idle GPU. If it runs on every idle GPU, WDM simulation becomes a public utility rather than an institutional privilege. The shaders are the mathematics. The GPU is the substrate. The heat is the byproduct. The science is the point.