Sovereign Lattice QCD — Pure Rust Gauge Theory on Consumer GPUs
A deployable lattice QCD stack replacing QUDA/MILC/Chroma C++/CUDA with pure Rust + WGSL — guideStone-certified, MILC-compatible output, runs on any Vulkan GPU.
Organization: sporeGarden (product name TBD)
License: scyBorg (AGPL-3.0-or-later + ORC + CC-BY-SA 4.0)
Status: Engine validated, product packaging in development
What It Is
A sovereign lattice QCD stack — pure Rust, zero C dependencies, no CUDA — that produces MILC-compatible gauge configurations on consumer GPUs. The physics engine already exists across hotSpring, barraCuda, coralReef, and ToadStool. The product is the packaging: a guideStone-certified deployment artifact that a lattice physicist can scp to any machine and run.
The established lattice QCD toolchain — QUDA (C++/CUDA, GPU), MILC (C, CPU), Chroma (C++, JeffersonLab) — requires CUDA, vendor SDKs, MPI, and HPC cluster access. This product replaces all of it with a single static binary.
What Already Works
The physics engine is validated. The springs are the acceptance tests.
| Capability | Primal | Evidence |
|---|---|---|
| Wilson gauge action + SU(3) | barraCuda | Plaquette at beta=6.0: 0.5929 (literature ~0.594) |
| Gradient flow (W6, W7, CK4, LSCFRK3) | barraCuda | Convergence orders 2.06/2.08/2.11, LSCFRK3 coefficients derived from first principles |
| Staggered fermions + HMC | barraCuda | Dynamical N_f=4 adaptive Omelyan — in progress |
| f64 on consumer GPUs | barraCuda + coralReef | Vulkan SHADER_F64: native f64 at 1:2 throughput on RTX 4070 |
| Sovereign GPU compiler | coralReef | WGSL to native GPU binary — no LLVM, no NVCC, no vendor SDK |
| Hardware dispatch | ToadStool | NVIDIA SM70-SM89, AMD RDNA2 (GFX1030), auto-detection |
| Cross-substrate parity | guideStone | 40/40 bit-identical across 5 substrates (x86_64, aarch64, NVIDIA, AMD, CPU-only) |
| guideStone certification | hotSpring-guideStone-v0.7.0 | 59/59 checks, 3 published papers reproduced, self-leveling benchmark |
Three published papers independently validated by the original author (TC Chuna, MSU/Murillo Group):
| Paper | Citation | Result |
|---|---|---|
| Gradient flow | Bazavov & Chuna, arXiv:2101.05320 | 14/14 checks — integrators, t0/w0 scale, convergence |
| BGK dielectric | Chuna & Murillo, PRE 111, 035206 | 25/25 checks — Mermin, f-sum, DSF, conductivity |
| Kinetic-fluid coupling | Haack et al., JCP (2024) | 20/20 checks — BGK relaxation, Sod shock, coupled interface |
What the Product Adds
The engine does the physics. The product packages it for lattice physicists.
| Feature | What It Does |
|---|---|
| ILDG-compatible output | Gauge configurations in the International Lattice Data Grid format — directly consumable by MILC, Chroma, and existing analysis tools |
| Measurement pipeline | Plaquette, Polyakov loop, topological charge, Wilson flow observables — the standard lattice measurements |
| Self-leveling benchmark | ./hotspring benchmark characterizes unknown hardware against published lattice results — the physics is the benchmark |
| Deploy Graph composition | hotSpring + barraCuda + coralReef composed via biomeOS as a single BYOB Niche |
| Portable artifact | Static musl binary, dual-arch (x86_64 + aarch64), OCI container, USB-deployable |
How It Composes
| Layer | What | Primal |
|---|---|---|
| Math | WGSL f64 shaders: gauge action, force, HMC, gradient flow, spectral | barraCuda |
| Compilation | WGSL to native GPU binary (NVIDIA + AMD) | coralReef |
| Dispatch | Hardware discovery, GPU scheduling, workload routing | ToadStool |
| Validation | hotSpring — the spring that proves the physics | hotSpring |
Why It Matters
| QUDA + MILC + Chroma | This product | |
|---|---|---|
| Language | C, C++, Fortran | Rust |
| GPU backend | CUDA (NVIDIA only) | Vulkan / WGSL (NVIDIA, AMD, Intel) |
| Precision | f64 on compute-class only | f64 on consumer GPUs ($600 RTX 4070) |
| Dependencies | CUDA SDK, MPI, autoconf, LLVM | Zero (static binary) |
| Installation | Days (build MILC, QUDA, configure MPI, test) | Minutes (tar xf && ./hotspring validate) |
| Cost | HPC cluster allocation | $4K basement workstation |
| Deployment | Cluster job scripts | USB drive |
| Memory safety | Manual C/C++ | Compiler-guaranteed |
NVIDIA’s CUDA pricing model throttles consumer f64 to 1:64 throughput to protect the compute-class product line. Vulkan’s SHADER_F64 extension exposes the native 1:2 ratio. The $600 RTX 4070 does the same f64 physics as a $10,000 A100 — CUDA just doesn’t let you see it.
Current Status
- Engine: Validated (59/59 checks, 3 papers, cross-vendor GPU parity)
- ILDG output format: In development
- Measurement pipeline: Plaquette and flow observables working; Polyakov loop and topological charge next
- Product packaging: Pending (product name TBD, sporeGarden repo TBD)
See also: guideStone for the verification class, Paper 10 — First Dynamical QCD on Consumer GPU, Paper 07 — Sovereign WDM Simulation, Primal Catalog for barraCuda and coralReef details.