Sovereign Lattice QCD — Pure Rust Gauge Theory on Consumer GPUs

A deployable lattice QCD stack replacing QUDA/MILC/Chroma C++/CUDA with pure Rust + WGSL — guideStone-certified, MILC-compatible output, runs on any Vulkan GPU.

Organization: sporeGarden (product name TBD)
License: scyBorg (AGPL-3.0-or-later + ORC + CC-BY-SA 4.0)
Status: Engine validated, product packaging in development

What It Is

A sovereign lattice QCD stack — pure Rust, zero C dependencies, no CUDA — that produces MILC-compatible gauge configurations on consumer GPUs. The physics engine already exists across hotSpring, barraCuda, coralReef, and ToadStool. The product is the packaging: a guideStone-certified deployment artifact that a lattice physicist can scp to any machine and run.

The established lattice QCD toolchain — QUDA (C++/CUDA, GPU), MILC (C, CPU), Chroma (C++, JeffersonLab) — requires CUDA, vendor SDKs, MPI, and HPC cluster access. This product replaces all of it with a single static binary.

What Already Works

The physics engine is validated. The springs are the acceptance tests.

Capability	Primal	Evidence
Wilson gauge action + SU(3)	barraCuda	Plaquette at beta=6.0: 0.5929 (literature ~0.594)
Gradient flow (W6, W7, CK4, LSCFRK3)	barraCuda	Convergence orders 2.06/2.08/2.11, LSCFRK3 coefficients derived from first principles
Staggered fermions + HMC	barraCuda	Dynamical N_f=4 adaptive Omelyan — in progress
f64 on consumer GPUs	barraCuda + coralReef	Vulkan SHADER_F64: native f64 at 1:2 throughput on RTX 4070
Sovereign GPU compiler	coralReef	WGSL to native GPU binary — no LLVM, no NVCC, no vendor SDK
Hardware dispatch	ToadStool	NVIDIA SM70-SM89, AMD RDNA2 (GFX1030), auto-detection
Cross-substrate parity	guideStone	40/40 bit-identical across 5 substrates (x86_64, aarch64, NVIDIA, AMD, CPU-only)
guideStone certification	hotSpring-guideStone-v0.7.0	59/59 checks, 3 published papers reproduced, self-leveling benchmark

Three published papers independently validated by the original author (TC Chuna, MSU/Murillo Group):

Paper	Citation	Result
Gradient flow	Bazavov & Chuna, arXiv:2101.05320	14/14 checks — integrators, t0/w0 scale, convergence
BGK dielectric	Chuna & Murillo, PRE 111, 035206	25/25 checks — Mermin, f-sum, DSF, conductivity
Kinetic-fluid coupling	Haack et al., JCP (2024)	20/20 checks — BGK relaxation, Sod shock, coupled interface

What the Product Adds

The engine does the physics. The product packages it for lattice physicists.

Feature	What It Does
ILDG-compatible output	Gauge configurations in the International Lattice Data Grid format — directly consumable by MILC, Chroma, and existing analysis tools
Measurement pipeline	Plaquette, Polyakov loop, topological charge, Wilson flow observables — the standard lattice measurements
Self-leveling benchmark	`./hotspring benchmark` characterizes unknown hardware against published lattice results — the physics is the benchmark
Deploy Graph composition	hotSpring + barraCuda + coralReef composed via biomeOS as a single BYOB Niche
Portable artifact	Static musl binary, dual-arch (x86_64 + aarch64), OCI container, USB-deployable

How It Composes

Layer	What	Primal
Math	WGSL f64 shaders: gauge action, force, HMC, gradient flow, spectral	barraCuda
Compilation	WGSL to native GPU binary (NVIDIA + AMD)	coralReef
Dispatch	Hardware discovery, GPU scheduling, workload routing	ToadStool
Validation	hotSpring — the spring that proves the physics	hotSpring

Why It Matters

	QUDA + MILC + Chroma	This product
Language	C, C++, Fortran	Rust
GPU backend	CUDA (NVIDIA only)	Vulkan / WGSL (NVIDIA, AMD, Intel)
Precision	f64 on compute-class only	f64 on consumer GPUs ($600 RTX 4070)
Dependencies	CUDA SDK, MPI, autoconf, LLVM	Zero (static binary)
Installation	Days (build MILC, QUDA, configure MPI, test)	Minutes (`tar xf && ./hotspring validate`)
Cost	HPC cluster allocation	$4K basement workstation
Deployment	Cluster job scripts	USB drive
Memory safety	Manual C/C++	Compiler-guaranteed

NVIDIA’s CUDA pricing model throttles consumer f64 to 1:64 throughput to protect the compute-class product line. Vulkan’s SHADER_F64 extension exposes the native 1:2 ratio. The $600 RTX 4070 does the same f64 physics as a $10,000 A100 — CUDA just doesn’t let you see it.

Current Status

Engine: Validated (59/59 checks, 3 papers, cross-vendor GPU parity)
ILDG output format: In development
Measurement pipeline: Plaquette and flow observables working; Polyakov loop and topological charge next
Product packaging: Pending (product name TBD, sporeGarden repo TBD)

See also: guideStone for the verification class, Paper 10 — First Dynamical QCD on Consumer GPU, Paper 07 — Sovereign WDM Simulation, Primal Catalog for barraCuda and coralReef details.