Cross-Substrate Validation

Five substrates, 40/40 bit-identical — the evidence that physics does not depend on instruction set, C library, GPU vendor, or operating system.

The Claim

A guideStone-certified artifact produces the same physics on any hardware it runs on. Not “approximately the same.” Not “within a few ULP.” For the core validation suite: bit-identical.

This is a strong claim. This page presents the evidence.


The Five Substrates

The first guideStone artifact — hotSpring-guideStone-v0.7.0 — was validated across five substrates chosen to maximize diversity along every axis that could affect floating-point results:

# | Substrate    | Arch    | C Library  | GPU              | Kernel
1 | Ubuntu 22.04 | x86_64  | glibc 2.35 | None (CPU only)  | 5.15
2 | Ubuntu 22.04 | x86_64  | glibc 2.35 | NVIDIA RTX 3090  | 5.15
3 | Ubuntu 22.04 | x86_64  | glibc 2.35 | AMD RX 6950 XT   | 5.15
4 | Alpine 3.19  | x86_64  | musl 1.2.4 | None (CPU only)  | 6.6
5 | Ubuntu 22.04 | aarch64 | glibc 2.35 | None (qemu-user) | 5.15

Dimensions varied:

  • Instruction set: x86_64 vs aarch64
  • C library: glibc vs musl (though the binary is statically linked, this tests that no libc behavior leaks through)
  • GPU vendor: NVIDIA vs AMD vs no GPU
  • GPU compiler: coralReef SASS (SM86) vs RDNA2 (GFX1030)
  • Kernel version: 5.15 vs 6.6

The Results

Per-Substrate Check Results

Substrate                 | Checks | Result
Ubuntu x86_64, CPU        | 59/59  | PASS
Ubuntu x86_64, RTX 3090   | 59/59  | PASS
Ubuntu x86_64, RX 6950 XT | 59/59  | PASS
Alpine x86_64, CPU        | 59/59  | PASS
Ubuntu aarch64, CPU       | 59/59  | PASS

Cross-Substrate Comparison

After all five substrates passed independently, outputs were compared pairwise. For each of the 40 observable quantities (plaquettes, energies, correlation functions, flow scales):

40/40 bit-identical across all five substrates.

Not “within tolerance.” Not “within 1 ULP.” The IEEE 754 double-precision bit patterns are the same bytes on every substrate.
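
A comparison of this kind can be sketched by checking raw bit patterns rather than floating-point equality. The code below is illustrative only: the function names and the in-memory layout (one vector of observables per substrate) are assumptions for this sketch, not the guideStone tooling.

```rust
/// Indices at which two runs differ in their IEEE 754 bit patterns.
/// Comparing `to_bits()` is stricter than `==`: it distinguishes 0.0 from
/// -0.0, and two NaNs match only if their payloads match.
pub fn bit_mismatches(a: &[f64], b: &[f64]) -> Vec<usize> {
    assert_eq!(a.len(), b.len(), "runs must report the same observables");
    let mut mismatches = Vec::new();
    for i in 0..a.len() {
        if a[i].to_bits() != b[i].to_bits() {
            mismatches.push(i);
        }
    }
    mismatches
}

/// True when every pair of runs is bit-identical on every observable.
/// `runs` holds one vector of observables per substrate (hypothetical layout).
pub fn all_pairs_bit_identical(runs: &[Vec<f64>]) -> bool {
    for i in 0..runs.len() {
        for j in (i + 1)..runs.len() {
            if !bit_mismatches(&runs[i], &runs[j]).is_empty() {
                return false;
            }
        }
    }
    true
}
```

Using `to_bits()` instead of `==` matters for edge cases: `0.0 == -0.0` is true under IEEE 754 comparison, yet the two values have different bit patterns and would be reported as a mismatch here.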


Why Bit-Identity Is Possible

Bit-identical results across architectures are not the default in scientific computing. Most HPC codes accept “within tolerance” because floating-point non-associativity, FMA contraction, and thread scheduling make exact reproducibility impractical.

guideStone achieves it through four mechanisms:

1. Canonical Reduction Order

Parallel reductions (summing an array across GPU threads) use a fixed binary tree structure rather than hardware-dependent scheduling. This eliminates the primary source of floating-point non-determinism in GPU computation.

barraCuda WGSL shaders implement this explicitly. The reduction tree is part of the specification, not an implementation detail.
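
The idea of a fixed reduction tree can be sketched on the CPU. This is a minimal Rust illustration, not the barraCuda WGSL shader: the function names are hypothetical, and the pairing rule (adjacent elements, odd element carried up unchanged) is an assumption of this sketch.

```rust
/// Sum `values` with a fixed binary tree: adjacent pairs are added level by
/// level, so the order of additions depends only on the input length, never
/// on how the work is scheduled.
pub fn tree_sum(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let mut level: Vec<f64> = values.to_vec();
    while level.len() > 1 {
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        for pair in level.chunks(2) {
            if pair.len() == 2 {
                next.push(pair[0] + pair[1]);
            } else {
                // Odd element out: carried to the next level unchanged.
                next.push(pair[0]);
            }
        }
        level = next;
    }
    level[0]
}

/// Left-to-right accumulation, shown only for contrast with `tree_sum`.
pub fn sequential_sum(values: &[f64]) -> f64 {
    let mut acc = 0.0;
    for v in values {
        acc += *v;
    }
    acc
}
```

Because floating-point addition is not associative, the two functions can return different bits for the same input. For `[1e16, 1.0, -1e16, 1.0]`, for instance, the additions round differently depending on grouping. What matters for reproducibility is not which order is chosen but that the same order is used everywhere.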

2. Explicit FMA Policy

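Fused multiply-add (FMA) computes a*b + c with a single rounding instead of rounding the product and the sum separately, so fused and unfused code can produce different bit patterns from the same inputs. coralReef emits FMA instructions with documented contraction semantics. The same FMA policy applies whether the target is NVIDIA SASS or AMD GFX1030.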
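
The effect of fusing is easy to observe in isolation. The sketch below uses only Rust's standard `f64::mul_add`, which computes the product and sum with one rounding; the function name is hypothetical and the example says nothing about how coralReef itself lowers FMA.

```rust
/// Evaluate `x*x - p`, where `p` is the rounded product `x*x`, in two ways.
/// Returns `(unfused, fused)`.
pub fn product_rounding_error(x: f64) -> (f64, f64) {
    // The product is rounded to the nearest f64 here.
    let p = x * x;
    // Unfused: the multiply rounds to `p` again, so the difference is 0.0.
    let unfused = x * x - p;
    // Fused: the exact product is kept internally and rounded only once,
    // after subtracting `p`, which exposes the rounding error of `x * x`.
    let fused = x.mul_add(x, -p);
    (unfused, fused)
}
```

For x = 1 + 2^-30 the exact square is 1 + 2^-29 + 2^-60, and the last term is lost when the product is rounded to double precision. The unfused path therefore returns 0.0 while the fused path returns 2^-60. A compiler that contracts multiply-add pairs on one target but not another would change results in exactly this way, which is why the contraction policy has to be explicit.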

3. Pure Rust Arithmetic

The CPU path uses Rust’s f64 arithmetic with explicit operation ordering. No LAPACK, no BLAS, no vendor math library. The same Rust source compiles to both x86_64 and aarch64 with identical semantics because there is no C library in the hot path.

4. Tolerance Decomposition

groundSpring decomposes the uncertainty budget for every observable. When the dominant uncertainty is gauge sampling variance (statistical), the deterministic tolerance is set far below it. Bit-identity is achievable because the numerical tolerance headroom is orders of magnitude larger than the floating-point representation differences.
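
An uncertainty budget of this kind can be represented as a list of named contributions. The following is a hedged sketch, not the groundSpring API: the `Contribution` type, the function names, and the choice to combine independent contributions in quadrature are all assumptions made for illustration.

```rust
/// One named contribution to an observable's uncertainty (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Contribution {
    pub source: &'static str,
    pub sigma: f64,
}

/// Combine contributions in quadrature. Treating them as independent is an
/// assumption of this sketch.
pub fn total_sigma(budget: &[Contribution]) -> f64 {
    budget
        .iter()
        .map(|c| c.sigma * c.sigma)
        .sum::<f64>()
        .sqrt()
}

/// The contribution with the largest sigma, or `None` for an empty budget.
pub fn dominant(budget: &[Contribution]) -> Option<&Contribution> {
    budget.iter().max_by(|a, b| a.sigma.total_cmp(&b.sigma))
}
```

Keeping the source name next to each value is what lets a report state which uncertainty dominates, rather than quoting a single unexplained tolerance.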


What Bit-Identity Does Not Cover

The 40/40 bit-identical result applies to the core validation observables — quantities computed from reference gauge configurations with fixed random seeds and deterministic integration paths.

It does not extend to quantities that involve:

  • Monte Carlo sampling with different random seeds — statistically consistent, not bit-identical
  • Iterative solvers with hardware-dependent convergence — results agree within named tolerance, not bit-identical
  • Timing-dependent operations — wall time varies, physics does not

The distinction is precise: deterministic computations (same input, same algorithm, same operation order) are bit-identical. Stochastic computations (sampling, random initialization) are statistically consistent within derived tolerances.
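
The distinction can be illustrated with a seeded generator. This sketch uses SplitMix64 only because it fits in a few lines and keeps the example self-contained; nothing here is taken from the hotSpring code, and the type and function names are hypothetical.

```rust
/// SplitMix64, a small well-known generator, used here only for illustration.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform value in [0, 1) built from the top 53 bits.
    pub fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Mean of `n` uniform samples, accumulated in a fixed left-to-right order.
pub fn sample_mean(seed: u64, n: usize) -> f64 {
    let mut rng = SplitMix64::new(seed);
    let mut sum = 0.0;
    for _ in 0..n {
        sum += rng.next_f64();
    }
    sum / n as f64
}
```

Two calls with the same seed and sample count perform the same operations in the same order, so their results can be compared with `to_bits()`. Two calls with different seeds should agree only statistically: both means land near 0.5, within a tolerance derived from the sample count, but their bit patterns differ.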


The Implication

When a principal investigator (PI) runs ./hotspring validate on their laptop and gets 59/59 PASS, they know:

  1. The physics on their machine matches the physics on every other machine that has ever run this artifact
  2. The match is not approximate — it is exact for deterministic quantities
  3. The tolerances for stochastic quantities are derived, not guessed, and the dominant uncertainty source is named
  4. No vendor SDK, no institutional license, no cloud subscription was required to achieve this

The computation is the proof. The substrate is irrelevant. This is what guideStone means.