First Dynamical QCD Production on Consumer GPU
Lattice QCD x GPU Compute — first dynamical fermion production on consumer GPU, guideStone certified. hotSpring.
Date: March 9, 2026
Status: Complete (all 17 β points finished, 1,071 trajectories). Chuna Papers 43-45: 44/44 overnight checks pass (v0.6.24). Dynamical N_f=4 ext 3/3 complete. coralReef sovereign compilation: 44/46 shaders, full GpuBackend impl.
Spring: hotSpring (v0.6.24)
Hardware: RTX 3090 (GPU) + BrainChip AKD1000 (NPU) + Titan V (DRM testing)
License: AGPL-3.0-only
Summary
First dynamical fermion lattice QCD production scan on a consumer GPU. An 8⁴ lattice with staggered quarks (N_f = 1, m = 0.1) scanned across 17 β values steered by a 14-head neuromorphic coprocessor. The NPU expanded a 4-point seed scan to 17 points in real time, systematically mapping the confined → deconfined crossover. The crossover is smooth (no first-order jump), confirming the expected qualitative change from quenched QCD. β_c has shifted downward from 5.692 (quenched) to approximately 5.0–5.5 (dynamical, 1 flavor).
| Metric | Value |
|---|---|
| Lattice | 8⁴ (4,096 sites) |
| Fermions | Staggered, N_f = 1, m = 0.1 |
| β points | 17 (4 seeded + 13 NPU-inserted) |
| Total trajectories | 1,071 (85 pretherm + 170 therm + 816 measurement) |
| Wall time | 11.96 hours |
| Measurement acceptance | 56.6% (462/816) |
| NPU heads active | 14 (11 operational + 3 physics proxy) |
| Electricity cost (est.) | ~$0.50 |
1. Background: Why This Run Matters
baseCamp 07 established quenched SU(3) lattice QCD on consumer hardware: two production scans at 32⁴ showing β_c = 5.69 (matching literature to three significant figures), DF64 hybrid arithmetic (2× speedup), and NPU adaptive steering (2.5× more useful statistics at same wall time).
Quenched QCD ignores quarks. The gluon field evolves alone — no virtual quark-antiquark pairs, no fermion backreaction. The deconfinement transition is first-order (a sharp discontinuity in thermodynamic quantities). This is computationally cheaper but physically incomplete.
Dynamical QCD includes the fermion determinant, adding a Conjugate Gradient (CG) solver that dominates the computational cost. The transition softens from first-order to a smooth crossover. The critical coupling shifts. The physics is qualitatively different.
This run is the first dynamical production scan on the biomeGate system and, to our knowledge, the first NPU-steered dynamical QCD scan of any kind.
2. NPU Adaptive Steering: How the Scan Grew from 4 to 17 Points
Initial seed
The run was launched with 4 seed β values: 5.0, 5.5, 5.69, 6.0.
NPU steering decisions (chronological)
The NPU evaluated each completed β point and chose where to insert the next scan point. The full steering trace:
| Step | Completed β | NPU β_c estimate | Inserted β | Reasoning |
|---|---|---|---|---|
| 1 | 5.69 | 5.69 | — | First seed, high priority |
| 2 | 5.50 | 5.50 | — | Second seed |
| 3 | 6.00 | 5.50 | 4.9293 | Weak-coupling done; explore below β_c |
| 4 | 5.00 | 5.50 | 4.8603 | Gap below 5.0 needs filling |
| 5 | 4.9293 | 5.50 | 6.0673 | Low β sampled; balance with high β |
| 6 | 4.8603 | 5.50 | 4.7946 | Continue downward mapping |
| 7 | 6.0673 | 5.50 | 6.1314 | Extend deconfined tail |
| 8 | 4.7946 | 5.50 | 4.7321 | Continue downward |
| 9 | 6.1314 | 5.50 | 4.6711 | Symmetric exploration |
| 10 | 4.7321 | 5.50 | 4.6116 | Fill confined region |
| 11 | 4.6711 | 5.50 | 4.5535 | Continue |
| 12 | 4.6116 | 5.50 | 4.4969 | Continue |
| 13 | 4.5535 | 5.50 | 4.4416 | Continue |
| 14 | 4.4969 | 5.50 | 4.3877 | Continue |
| 15 | 4.4416 | 5.50 | 4.3351 | Continue |
| 16 | 4.3877 | 5.50 | — | Final NPU point |
| 17 | 4.3351 | 5.50 | — | Completed last |
The NPU locked onto β_c ≈ 5.50 after seeing the first 3 points and never revised this estimate. It then spent most of its steering budget mapping the confined side of the transition — inserting 11 points below β = 5.0 where the crossover to confinement occurs. This is a notable difference from the quenched runs, where the NPU focused on the transition region (β ≈ 5.4–5.8). The dynamical crossover is broader and the NPU correctly identified that the interesting physics extends much further into the confined regime.
Steering overhead
| Metric | Value |
|---|---|
| NPU inference calls (est.) | ~17 × 60 = ~1,020 |
| Time per inference | 341 µs |
| Total NPU time | ~0.35 seconds |
| Total GPU time | 11.5 hours |
| NPU overhead | 0.00085% |
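The overhead figure follows from simple arithmetic on the rows above. A minimal sketch, assuming 60 inference calls per β point as in the "~17 × 60" estimate (function and constant names are illustrative, not taken from the hotSpring code):

```python
# Figures copied from the steering-overhead table.
N_BETA_POINTS = 17
CALLS_PER_POINT = 60        # assumption: taken from the "~17 x 60" estimate
SECONDS_PER_CALL = 341e-6   # 341 microseconds per inference
GPU_HOURS = 11.5


def npu_overhead(n_points, calls_per_point, seconds_per_call, gpu_hours):
    """Return (total NPU seconds, NPU time as a fraction of GPU time)."""
    npu_seconds = n_points * calls_per_point * seconds_per_call
    gpu_seconds = gpu_hours * 3600.0
    return npu_seconds, npu_seconds / gpu_seconds


npu_seconds, fraction = npu_overhead(
    N_BETA_POINTS, CALLS_PER_POINT, SECONDS_PER_CALL, GPU_HOURS
)
print(f"NPU time: {npu_seconds:.3f} s, overhead: {fraction * 100:.5f}% of GPU time")
```

The result is a few parts per million of the GPU time, so the steering cost is negligible next to the HMC trajectories.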
3. Production Results
Per-β data table (all 17 completed)
Sorted by β. The “Order” column shows when each point was evaluated. Note how the NPU jumped between high and low β rather than scanning linearly (see the scan trajectory below).
| # | β | n | Acc% | ⟨P⟩ | σ(P) | ⟨CG⟩ | Wall/traj | Source | Order |
|---|---|---|---|---|---|---|---|---|---|
| 17 | 4.3351 | 16 | 50% | 0.3156 | 0.003 | 60,472 | 76 s | NPU | last |
| 16 | 4.3877 | 50 | 52% | 0.3234 | 0.004 | 60,451 | 56 s | NPU | 16th |
| 15 | 4.4416 | 50 | 54% | 0.3284 | 0.004 | 60,442 | 60 s | NPU | 15th |
| 14 | 4.4969 | 50 | 44% | 0.3344 | 0.002 | 60,434 | 55 s | NPU | 14th |
| 13 | 4.5535 | 50 | 50% | 0.3430 | 0.002 | 60,415 | 68 s | NPU | 13th |
| 12 | 4.6116 | 50 | 50% | 0.3464 | 0.002 | 60,399 | 53 s | NPU | 12th |
| 11 | 4.6711 | 50 | 40% | 0.3539 | 0.004 | 60,295 | 53 s | NPU | 11th |
| 10 | 4.7321 | 50 | 42% | 0.3616 | 0.004 | 60,041 | 53 s | NPU | 10th |
| 8 | 4.7946 | 50 | 52% | 0.3711 | 0.003 | 59,586 | 52 s | NPU | 8th |
| 6 | 4.8603 | 50 | 46% | 0.3765 | 0.004 | 59,200 | 51 s | NPU | 6th |
| 5 | 4.9293 | 50 | 46% | 0.3893 | 0.004 | 58,968 | 51 s | NPU | 5th |
| 4 | 5.0000 | 50 | 50% | 0.4040 | 0.003 | 58,929 | 51 s | Seed | 4th |
| 2 | 5.5000 | 50 | 66% | 0.5255 | 0.007 | 55,423 | 47 s | Seed | 2nd |
| 1 | 5.6900 | 50 | 78% | 0.5511 | 0.006 | 54,254 | 46 s | Seed | 1st |
| 3 | 6.0000 | 50 | 84% | 0.5812 | 0.004 | 49,804 | 43 s | Seed | 3rd |
| 7 | 6.0673 | 50 | 76% | 0.5881 | 0.003 | 54,278 | 48 s | NPU | 7th |
| 9 | 6.1314 | 50 | 78% | 0.5957 | 0.004 | 49,072 | 43 s | NPU | 9th |
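The overall measurement acceptance quoted in the summary (462/816) can be recovered from the n and Acc% columns. A short sketch, assuming Acc% is the accepted fraction of the n measurement trajectories at each β:

```python
# (beta, n measurement trajectories, acceptance percent) from the per-beta table.
ROWS = [
    (4.3351, 16, 50), (4.3877, 50, 52), (4.4416, 50, 54), (4.4969, 50, 44),
    (4.5535, 50, 50), (4.6116, 50, 50), (4.6711, 50, 40), (4.7321, 50, 42),
    (4.7946, 50, 52), (4.8603, 50, 46), (4.9293, 50, 46), (5.0000, 50, 50),
    (5.5000, 50, 66), (5.6900, 50, 78), (6.0000, 50, 84), (6.0673, 50, 76),
    (6.1314, 50, 78),
]

total_trajectories = sum(n for _, n, _ in ROWS)
# Convert each percentage back to a whole number of accepted trajectories.
total_accepted = sum(round(n * acc / 100) for _, n, acc in ROWS)
overall_rate = total_accepted / total_trajectories

print(f"{total_accepted}/{total_trajectories} accepted = {overall_rate:.1%}")
```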
Plaquette curve (seeds vs NPU-inserted)
[ASCII plot: ⟨P⟩ (vertical axis, 0.32 to 0.60) versus β (horizontal axis, 4.3 to 6.1) for all 17 points, using the values in the per-β table above. S = seed point (human-selected), o = NPU-inserted.]
NPU scan trajectory (evaluation order)
The NPU did not scan linearly. It bracketed the transition region, alternating between the low-β confined side and the high-β deconfined tail. The evaluation order, read left to right (S = seed, all others NPU-inserted):
5.69 (S) → 5.50 (S) → 6.00 (S) → 5.00 (S) → 4.93 → 4.86 → 6.07 → 4.79 → 6.13 → 4.73 → 4.67 → 4.61 → 4.55 → 4.50 → 4.44 → 4.39 → 4.34
Read: The NPU started at seed 5.69 (#1), jumped to 5.50 (#2), then 6.00 (#3), then 5.00 (#4). After these 4 seeds, it inserted 4.93 (#5) and 4.86 (#6), then jumped up to 6.07 (#7) to balance, back down to 4.79 (#8), up to 6.13 (#9), then swept systematically downward: 4.73, 4.67, 4.61, 4.55, 4.50, 4.44, 4.39, 4.34.
Key observation: The NPU’s “bracket and fill” strategy is visible in the jump pattern. After the 4 seeds established the range, it alternated high/low insertions (steps 5-9) to bracket the crossover from both sides before committing to a systematic downward sweep (steps 10-17). This is exactly how a physicist would explore an unknown phase diagram: coarse bracketing first, then fine filling. The NPU learned this strategy from the quenched training data without being explicitly programmed to bracket.
The plaquette rises monotonically and smoothly from ⟨P⟩ = 0.316 at β = 4.34 to ⟨P⟩ = 0.596 at β = 6.13. No discontinuity — this is the smooth crossover expected for dynamical fermions, in contrast with the quenched first-order transition.
The steepest gradient is between β ≈ 5.0 and β ≈ 5.5 (ΔP/Δβ = 0.1215/0.5 ≈ 0.24), consistent with the crossover region. This is well below the quenched β_c = 5.692, confirming the expected downward shift from fermion backreaction.
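The gradient claim can be checked with finite differences between adjacent β points. The sketch below uses the ⟨P⟩ values from the per-β table; the function name is illustrative.

```python
# (beta, mean plaquette) from the per-beta table, sorted by beta.
POINTS = [
    (4.3351, 0.3156), (4.3877, 0.3234), (4.4416, 0.3284), (4.4969, 0.3344),
    (4.5535, 0.3430), (4.6116, 0.3464), (4.6711, 0.3539), (4.7321, 0.3616),
    (4.7946, 0.3711), (4.8603, 0.3765), (4.9293, 0.3893), (5.0000, 0.4040),
    (5.5000, 0.5255), (5.6900, 0.5511), (6.0000, 0.5812), (6.0673, 0.5881),
    (6.1314, 0.5957),
]


def interval_slopes(points):
    """Finite-difference slope dP/dbeta on each adjacent pair of points."""
    return [
        ((p1 - p0) / (b1 - b0), b0, b1)
        for (b0, p0), (b1, p1) in zip(points, points[1:])
    ]


slopes = interval_slopes(POINTS)
steepest_slope, beta_lo, beta_hi = max(slopes)
is_monotonic = all(slope > 0 for slope, _, _ in slopes)

print(f"steepest: {steepest_slope:.3f} on [{beta_lo}, {beta_hi}], monotonic: {is_monotonic}")
```

Because no points lie between β = 5.0 and β = 5.5, this is an average slope over a wide interval; the local maximum inside it could be steeper.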
4. Key Physics Findings
4.1 The crossover is smooth
The quenched deconfinement transition at β_c = 5.692 is first-order — the plaquette jumps discontinuously. The susceptibility χ is sharp and tall (χ ~ 40–53 in the 32⁴ quenched runs).
The dynamical run shows no discontinuity at any β. The plaquette varies smoothly and the susceptibility is small and broad (χ < 0.25 everywhere). This is the expected crossover behavior: dynamical quarks screen the gluon self-interaction, washing out the first-order transition.
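The source does not state how χ is normalized. One common convention, assumed here, is χ = V·σ(P)² with V the number of lattice sites. Under that assumption the σ(P) column of the per-β table reproduces the χ < 0.25 bound; since σ(P) is rounded to three decimals, the numbers are rough.

```python
VOLUME = 8 ** 4  # 4,096 lattice sites

# (beta, sigma(P)) from the per-beta table.
SIGMA_P = [
    (4.3351, 0.003), (4.3877, 0.004), (4.4416, 0.004), (4.4969, 0.002),
    (4.5535, 0.002), (4.6116, 0.002), (4.6711, 0.004), (4.7321, 0.004),
    (4.7946, 0.003), (4.8603, 0.004), (4.9293, 0.004), (5.0000, 0.003),
    (5.5000, 0.007), (5.6900, 0.006), (6.0000, 0.004), (6.0673, 0.003),
    (6.1314, 0.004),
]


def plaquette_susceptibility(sigma, volume=VOLUME):
    """Assumed normalization: chi = V * Var(P), with V the site count."""
    return volume * sigma ** 2


chi = {beta: plaquette_susceptibility(sigma) for beta, sigma in SIGMA_P}
peak_beta = max(chi, key=chi.get)

print(f"max chi = {chi[peak_beta]:.3f} at beta = {peak_beta}")
```

A different normalization (for example, the number of plaquettes instead of sites) would rescale every value by a constant but leave the location of the peak unchanged.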
4.2 β_c has shifted downward
In quenched SU(3), the deconfinement transition occurs at β_c = 5.692 (known from decades of lattice calculations). With 1 flavor of staggered quarks at m = 0.1, the steepest plaquette gradient sits between β ≈ 5.0 and β ≈ 5.5. The NPU’s β_c estimate of 5.50 is consistent with this. Fermion backreaction adds attractive forces at the confinement scale, lowering the critical coupling.
4.3 CG cost varies systematically with β
| Region | β range | ⟨CG⟩ | ⟨\|ΔH\|⟩ | Acc% |
|---|---|---|---|---|
| Strong coupling | 4.34–4.93 | 58,968–60,472 | 0.71–0.82 | 40–54% |
| Crossover | 5.00–5.50 | 55,423–58,929 | 0.38–0.68 | 50–66% |
| Weak coupling | 5.69–6.13 | 49,072–54,278 | 0.26–0.33 | 76–84% |
CG iterations decrease by ~19% from strong to weak coupling. This reflects the improving condition number of the Dirac operator: at weak coupling, gauge fluctuations are smaller (less “disorder” in the Anderson analogy), the lowest eigenvalue is larger, and the matrix is easier to invert. The acceptance rate improves correspondingly from ~50% to ~80%.
This systematic CG–β correlation is exactly what the Anderson proxy pipeline (Exp 026) is designed to predict cheaply.
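The link between the lowest eigenvalue and CG cost can be illustrated with a toy problem. The sketch below runs plain conjugate gradient on a diagonal positive-definite matrix. This is an illustrative stand-in, not the staggered Dirac operator or the solver used in this run, and the spectra are invented for the example.

```python
def cg_iterations(eigenvalues, tol=1e-8, max_iter=10_000):
    """Conjugate gradient on diag(eigenvalues) x = b with b = (1, ..., 1).

    Returns the iteration count at which the relative residual drops below tol.
    """
    n = len(eigenvalues)
    x = [0.0] * n
    r = [1.0] * n          # residual b - A x for the starting guess x = 0
    p = r[:]               # first search direction
    rr = sum(v * v for v in r)
    target = tol * tol * rr
    for k in range(1, max_iter + 1):
        ap = [lam * pi for lam, pi in zip(eigenvalues, p)]
        alpha = rr / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = sum(v * v for v in r)
        if rr_new <= target:
            return k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return max_iter


def spectrum(lambda_min, n=200):
    """n eigenvalues evenly spaced between lambda_min and 1."""
    step = (1.0 - lambda_min) / (n - 1)
    return [lambda_min + i * step for i in range(n)]


well_conditioned = cg_iterations(spectrum(0.1))     # condition number 10
ill_conditioned = cg_iterations(spectrum(0.001))    # condition number 1000
print(f"kappa=10: {well_conditioned} iterations, kappa=1000: {ill_conditioned} iterations")
```

Lowering the smallest eigenvalue while keeping the largest fixed raises the iteration count, which is the same mechanism the paragraph above invokes for strong coupling.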
4.4 The Polyakov loop is noisy but present
The Polyakov loop magnitude |L| ≈ 0.29 across all β values. On an 8⁴ lattice, the Polyakov loop has large finite-volume fluctuations and is not a clean order parameter. At 32⁴, we expect |L| to show clear separation between confined (|L| → 0) and deconfined (|L| → finite) phases, as it did in the quenched runs.
5. Comparison: Quenched vs Dynamical
| Property | Quenched (32⁴) | Dynamical (8⁴) |
|---|---|---|
| Lattice volume | 1,048,576 | 4,096 |
| β_c | 5.692 (sharp) | ~5.0–5.5 (broad) |
| Transition order | First-order | Crossover |
| χ at peak | 40–53 | < 0.25 |
| CG iterations / traj | 0 | 49,072–60,472 |
| Wall time / traj | 7.6 s | 43–76 s |
| Acceptance | 15–24% | 40–84% |
| NPU β_c estimate | 5.69 | 5.50 |
| Seed β points | 3 | 4 |
| NPU-inserted β points | 7 | 13 |
| Total β points | 10 | 17 |
| Total trajectories | 6,640 | 1,071 |
| Total wall time | 14.2 h | 11.96 h |
The dynamical run has fewer total trajectories but more β points, because the NPU correctly identified that the broad crossover requires denser sampling over a wider β range. The quenched transition is sharp and localized — 10 points suffice. The dynamical crossover spans Δβ ≈ 1.5 — the NPU mapped it with 17 points.
6. NPU Performance
CG prediction accuracy
The NPU’s CG estimates varied widely:
| β (first seen) | NPU CG estimate | Actual ⟨CG⟩ | Error |
|---|---|---|---|
| 5.69 | 740 | 54,254 | 73× underestimate |
| 5.50 | 18,135 | 55,423 | 3× underestimate |
| 6.00 | 2,175 | 49,804 | 23× underestimate |
| 5.00 | 15,574 | 58,929 | 4× underestimate |
| 4.93 | 30,557 | 58,968 | 2× underestimate |
| 6.07 | 140 | 54,278 | 388× underestimate |
The CG estimates are systematically low because the ESN (echo state network) was trained on quenched data, where there is no CG solver. The NPU has no prior dynamical training data; this run is the first training set. The Exp 026 proxy pipeline (4D Anderson + Wegner) is intended to provide physics-informed CG predictions that should improve this substantially.
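The underestimate factors in the table are the ratio of actual to predicted CG iterations. A short sketch that recomputes them, using the actual ⟨CG⟩ values from the per-β table:

```python
# (beta, NPU CG estimate, actual mean CG iterations)
CG_ROWS = [
    (5.69, 740, 54_254),
    (5.50, 18_135, 55_423),
    (6.00, 2_175, 49_804),
    (5.00, 15_574, 58_929),
    (4.93, 30_557, 58_968),
    (6.07, 140, 54_278),
]


def underestimate_factor(estimate, actual):
    """How many times larger the actual CG count is than the estimate."""
    return actual / estimate


factors = [round(underestimate_factor(est, act)) for _, est, act in CG_ROWS]
for (beta, est, act), factor in zip(CG_ROWS, factors):
    print(f"beta={beta:.2f}: estimate {est:>6}, actual {act:>6}, {factor}x underestimate")
```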
What worked well
- β_c estimation: Locked to 5.50 after 3 points and stayed stable. This is reasonable for 1-flavor dynamical fermions.
- Adaptive steering: Expanded 4 → 17 points, systematically mapping the full β range. Correctly identified that the crossover extends far into the confined region.
- Phase classification: Correctly labeled all β < 5.5 as “confined” and β = 5.69 as “transition.”
- Anomaly detection: Flagged 5 anomalies per β point — likely the first few trajectories after thermalization that haven’t fully equilibrated. Consistent behavior suggests a real pattern, not noise.
What needs improvement
- CG prediction: Needs dynamical training data (this run provides it) and physics proxy input (Exp 026).
- Parameter suggestion: The NPU consistently suggested smaller dt and larger n_md than what was used (e.g., dt=0.001 vs actual dt=0.01). The defaults worked, so these suggestions were more conservative than necessary: the NPU was erring on the side of caution in the absence of dynamical training data.
7. What This Means for Scale-Up
The 8⁴ run validated:
- The fermion force is correct: measurement acceptance is 56.6%, ΔH is O(1), and the plaquette curve is physical. The bug fix from Exp 024 (momentum kick sign error) is confirmed stable over 1,000+ trajectories.
- NPU steering works for dynamical QCD: the scan expanded sensibly, β_c estimation is stable, and phase classification is correct.
- CG prediction needs physics proxies: the ESN alone (without Anderson/Wegner training data) cannot predict CG iterations for a new physics regime. This is the primary motivation for Exp 026.
- The crossover is broader than expected: the NPU inserted 13 additional points and the physics hasn’t plateaued at the low end. A 32⁴ production run should plan for β range 4.0–6.5 with 20+ points.
Scale-up roadmap (updated from Exp 025)
| Run | Lattice | dt | β points | Est. wall | Blocking issue |
|---|---|---|---|---|---|
| ✅ Exp 024 | 8⁴ | 0.01 | 17 | 11.96 h | Complete: 1,071 trajs |
| Exp 025A | 16⁴ | 0.005 | 3 | 1–3 h | Validate CG scaling |
| Exp 025B | 16⁴ + 8⁴ | 0.005 / 0.01 | 6 | 3–6 h | Dual-GPU |
| Exp 026 | — | — | — | 30 min | 4D proxy data |
| Production | 32⁴ | 0.003 | 20+ | 100–250 h | All above |
8. Energy Context
Observed thermals
GPU temperature during this run was significantly lower than the quenched 32⁴ runs:
| Run | Lattice | GPU temp | Est. power | Est. energy |
|---|---|---|---|---|
| Quenched 32⁴ (Exp 013) | 32⁴ | 73°C | 370W | 5.0 kWh |
| Quenched 32⁴ (Exp 022) | 32⁴ | 74°C | 354W | 5.0 kWh |
| Dynamical 8⁴ (this run) | 8⁴ | ~42°C | ~100W | ~1.2 kWh |
The 8⁴ lattice uses 0.06% of VRAM and ~20% of shader cores. Most of the 3090’s transistors are idle. This means the CG solver, despite being the dominant cost, is not GPU-limited — it’s algorithmically limited by the number of iterations, not by the available FLOPS.
Scaling to 32⁴ dynamical will bring GPU utilization and thermal output back to quenched-run levels. See Exp 027 for full energy tracking specifications.
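The energy column is power multiplied by run time. A minimal sketch of that arithmetic, using the estimated power figures from the table; the helper names are illustrative, and the quenched run durations are back-computed rather than taken from the source.

```python
def energy_kwh(power_watts, hours):
    """Energy in kWh for a constant power draw over a given duration."""
    return power_watts * hours / 1000.0


def implied_hours(energy_kwh_value, power_watts):
    """Run time that a given energy and constant power draw would imply."""
    return energy_kwh_value * 1000.0 / power_watts


# Dynamical 8^4 run: ~100 W over the 11.96 h wall time.
this_run_kwh = energy_kwh(100.0, 11.96)

# Quenched rows: duration implied by the tabulated energy and power.
exp013_hours = implied_hours(5.0, 370.0)
exp022_hours = implied_hours(5.0, 354.0)

print(f"this run: {this_run_kwh:.2f} kWh")
print(f"implied quenched durations: {exp013_hours:.1f} h, {exp022_hours:.1f} h")
```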
9. Future Directions
Immediate (before next production run)
- Exp 026: Run 4D Anderson + Wegner block proxy pipeline to generate physics-informed CG training data for the NPU.
- Exp 025A: 16⁴ single-β validation to measure real CG scaling.
- Exp 027: Instrument energy tracking in all production binaries.
Medium-term
- 2+1 flavor: Add a second pseudofermion field for the strange quark, matching the physical QCD configuration. Doubles CG cost per trajectory.
- 32⁴ dynamical production: Full-volume scan with NPU steering trained on Exp 024 + Exp 026 data.
Connection to other baseCamp papers
- baseCamp 01 (Anderson QS): The CG–disorder correlation observed here directly validates the Anderson localization framework. Gauge fluctuations at strong coupling (high plaquette variance = high effective disorder) produce harder CG solves, exactly as Anderson predicts.
- baseCamp 07 (WDM/QCD): This run extends paper 07 from quenched to dynamical. The DF64 arithmetic, NPU steering, and vendor-agnostic shader stack carry over unchanged.
- baseCamp 04 (Sentinels): The multi-head NPU architecture demonstrated here (14 heads, real-time steering) is the same pattern used for environmental biosensing — cheap inference guiding expensive measurement.
Addendum: NPU as Parameter Controller (Exp 031, 2026-03-01)
Exp 030 revealed that the NPU’s parameter suggestions (dt, n_md) were being received but never applied. The auto_dt formula over-penalized mass (mass_scale.sqrt() turned dt=0.01 into dt=0.0032 for mass=0.1), producing 97.5% acceptance — far above the 60-80% sweet spot and wasting ~2x CG iterations per useful trajectory.
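The quoted numbers are consistent with a rule that multiplies the base step by the square root of the quark mass. A sketch of that arithmetic, assuming `mass_scale` equals the mass itself for m = 0.1 (the source does not show the full formula, and the function name here is hypothetical):

```python
import math


def auto_dt_old(base_dt, mass):
    """Assumed form of the Exp 030 rule: scale dt by sqrt(mass)."""
    return base_dt * math.sqrt(mass)


dt_exp030 = auto_dt_old(0.01, 0.1)
shrink_factor = 0.01 / dt_exp030

print(f"dt = {dt_exp030:.4f} (base step reduced by a factor of {shrink_factor:.2f})")
```

A step roughly three times smaller than the 8⁴ production value explains the very high acceptance: the integrator is far more accurate than the Metropolis step requires.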
Exp 031 makes the NPU the actual controller of HMC parameters:
| Parameter | Before (Exp 030) | After (Exp 031) |
|---|---|---|
| dt | Fixed at startup (0.0032) | NPU-suggested per-beta + mid-run adaptation |
| n_md | Fixed at startup | Derived from dt to keep trajectory length ~1.0 |
| Training target | 0.01 + acc * 0.04 (crude) | dt_used * (1 - 0.5 * (acc - 0.70)) (targets 70% acceptance) |
A mid-beta feedback loop fires every 10 measurement trajectories: if acceptance > 85%, dt is raised by 15%; if acceptance < 50%, dt is lowered by 15%. The `dt_used` and `n_md_used` fields in `BetaResult` enable post-hoc analysis of how the NPU adapts parameters across the phase curve. Safety clamps: `dt ∈ [0.001, 0.02]`, `n_md ∈ [20, 500]`. A `--no-npu-control` flag reverts to the old print-only behavior.
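The feedback rule and clamps described above can be sketched as follows. The thresholds, step size, and clamp ranges come from the text. The multiplicative form of the 15% adjustment, the rounding of n_md, and all function names are assumptions for illustration, not the hotSpring implementation.

```python
DT_MIN, DT_MAX = 0.001, 0.02
N_MD_MIN, N_MD_MAX = 20, 500
TRAJECTORY_LENGTH = 1.0   # target dt * n_md


def clamp(value, low, high):
    return max(low, min(high, value))


def adapt_dt(dt, acceptance):
    """One feedback step: raise dt above 85% acceptance, lower it below 50%."""
    if acceptance > 0.85:
        dt *= 1.15   # assumed multiplicative 15% increase
    elif acceptance < 0.50:
        dt *= 0.85   # assumed multiplicative 15% decrease
    return clamp(dt, DT_MIN, DT_MAX)


def n_md_for(dt):
    """Number of MD steps that keeps the trajectory length near 1.0."""
    return int(clamp(round(TRAJECTORY_LENGTH / dt), N_MD_MIN, N_MD_MAX))


def run_feedback(dt, window_acceptances):
    """Apply the rule once per 10-trajectory window; return the (dt, n_md) history."""
    history = [(dt, n_md_for(dt))]
    for acceptance in window_acceptances:
        dt = adapt_dt(dt, acceptance)
        history.append((dt, n_md_for(dt)))
    return history


# Example: start from the over-cautious Exp 030 step with acceptance stuck high.
for dt, n_md in run_feedback(0.0032, [0.975, 0.95, 0.90, 0.80]):
    print(f"dt = {dt:.5f}, n_md = {n_md}")
```

Note that at the lower dt clamp the n_md clamp also binds (1.0 / 0.001 = 1000 > 500), so the trajectory length falls below 1.0 there.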
This closes the gap between what the NPU knows and what the NPU controls — the brain architecture now has a complete feedback loop from measurement to parameter adjustment, with the Titan V pre-motor receiving the NPU-adapted dt for the next beta point.
Addendum: Deep Debt Resolution (v0.6.18, 2026-03-06)
hotSpring v0.6.18 completed a comprehensive technical debt audit (Exp 041): Clippy 0 warnings (pedantic+nursery), file-size compliance (<1000 lines), unwrap/expect removal from production sites, SPDX 100% AGPL-3.0-only. Brain B2 (memory pressure) and D1 (force anomaly) evolved from placeholder to real runtime estimates. 685 lib tests pass. See hotSpring/experiments/041_DEEP_DEBT_RESOLUTION_AUDIT.md.
Data Files
| File | Contents |
|---|---|
| results/exp024_production_8x8.jsonl | Per-trajectory JSONL (1,071 lines) |
| results/exp024_production_8x8.log | Terminal log with NPU steering trace |
| experiments/024_HMC_PARAMETER_SWEEP.md | Parameter sweep that informed this run |
| experiments/025_GPU_SATURATION_MULTI_PHYSICS.md | Scale-up plan |
| experiments/026_4D_ANDERSON_WEGNER_PROXY.md | Physics proxy pipeline |
| experiments/027_ENERGY_THERMAL_TRACKING.md | Energy instrumentation |
| specs/ANDERSON_4D_WEGNER_PROXY.md | Technical spec for proxy system |
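A per-trajectory JSONL file of this kind can be reduced to a per-β summary in a few lines. The sketch below is hedged: the field names `beta`, `plaquette`, and `accepted` are hypothetical (the source does not document the schema), so they are exposed as parameters and exercised here on invented sample records rather than on the real file.

```python
import json
from collections import defaultdict


def summarize(lines, beta_key="beta", plaq_key="plaquette", acc_key="accepted"):
    """Group JSONL records by beta and compute n, mean plaquette, acceptance.

    The key names are placeholders; adjust them to the actual file schema.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        groups[record[beta_key]].append(record)

    summary = {}
    for beta in sorted(groups):
        records = groups[beta]
        n = len(records)
        summary[beta] = {
            "n": n,
            "mean_plaquette": sum(r[plaq_key] for r in records) / n,
            "acceptance": sum(1 for r in records if r[acc_key]) / n,
        }
    return summary


# Invented sample records, for illustration only.
sample_lines = [
    '{"beta": 5.5, "plaquette": 0.52, "accepted": true}',
    '{"beta": 5.5, "plaquette": 0.53, "accepted": false}',
    '{"beta": 6.0, "plaquette": 0.58, "accepted": true}',
]

summary = summarize(sample_lines)
for beta, stats in summary.items():
    print(beta, stats)
```

Since an open file handle is an iterable of lines, the same function could be pointed at the results file once the real field names are confirmed.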
References
- A. Bazavov et al. [HotQCD]. “Equation of state in (2+1)-flavor QCD.” Phys. Rev. D 90, 094503 (2014).
- T. G. Kovács and F. Pittler. “Anderson Localization in Quark-Gluon Plasma.” Phys. Rev. Lett. 105, 192001 (2010).
- M. Giordano, T. G. Kovács, F. Pittler. “Dirac mode localization in QCD near the crossover temperature.” arXiv:2602.10921 (2026).
- B. Svetitsky and L. G. Yaffe. “Critical behavior at finite-temperature confinement transitions.” Nucl. Phys. B 210, 423 (1982).
- F. Wegner. “Disordered system with n orbitals per site: n = ∞ limit.” Phys. Rev. B 19, 783 (1979).