Reproduce It Yourself

Stand up a 13-primal NUCLEUS composition on your own hardware and run the same validated science workloads. No cloud. No institutional access.

Everything in the Lab ran on a single machine. You can reproduce it on yours. The composition deploys the same way on any x86_64 Linux machine with at least 16 GB RAM.


Prerequisites

  • Linux (tested on Pop!_OS 22.04 / Ubuntu 22.04)
  • Rust toolchain (rustup — installs in 2 minutes)
  • 16 GB RAM minimum (96 GB recommended for full NCBI data)
  • Git and basic build tools (build-essential)

Optional:

  • Vulkan-capable GPU for barraCuda GPU workloads
  • Python 3.10+ and R 4.x for baseline comparison pipelines
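A quick preflight check can confirm the list above before you clone anything. This is a minimal sketch, not part of projectNUCLEUS: the helper names `missing_tools` and `ram_ok` are hypothetical, `cc` stands in for build-essential, and the RAM threshold is deliberately lenient because the kernel reports slightly less than the installed amount.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not shipped with projectNUCLEUS.

# Print each named tool that is not on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Succeed if total RAM (kB, as reported in /proc/meminfo) meets a minimum in GB.
# Uses 10^6 kB per GB so a nominal 16 GB machine is not rejected for
# memory the firmware or kernel has reserved.
ram_ok() {
    local total_kb="$1" min_gb="$2"
    [ "$total_kb" -ge $(( min_gb * 1000 * 1000 )) ]
}

# Check RAM against the 16 GB minimum (Linux only).
if [ -r /proc/meminfo ]; then
    total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
    if ram_ok "$total_kb" 16; then
        echo "RAM: ok"
    else
        echo "RAM: below the 16 GB minimum"
    fi
fi

# Check for Git, the Rust toolchain, and a C compiler.
missing="$(missing_tools git cargo cc)"
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "tools: ok"
fi
```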

Step 1: Get the Primal Binaries

# Clone plasmidBin (pre-built binaries)
git clone git@github.com:ecoPrimals/plasmidBin.git
export PLASMIDBIN="$(pwd)/plasmidBin"

# Or build from source (springs are all public AGPL)
git clone https://github.com/syntheticChemistry/wetSpring
cd wetSpring && cargo build --release --workspace

Step 2: Deploy the Composition

git clone git@github.com:sporeGarden/projectNUCLEUS.git
cd projectNUCLEUS/deploy

# Deploy full NUCLEUS (13 primals) to the current machine
bash deploy.sh --composition full --gate mygate

# Verify all primals are healthy
bash deploy.sh --health-check

deploy.sh handles seed creation, primal startup ordering, health verification, and port allocation. Primals bind to 127.0.0.1 by default.
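If you want an independent look at the loopback binding, a plain TCP probe works alongside the built-in health check. The sketch below assumes bash (it relies on bash's `/dev/tcp` redirection) and uses the two ports that appear in Step 4, 9700 for loamSpine and 9850 for sweetGrass; the helper name `port_open` is hypothetical, and other primals may use ports not listed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper; deploy.sh --health-check remains the supported check.

# Succeed if something accepts TCP connections on host:port.
# The subshell closes the probe descriptor as soon as it exits.
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Probe the two ports mentioned in Step 4 on the loopback interface.
for port in 9700 9850; do
    if port_open 127.0.0.1 "$port"; then
        echo "127.0.0.1:${port} listening"
    else
        echo "127.0.0.1:${port} not reachable"
    fi
done
```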


Step 3: Run the Science Workloads

# Run a single workload
toadstool execute ../workloads/wetspring/wetspring-16s-rust-validation.toml

# Run the full provenance pipeline (all workloads + DAG + ledger + braid)
bash provenance_pipeline.sh \
    --workloads-dir ../workloads/wetspring \
    --session-name "my-validation-run"

Expected Results

Workload                 Checks   Domain
16S Pipeline             37/37    DADA2, chimera, taxonomy, UniFrac
Diversity Indices        27/27    Alpha/beta diversity, PCoA
Gonzales CPU Parity      43/43    PK, dose-response, Anderson spectral
Algae 16S (real data)    34/34    Full 16S on 11.9M NCBI reads
R Industry Parity        53/53    vegan, DADA2, phyloseq gold standards
Real NCBI Pipeline       25/25    Sovereign diversity + Anderson
Fajgenbaum Pathway       8/8      Immunology, drug repurposing
Cold Seep Pipeline       8/8      Metagenomics, QS gene catalog

Total: 235+ checks, all at tol=0.000000 (exact Python→Rust parity).
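To cross-check the tally, you can sum the per-workload counts yourself. The sketch below is only arithmetic over rows typed in by hand from the table; it does not parse toadstool output, whose format is not described here, and the function name `tally_checks` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum "passed/total" pairs, one row per line on stdin.
# The last whitespace-separated field of each row must look like "37/37".
tally_checks() {
    awk '{ split($NF, r, "/"); passed += r[1]; total += r[2] }
         END { printf "%d/%d\n", passed, total }'
}

# Rows copied by hand from the Expected Results table.
rows='16S Pipeline 37/37
Diversity Indices 27/27
Gonzales CPU Parity 43/43
Algae 16S (real data) 34/34
R Industry Parity 53/53
Real NCBI Pipeline 25/25
Fajgenbaum Pathway 8/8
Cold Seep Pipeline 8/8'

result="$(printf '%s\n' "$rows" | tally_checks)"
echo "passed/total: $result"   # prints "passed/total: 235/235"
```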


Step 4: Verify the Provenance Chain

After the pipeline completes, you can inspect three provenance artifacts: the Merkle root, the loamSpine ledger, and the sweetGrass braid.

# Check the Merkle root (content hash of all DAG events)
grep "Merkle Root" results/PROVENANCE_MANIFEST.md

# Query the loamSpine ledger
curl -s -X POST http://localhost:9700 \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"spine.list","params":{},"id":1}'

# Query the sweetGrass braid
curl -s -X POST http://localhost:9850/jsonrpc \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","method":"braid.list","params":{},"id":1}'

The braid carries an ed25519 witness signature from BearDog’s key hierarchy. The Merkle root covers all data registrations and workload results in one integrity proof. Tamper with one byte and the chain breaks.
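To see why a single changed byte breaks the chain, it helps to build a tiny Merkle tree by hand. The sketch below is illustrative only: it uses `sha256sum` because it ships with coreutils, whereas NUCLEUS presumably hashes with BLAKE3 (Step 5 uses `b3sum`), and the tree layout (pairwise hashing, odd node carried up unchanged) is an assumption rather than the project's actual scheme. The event strings and the function names `digest` and `merkle_root` are made up.

```shell
#!/usr/bin/env bash
# Illustrative Merkle root; NOT the layout or hash that NUCLEUS uses.

# Hash stdin with SHA-256 and print only the hex digest.
digest() { sha256sum | cut -d' ' -f1; }

# Print the Merkle root of the leaf strings given as arguments.
merkle_root() {
    [ "$#" -gt 0 ] || return 1
    local level="" leaf
    # Leaf level: hash every input string.
    for leaf in "$@"; do
        level="$level $(printf '%s' "$leaf" | digest)"
    done
    set -- $level
    # Combine adjacent pairs until a single hash remains.
    while [ "$#" -gt 1 ]; do
        level=""
        while [ "$#" -gt 1 ]; do
            level="$level $(printf '%s%s' "$1" "$2" | digest)"
            shift 2
        done
        # An unpaired last node is carried up to the next level unchanged.
        if [ "$#" -eq 1 ]; then
            level="$level $1"
        fi
        set -- $level
    done
    echo "$1"
}

# Two event lists that differ in one character of the last event.
root_a="$(merkle_root "event-1" "event-2" "event-3")"
root_b="$(merkle_root "event-1" "event-2" "event-4")"
echo "root A: $root_a"
echo "root B: $root_b"
```

The two roots differ even though only the last leaf changed, because every hash on the path from that leaf to the root is recomputed from different input.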


Step 5: Compare Your Results

If your workload output BLAKE3 hashes match the published hashes, the science is bit-for-bit reproduced. The Merkle root and braid URN will differ (expected — they include run-specific session IDs and timestamps), but the per-workload output hashes should be identical.

# Hash a workload output
b3sum results/wetspring-16s-rust-validation.stdout
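The comparison itself can be scripted. This is a minimal sketch with a hypothetical helper, `hash_matches`; where the published hashes live is not stated here, so the expected digest is passed in by hand. The hashing command is a parameter that defaults to `b3sum`, and any tool that prints the digest as the first field (such as `sha256sum`) also works for trying the helper out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's digest with an expected hex digest.
#   $1 = file to hash
#   $2 = expected digest (for example, copied from the published results)
#   $3 = hashing command, defaults to b3sum
# Returns 0 on match, 1 on mismatch, 2 if the hashing command is missing.
hash_matches() {
    local file="$1" expected="$2" tool="${3:-b3sum}" actual
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "hashing command not found: $tool" >&2
        return 2
    fi
    # b3sum and sha256sum both print "<digest>  <filename>".
    actual="$("$tool" "$file" | cut -d' ' -f1)"
    if [ "$actual" = "$expected" ]; then
        echo "MATCH    $file"
    else
        echo "MISMATCH $file (got $actual)"
        return 1
    fi
}
```

For example, `hash_matches results/wetspring-16s-rust-validation.stdout "$published"` reports whether that output reproduces, where `$published` holds the digest you are comparing against.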

What If Something Doesn’t Match?

File an issue or send a gap report. That’s the point — the methodology is falsifiable. If the results diverge on your hardware, that’s signal, not failure. Document the divergence, the hardware, and the environment. The gap report flows upstream through wateringHole and improves the ecosystem for everyone.
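Collecting the hardware and environment details is easy to script. The sketch below prints a few facts worth attaching to a report; the helper name `env_report` and the field names are hypothetical, since the gap report format is not specified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print environment facts to attach to a divergence report.
env_report() {
    echo "kernel: $(uname -srm)"
    # CPU model, taken from the first "model name" line (x86 Linux).
    if [ -r /proc/cpuinfo ]; then
        echo "cpu: $(awk -F': ' '/^model name/ {print $2; exit}' /proc/cpuinfo)"
    fi
    # Total RAM in kB as reported by the kernel.
    if [ -r /proc/meminfo ]; then
        echo "mem_kb: $(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
    fi
    # Rust compiler version, if a toolchain is installed.
    if command -v rustc >/dev/null 2>&1; then
        echo "rustc: $(rustc --version)"
    else
        echo "rustc: not found"
    fi
}

env_report
```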


Data Dependencies

For the full NCBI pipeline (real data, not synthetic):

# Download real NCBI data (requires ~5 GB disk)
# PRJNA488170: Nannochloropsis outdoor 16S (Wageningen)
prefetch SRR7760408 && fasterq-dump SRR7760408

All synthetic workloads run without external data downloads.


Hardware Baselines

Hardware                          16S Pipeline   Full Suite   Notes
i9-14900K / 96 GB / RTX 4070      <1s            ~30s         reference node
Ryzen 5800X / 64 GB / RTX 3070    <1s            ~45s         swiftGate
Celeron J3455 / 8 GB / none       ~3s            ~5m          NUC (CPU only)

Your times will vary. The checks should not.