External Collaboration Model
How sovereign infrastructure enables external science production — the gen5 collaborator gate pattern.
The gen5 Pattern
gen5 asks: does someone else’s science come out the other end? The external collaboration model makes this architecturally possible — collaborators use the same infrastructure patterns as internal gates, scoped to their domain.
Sovereignty Enables Collaboration
Sovereignty is not isolation. The ecosystem’s self-hosted infrastructure (Forgejo, WaterFall sync, sovereign DNS) provides the substrate for external collaboration without vendor lock-in:
- Collaborators sync through the same pipelines as internal gates
- Data stays on sovereign infrastructure (no cloud vendor ingestion)
- Provenance is tracked end-to-end (every computational step attributable)
- The collaborator owns their output (pseudoSpore as delivery format)
The Collaborator Gate Model
External collaborators get a gate profile in the ecosystem manifest, scoped to repos relevant to their domain:
[gates.gonzales_nf]
repos = [
"wateringHole",
"helixVision", "initioChem", "blueFish",
"wetSpring", "hotSpring",
"projectFOUNDATION",
]
They pull only what they need. They never see NUCLEUS internals, unrelated springs, or infrastructure repos. The WaterFall pipeline handles scoping automatically.
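The scoping rule can be sketched in a few lines of Python. This is a minimal illustration, not the WaterFall implementation: the `MANIFEST` dict mirrors the `[gates.gonzales_nf]` entry above, while `visible_repos`, `ALL_REPOS`, and the repo name `infra-dns` are hypothetical names introduced here. Treating `NUCLEUS` as a repo name is also an assumption. The sketch assumes default-deny, so a gate with no profile sees nothing.

```python
# Hypothetical in-memory mirror of the manifest entry shown above.
# The real manifest loader and sync pipeline are not described in this document.
MANIFEST = {
    "gates": {
        "gonzales_nf": {
            "repos": [
                "wateringHole",
                "helixVision", "initioChem", "blueFish",
                "wetSpring", "hotSpring",
                "projectFOUNDATION",
            ],
        },
    },
}


def visible_repos(manifest, gate, all_repos):
    """Return the repos from all_repos that the named gate profile may pull.

    Default-deny: an unknown gate, or a gate without a repos list, sees nothing.
    """
    allowed = set(manifest.get("gates", {}).get(gate, {}).get("repos", []))
    return [repo for repo in all_repos if repo in allowed]


# Illustrative inventory; "NUCLEUS" and "infra-dns" stand in for internal repos.
ALL_REPOS = ["NUCLEUS", "wateringHole", "helixVision", "infra-dns", "initioChem"]

scoped = visible_repos(MANIFEST, "gonzales_nf", ALL_REPOS)
print(scoped)  # ['wateringHole', 'helixVision', 'initioChem']
```

Filtering against an allow-list, rather than hiding a deny-list, means a newly added internal repo stays invisible to collaborators until someone explicitly adds it to a gate profile.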
What the Collaborator Brings
- A biological question the ecosystem hasn’t answered
- Domain data access (NF Data Portal, LTEE datasets, analytical standards)
- Domain expertise (signaling biology, microbial evolution, analytical chemistry)
- Institutional authority (PI status for grants, publication lead)
What the Ecosystem Provides
- Validated computation — 12,510+ checks, 70+ papers reproduced across 8 domains
- Multi-product composition — orchestrated products for the collaborator’s question
- GPU compute at zero cost to the institution
- Self-verifying artifacts — pseudoSpore packaging with full provenance
- AI-accelerated coordination — metadata extraction, cross-checking, assembly
What Comes Out
- A pseudoSpore — self-verifying data package the collaborator owns
- Grant preliminary data — foundation-ready computational evidence
- New spring validation targets — the ecosystem grows from collaborator science
- A reproducibility record — every step provenance-tracked
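To make "self-verifying" concrete, here is a minimal sketch of a package that carries its own digests and provenance steps. The actual pseudoSpore format is not specified in this document, so everything here is an assumption: the `build_package` and `verify_package` functions, the field names, and the choice of SHA-256 are illustrative only.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_package(files: dict, steps: list) -> dict:
    """Bundle payload files with a manifest of digests and provenance steps.

    Hypothetical layout, not the real pseudoSpore format.
    """
    manifest = {
        "files": {name: sha256_hex(content) for name, content in files.items()},
        "provenance": list(steps),
    }
    # Seal the manifest so edits to digests or provenance steps are detectable.
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return {"files": dict(files), "manifest": manifest, "seal": sha256_hex(canonical)}


def verify_package(package: dict) -> bool:
    """Recompute the seal and every file digest; True only if all of them match."""
    manifest = package["manifest"]
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    if sha256_hex(canonical) != package["seal"]:
        return False
    if set(manifest["files"]) != set(package["files"]):
        return False
    return all(
        sha256_hex(package["files"][name]) == digest
        for name, digest in manifest["files"].items()
    )


# Placeholder payload and step names, for illustration only.
pkg = build_package(
    {"results.bin": b"placeholder payload"},
    [{"step": "extract"}, {"step": "score"}],
)
print(verify_package(pkg))  # True

pkg["files"]["results.bin"] = b"modified payload"
print(verify_package(pkg))  # False
```

A seal stored inside the package only detects accidental corruption or careless edits, since anyone can recompute it. A production design would presumably sign the seal or record it somewhere the recipient trusts independently.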
Multi-Product Composition (gen5 novelty)
In gen4, each product composed primals independently. gen5 requires products to compose with each other, driven by the biological question rather than by internal design.
Example — neurofibromatosis data mining requires:
- helixVision for gene expression mining from NF Data Portal
- healthSpring for drug repurposing scoring against NF targets
- initioChem for conformational dynamics of inhibitor binding
- coralForge for structural variant impact prediction
No single product answers the question. The answer emerges from their composition — orchestrated by the science itself.
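One way to picture question-driven composition is a registry of products plus a question that names the ones it needs. The sketch below is an assumption-laden illustration, not the ecosystem's orchestrator: the products are stubs that only record that they ran, and `make_stub`, `compose`, the evidence keys, and the run order are all hypothetical.

```python
from typing import Callable, Dict, List

Evidence = Dict[str, str]
Product = Callable[[Evidence], Evidence]


def make_stub(name: str, output_key: str) -> Product:
    """Build a stand-in product that contributes a single placeholder entry."""
    def run(evidence: Evidence) -> Evidence:
        # A real product would compute from `evidence`; the stub only marks its slot.
        return {output_key: f"<{name} output placeholder>"}
    return run


# Product names come from the example above; the evidence keys are invented here.
PRODUCTS: Dict[str, Product] = {
    "helixVision": make_stub("helixVision", "expression_hits"),
    "healthSpring": make_stub("healthSpring", "repurposing_scores"),
    "initioChem": make_stub("initioChem", "binding_dynamics"),
    "coralForge": make_stub("coralForge", "variant_impact"),
}


def compose(required: List[str], registry: Dict[str, Product]) -> Evidence:
    """Run each required product in order, passing along the evidence so far."""
    evidence: Evidence = {}
    for name in required:
        if name not in registry:
            raise KeyError(f"no product registered as {name!r}")
        # Each product receives a copy, so it cannot mutate earlier results in place.
        evidence.update(registry[name](dict(evidence)))
    return evidence


# The question, not any single product, decides which products take part.
NF_QUESTION = ["helixVision", "healthSpring", "initioChem", "coralForge"]
result = compose(NF_QUESTION, PRODUCTS)
print(sorted(result))
# ['binding_dynamics', 'expression_hits', 'repurposing_scores', 'variant_impact']
```

Keeping the product list in the question rather than in any product means a different collaborator question can reuse the same registry with a different selection.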
The Spore Cycle
The collaboration completes a biological cycle:
Ecosystem validates published science (springs)
→ Products compose validated computation
→ Collaborator produces new science
→ New science → new validation targets for springs
→ Springs evolve from external demand
→ Ecosystem is stronger than before
Current Collaborators
| Collaborator | Domain | Products | Status |
|---|---|---|---|
| Gonzales (NF) | Neurofibromatosis data mining | helixVision + healthSpring + initioChem | Engaged |
| ABG (Alistaire) | CAZyme conformational FEL | initioChem | Producing |
| Jones (PFAS) | Analytical chemistry ETL | blueFish | Active consulting |
| Barrick (LTEE) | Microbial evolution | lithoSpore | Contacted |
Scientific Challenges
Beyond individual collaborators, the ecosystem participates in structured benchmarks hosted by scientific foundations (Synapse/DREAM challenges):
- Docker-based submission maps onto the pseudoSpore pattern
- Foundation-sponsored evaluation builds credibility with funders
- Domain-expert scoring validates products under external criteria
- Challenge results reveal gaps internal testing never surfaces
This is gen5 validation at population scale — not one collaborator’s question, but the entire field’s benchmarks.