blueFish — Sovereign Data Pipeline

Sovereign ETL and data pipeline — NCBI integration, format conversion, no cloud lock-in.

Repository: sporeGarden/blueFish (moving from syntheticChemistry — repo pending)
License: scyBorg (AGPL-3.0-or-later + ORC + CC-BY-SA 4.0)


What It Is

blueFish is a sovereign data pipeline and ETL (Extract-Transform-Load) tool for scientific data. It handles NCBI database integration, format conversion between bioinformatics standards, and data ingestion for the primal ecosystem — all without sending data to external services.

For any lab working with sequence data, taxonomic databases, or clinical datasets, blueFish provides a local pipeline that respects data sovereignty: your data stays on your hardware, processed by auditable code, with full provenance tracking.


Key Capabilities

  • NCBI Integration: Direct access to NCBI databases (GenBank, SRA, Taxonomy) with local caching and incremental updates
  • Format Conversion: Converts between FASTA, FASTQ, SAM/BAM, VCF, GFF3, BED, and other bioinformatics formats
  • Provenance: Every transformation step is logged with BearDog-signed provenance via the RootPulse composition
  • Offline Operation: Once data is fetched, all processing runs locally — no network required
  • Pipeline Composition: Integrates with the biomeOS Neural API to orchestrate multi-step pipelines
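FASTQ to FASTA is the simplest of the conversions listed above, because it only drops the quality line. The following is a minimal std-only Rust sketch, assuming strict four-line records (@header, sequence, '+' separator, quality) with no line wrapping. `fastq_to_fasta` is a hypothetical name for illustration, not blueFish's API, and production FASTQ parsers handle more edge cases than this.

```rust
/// Hypothetical converter: strict 4-line FASTQ text -> FASTA text.
/// Assumes no line wrapping; quality scores are dropped because FASTA cannot hold them.
fn fastq_to_fasta(fastq: &str) -> Result<String, String> {
    let lines: Vec<&str> = fastq.lines().collect();
    if lines.len() % 4 != 0 {
        return Err(format!("expected a multiple of 4 lines, got {}", lines.len()));
    }
    let mut fasta = String::new();
    for (i, rec) in lines.chunks(4).enumerate() {
        let (header, seq, sep, qual) = (rec[0], rec[1], rec[2], rec[3]);
        // FASTQ headers start with '@'; FASTA headers start with '>'.
        let id = header
            .strip_prefix('@')
            .ok_or_else(|| format!("record {}: header must start with '@'", i + 1))?;
        if !sep.starts_with('+') {
            return Err(format!("record {}: missing '+' separator line", i + 1));
        }
        // One quality character per base; a mismatch signals a truncated record.
        if seq.len() != qual.len() {
            return Err(format!("record {}: sequence and quality lengths differ", i + 1));
        }
        fasta.push('>');
        fasta.push_str(id);
        fasta.push('\n');
        fasta.push_str(seq);
        fasta.push('\n');
    }
    Ok(fasta)
}

fn main() {
    let fastq = "@read1 sample=A\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n";
    match fastq_to_fasta(fastq) {
        Ok(fasta) => print!("{}", fasta),
        Err(e) => eprintln!("conversion failed: {}", e),
    }
}
```

Returning a `Result` instead of panicking lets a pipeline step reject a malformed record with a message that can be logged alongside the rest of the run.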

How It Composes

blueFish consumes primals for data integrity and orchestration:

Primal          What It Provides
NestGate        Content-addressed storage for raw and processed datasets
BearDog         Cryptographic verification of data integrity
biomeOS         Pipeline orchestration via deploy graphs
💧🔬 wetSpring    Validation of bioinformatics outputs against published methods
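The table assigns content addressing to NestGate and integrity verification to BearDog. Their real interfaces are not documented here, so this is a hypothetical in-memory sketch of the pattern only: each blob's key is a digest of its bytes (so identical data is stored once), and every read re-hashes the bytes and compares them to the key. `ContentStore` and its methods are illustrative names. FNV-1a is a non-cryptographic stand-in chosen to keep the example dependency-free; a real deployment would use a cryptographic hash and signatures.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit. NOT cryptographic: a stand-in so the sketch needs no crates.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical content-addressed store: key = hex digest of the bytes.
struct ContentStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl ContentStore {
    fn new() -> Self {
        Self { blobs: HashMap::new() }
    }

    /// Store bytes and return their content address; duplicates are stored once.
    fn put(&mut self, data: &[u8]) -> String {
        let key = format!("{:016x}", fnv1a64(data));
        self.blobs.entry(key.clone()).or_insert_with(|| data.to_vec());
        key
    }

    /// Fetch bytes and re-hash them, rejecting any blob whose digest
    /// no longer matches its address (e.g. corrupted at rest).
    fn get_verified(&self, key: &str) -> Result<&[u8], String> {
        let data = self.blobs.get(key).ok_or_else(|| format!("no blob at {}", key))?;
        let actual = format!("{:016x}", fnv1a64(data));
        if actual != key {
            return Err(format!("integrity check failed: {} != {}", actual, key));
        }
        Ok(data.as_slice())
    }
}

fn main() {
    let mut store = ContentStore::new();
    let key = store.put(b">read1\nACGT\n");
    let again = store.put(b">read1\nACGT\n");
    println!("deduplicated: {}", key == again);
    match store.get_verified(&key) {
        Ok(bytes) => println!("{} -> {} bytes verified", key, bytes.len()),
        Err(e) => eprintln!("{}", e),
    }
}
```

Because the address is derived from the content, a processed dataset can be referenced by its key in a provenance record, and any later read detects silent corruption.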

Why It Matters

Most bioinformatics pipelines are shell script chains: fragile, unreproducible, and tied to specific cluster configurations. blueFish replaces that with typed Rust pipelines that compose via JSON-RPC, run identically on a laptop and a cluster, and produce cryptographically signed outputs.
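The claim that typed pipelines are less fragile than shell chains can be illustrated with a small sketch. It assumes nothing about blueFish's real interfaces: a hypothetical `Stage` trait whose associated `In`/`Out` types make a mis-wired pipeline fail to compile, plus a hypothetical hash-chained `ProvenanceLog` in the spirit of the Provenance capability. FNV-1a stands in for BearDog signatures, the JSON-RPC transport is omitted, and the two stages are illustrative only.

```rust
/// FNV-1a 64-bit: non-cryptographic stand-in for a signed digest.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Chain link = digest(previous link, step name, output digest).
fn link(prev: u64, step: &str, output_digest: u64) -> u64 {
    let mut buf = prev.to_le_bytes().to_vec();
    buf.extend_from_slice(step.as_bytes());
    buf.extend_from_slice(&output_digest.to_le_bytes());
    fnv1a64(&buf)
}

struct ProvenanceEntry { step: String, output_digest: u64, chain_digest: u64 }

#[derive(Default)]
struct ProvenanceLog { entries: Vec<ProvenanceEntry> }

impl ProvenanceLog {
    /// Append an entry linked to the previous one (0 for the first entry).
    fn record(&mut self, step: &str, output: &[u8]) {
        let prev = self.entries.last().map_or(0, |e| e.chain_digest);
        let output_digest = fnv1a64(output);
        let chain_digest = link(prev, step, output_digest);
        self.entries.push(ProvenanceEntry { step: step.to_string(), output_digest, chain_digest });
    }

    /// Recompute every link; editing any entry breaks the chain from there on.
    fn verify(&self) -> bool {
        let mut prev = 0;
        for e in &self.entries {
            if link(prev, &e.step, e.output_digest) != e.chain_digest {
                return false;
            }
            prev = e.chain_digest;
        }
        true
    }
}

/// Hypothetical typed stage: feeding it the wrong input type does not compile.
trait Stage {
    type In;
    type Out: AsRef<[u8]>;
    fn name(&self) -> &str;
    fn run(&self, input: Self::In) -> Self::Out;
}

/// Run one stage and record a digest of its output.
fn run_logged<S: Stage>(stage: &S, input: S::In, log: &mut ProvenanceLog) -> S::Out {
    let out = stage.run(input);
    log.record(stage.name(), out.as_ref());
    out
}

/// Illustrative stage: strip whitespace and uppercase (String -> bytes).
struct Normalize;
impl Stage for Normalize {
    type In = String;
    type Out = Vec<u8>;
    fn name(&self) -> &str { "normalize" }
    fn run(&self, input: String) -> Vec<u8> {
        input.bytes().filter(|b| !b.is_ascii_whitespace()).map(|b| b.to_ascii_uppercase()).collect()
    }
}

/// Illustrative stage: DNA reverse complement (bytes -> bytes); unknown bases become N.
struct ReverseComplement;
impl Stage for ReverseComplement {
    type In = Vec<u8>;
    type Out = Vec<u8>;
    fn name(&self) -> &str { "reverse_complement" }
    fn run(&self, input: Vec<u8>) -> Vec<u8> {
        input.iter().rev()
            .map(|&b| match b { b'A' => b'T', b'T' => b'A', b'C' => b'G', b'G' => b'C', _ => b'N' })
            .collect()
    }
}

fn main() {
    let mut log = ProvenanceLog::default();
    let seq = run_logged(&Normalize, "aacg\n".to_string(), &mut log);
    let rc = run_logged(&ReverseComplement, seq, &mut log);
    println!("{}", String::from_utf8_lossy(&rc)); // CGTT
    println!("steps: {}, chain valid: {}", log.entries.len(), log.verify());
}
```

Passing a `String` straight to `ReverseComplement` is rejected by the compiler, which is the kind of wiring error a shell chain only reveals at run time. Each log entry depends on the one before it, so altering any recorded step invalidates every later link.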

The combination of blueFish (data pipeline) + helixVision (structure prediction) + wetSpring (microbiology validation) creates a sovereign structural genomics stack that runs on consumer hardware.


See also: wetSpring for microbiology validation, Deployment Model for the BYOB workflow, Ecosystem Inventory for the full repository map.