[bdp]registry
Reproducible bioinformatics data, for every team. Version reference genomes, annotation databases, and datasets like you version code.
Get your first dataset in 30 seconds
curl -sSfL https://bdp.dev/install.sh | sh    # Install BDP
bdp init                                      # Initialize project
bdp source add uniprot:P01308-fasta@1.0       # Add data source
bdp pull                                      # Download and cache

Then integrate in Nextflow/Snakemake, generate citations & data availability statements, or run post-pull scripts to process your data.
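The source string in the quick start, `uniprot:P01308-fasta@1.0`, reads like a registry name, a dataset identifier, and a pinned version. The sketch below splits a string on that assumption only. BDP's actual grammar is not specified on this page, and `SourceSpec` and `parse_source` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """Hypothetical three-part view of a pinned source string."""
    registry: str
    identifier: str
    version: str


def parse_source(spec: str) -> SourceSpec:
    """Split 'registry:identifier@version' into its parts.

    Assumes exactly this shape; BDP's real parser may accept other forms.
    """
    # Everything before the first ':' is treated as the registry name.
    registry, sep, rest = spec.partition(":")
    if not sep or not registry:
        raise ValueError(f"missing registry prefix in {spec!r}")
    # Everything after the last '@' is treated as the pinned version.
    identifier, sep, version = rest.rpartition("@")
    if not sep or not identifier or not version:
        raise ValueError(f"missing pinned version in {spec!r}")
    return SourceSpec(registry, identifier, version)


print(parse_source("uniprot:P01308-fasta@1.0"))
```

Rejecting strings without an explicit version mirrors the idea of pinning: an unpinned source cannot resolve to the same file on every run.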
Why BDP?
Like conda or pip, but for your data files — with reproducibility, provenance, and citations built in.
Complete Reproducibility
Lock files for data, not just code. Pin every reference genome, annotation database, and dataset to an exact version.
Seamless Collaboration
Share exact data environments across teams. Everyone runs the same files — no manual downloads, no version drift.
Smart Resource Management
Shared cache between team members. Download once, use everywhere. Save bandwidth and storage.
Automated Citation
Generate proper citations for every data source. Never miss a reference in your publications.
Integrity Verification
Automatic checksum validation detects tampering and corruption. Audit trails for compliance.
Unified Discovery
Search across hundreds of bioinformatics resources in one place. UniProt, NCBI, ChEBI, Ensembl, and more.
Workflow Integration
Native support for Nextflow, Snakemake, and CWL. Plug into your existing pipelines effortlessly.
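Apart from any engine-specific integration, a pipeline step written in Python can simply shell out to the `bdp pull` command from the quick start. This is a minimal sketch, not BDP's own integration: `bdp_pull_step` is a hypothetical helper, it passes no flags because none are documented on this page, and it assumes `bdp` is already on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def bdp_pull_step(
    project_dir: str = ".",
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run `bdp pull` inside project_dir and fail the step on a non-zero exit.

    `run` is injectable so the step can be exercised without bdp installed.
    """
    command = ["bdp", "pull"]
    # check=True makes subprocess.run raise CalledProcessError on failure,
    # so downstream steps never start with missing or partial data.
    run(command, cwd=str(Path(project_dir)), check=True)
    return command
```

Calling `bdp_pull_step("path/to/project")` before any step that reads reference data keeps the download logic out of the analysis code itself.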
How it works
Three commands. Reproducible data forever.
Search the registry
Find any dataset by name, organism, or identifier. Hundreds of curated sources — UniProt, NCBI, Ensembl, ChEBI, and more — indexed in one place.
Add to your project
Declare the source in your bdp.yml. Commit that file. Everyone on your team — and every future pipeline run — now resolves to the same version.
Pull and verify
Download and cache locally or to a shared team store. Checksums are verified automatically. Re-run anytime — already-cached files are skipped.
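The pull-and-verify step can be pictured roughly as follows. This is an illustrative sketch under stated assumptions, not BDP's implementation: the checksum is assumed to be SHA-256, the cache is assumed to be keyed by that checksum, and a local file copy stands in for the network download. `sha256_of` and `pull` are hypothetical names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large genomes never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pull(source: Path, cache_dir: Path, expected_sha256: str) -> str:
    """Fetch source into the cache unless a verified copy already exists.

    Returns "cached" or "downloaded"; raises ValueError on a checksum mismatch.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / expected_sha256  # cache keyed by checksum (assumption)
    if target.exists() and sha256_of(target) == expected_sha256:
        return "cached"  # already present and intact: skip the download
    partial = target.with_suffix(".part")
    shutil.copyfile(source, partial)  # stand-in for the real download
    if sha256_of(partial) != expected_sha256:
        partial.unlink()  # never leave a corrupt file in the cache
        raise ValueError(f"checksum mismatch for {source}")
    partial.replace(target)  # publish only after verification succeeds
    return "downloaded"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.fasta"
        src.write_bytes(b">example\nACGT\n")
        expected = sha256_of(src)
        cache = Path(tmp) / "cache"
        print(pull(src, cache, expected))  # first run: "downloaded"
        print(pull(src, cache, expected))  # second run: "cached"
```

Writing to a `.part` file and renaming only after the hash matches means an interrupted or tampered download never appears in the cache as a valid entry.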
Built for the age of AI research
BDP ships a full Model Context Protocol server. AI agents can autonomously traverse gene → disease → phenotype → pathway → drug → literature — no hand-holding required.
24 MCP tools
Search, retrieve, and traverse the full biomedical knowledge graph
Full graph traversal
Gene → Disease → Phenotype → Pathway → Drug → Literature
MCP compatible
Works with Claude, other MCP clients, and custom AI research agents
Built for everyone in the lab
Whether you're writing pipelines, publishing papers, or building AI research agents.
Researchers
Pin datasets to exact versions for your analysis. Auto-generate citations and data availability statements for every source you use.
Pipeline Engineers
Native integration with Nextflow, Snakemake, and CWL. Plug bdp pull into any workflow step — no custom download scripts.
AI Agents
24 MCP tools for autonomous data traversal. AI agents can search, retrieve, and reason over bioinformatics data without human intervention.