[bdp]registry

Reproducible bioinformatics data, for every team. Version reference genomes, annotation databases, and datasets like you version code.

Get your first dataset in 30 seconds

$ curl -sSfL https://bdp.dev/install.sh | sh   # Install BDP
$ bdp init                                     # Initialize project
$ bdp source add uniprot:P01308-fasta@1.0      # Add data source
$ bdp pull                                     # Download and cache

Then integrate with Nextflow or Snakemake, generate citations and data availability statements, or run post-pull scripts to process your data.

Why BDP?

Like conda or pip, but for your data files — with reproducibility, provenance, and citations built in.

Complete Reproducibility

Lock files for data, not just code. Pin every reference genome, annotation database, and dataset to an exact version.

Seamless Collaboration

Share exact data environments across teams. Everyone runs the same files — no manual downloads, no version drift.

Smart Resource Management

Shared cache between team members. Download once, use everywhere. Save bandwidth and storage.

Automated Citation

Generate proper citations for every data source. Never miss a reference in your publications.

Integrity Verification

Automatic checksum validation detects tampering and corruption. Audit trails for compliance.

Unified Discovery

Search across hundreds of bioinformatics resources in one place. UniProt, NCBI, ChEBI, Ensembl, and more.

Workflow Integration

Native support for Nextflow, Snakemake, and CWL. Plug into your existing pipelines effortlessly.

Open Source

Fully transparent codebase. Review, audit, and contribute. Built in the open for the community.

How it works

Three steps. Reproducible data forever.

1. Search the registry

Find any dataset by name, organism, or identifier. Hundreds of curated sources — UniProt, NCBI, Ensembl, ChEBI, and more — indexed in one place.

2. Add to your project

Declare the source in your bdp.yml. Commit that file. Everyone on your team — and every future pipeline run — now resolves to the same version.
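
To make the idea of a pinned source concrete, here is a minimal Python sketch that splits a specifier such as `uniprot:P01308-fasta@1.0` (the one used in the quickstart) into its parts and rejects any specifier without an explicit version. The field names `namespace`, `name`, and `version` and the helper `parse_spec` are assumptions made for illustration. This is not BDP's own parser, and the real `bdp.yml` schema is not shown here.

```python
from typing import NamedTuple


class SourceSpec(NamedTuple):
    namespace: str  # assumed meaning: text before ":"
    name: str       # assumed meaning: text between ":" and "@"
    version: str    # assumed meaning: text after "@"


def parse_spec(spec: str) -> SourceSpec:
    """Split 'namespace:name@version' and refuse unpinned specifiers."""
    head, at, version = spec.rpartition("@")
    if not at or not version:
        raise ValueError(f"no pinned version in {spec!r}")
    namespace, colon, name = head.partition(":")
    if not colon or not namespace or not name:
        raise ValueError(f"expected 'namespace:name@version', got {spec!r}")
    return SourceSpec(namespace, name, version)


# The specifier from the quickstart, parsed into its three assumed fields.
spec = parse_spec("uniprot:P01308-fasta@1.0")
print(spec)
```

Rejecting specifiers that lack a version is the design choice that matters: a declaration that always names an exact version cannot drift between teammates or between pipeline runs.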

3. Pull and verify

Download and cache locally or to a shared team store. Checksums are verified automatically. Re-run anytime — already-cached files are skipped.
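
The verify-and-skip behaviour described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BDP's implementation: SHA-256 is assumed as the checksum algorithm, and `sha256_of` and `needs_download` are hypothetical helper names. A file is fetched again only if it is missing from the cache or its digest differs from the expected one.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(cached: Path, expected_sha256: str) -> bool:
    """True if the cached file is absent or fails checksum verification."""
    if not cached.exists():
        return True
    return sha256_of(cached) != expected_sha256


# Demo with a throwaway file standing in for a cached dataset.
with tempfile.TemporaryDirectory() as tmp:
    cached_file = Path(tmp) / "example.fasta"
    payload = b">example\nACGT\n"
    expected = hashlib.sha256(payload).hexdigest()

    missing_before = needs_download(cached_file, expected)  # file absent
    cached_file.write_bytes(payload)
    intact = needs_download(cached_file, expected)          # digest matches
    cached_file.write_bytes(payload + b"X")
    corrupted = needs_download(cached_file, expected)       # digest differs

print(missing_before, intact, corrupted)
```

Reading in chunks keeps memory use flat for large files such as reference genomes.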

MCP Server

Built for the age of AI research

BDP ships a full Model Context Protocol server. AI agents can autonomously traverse gene → disease → phenotype → pathway → drug → literature — no hand-holding required.

24 MCP tools

Search, retrieve, and traverse the full biomedical knowledge graph

Full graph traversal

Gene → Disease → Phenotype → Pathway → Drug → Literature

MCP compatible

Works with Claude, any MCP client, and custom AI research agents
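
As a rough illustration of what traversing such a chain involves, the Python sketch below runs a breadth-first search over a toy graph. Everything in it is hypothetical: the node identifiers are placeholders, the `GRAPH` dictionary and `shortest_path` function are invented for this example, and none of it reflects the actual MCP tools or BDP's data model.

```python
from collections import deque

# Hypothetical toy graph; node names are placeholders, not real records.
GRAPH = {
    "gene:G1": ["disease:D1"],
    "disease:D1": ["phenotype:P1", "pathway:W1"],
    "phenotype:P1": ["pathway:W1"],
    "pathway:W1": ["drug:R1"],
    "drug:R1": ["literature:L1"],
    "literature:L1": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search returning the shortest node path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(GRAPH, "gene:G1", "literature:L1")
print(" -> ".join(path))
```

Breadth-first search is used here only because it returns the shortest chain of hops. An agent working against a real knowledge graph would more likely issue one tool call per hop.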

Built for everyone in the lab

Whether you're writing pipelines, publishing papers, or building AI research agents.

Researchers

Pin datasets to exact versions for your analysis. Auto-generate citations and data availability statements for every source you use.

Pipeline Engineers

Native integration with Nextflow, Snakemake, and CWL. Plug bdp pull into any workflow step — no custom download scripts.

AI Agents

24 MCP tools for autonomous data traversal. AI agents can search, retrieve, and reason over bioinformatics data without human intervention.

Start in 30 seconds

Install BDP, add a data source, and pull your first dataset.