Don't demand reproducibility from researchers—demand it from their tools.
The Reproducibility Crisis
BDP is a dependency manager for biological databases—treating UniProt, NCBI, and other data sources like software packages with version control and lockfiles.
Only 11% of bioinformatics studies can be reproduced [1], with data versioning being a major contributing factor. The core problem isn't researchers—it's the tooling. Labs spend 4-12 hours per project on manual data management: writing download scripts, verifying checksums, coordinating versions with collaborators, and documenting data provenance for publications. Research shows workflow automation can save 30-75% of this time [2]. With BDP, these tasks take ~15 minutes.
[1] Ziemann, M. et al. (2023). The five pillars of computational reproducibility: bioinformatics and beyond. Briefings in Bioinformatics, 24(6).
[2] Spjuth, O. et al. (2015). Experiences with workflows for automating data-intensive bioinformatics. Biology Direct, 10(1).
Here is an example workflow showing BDP in action. The examples use Git for version control, which is recommended but not required; see Best Practices for details.
Example Workflow: Protein Analysis Project
A typical project analyzing insulin variants across species.
Step 1: Find the right data ~15-20 min → 5 sec
Before:
Browse UniProt website, search for "insulin", manually identify accession IDs, note down version and release date
With BDP:
bdp search "insulin homo sapiens"
# uniprot:P01308-fasta@1.0 - http://localhost:3000/sources/uniprot/P01308
Step 2: Download specific proteins ~30-45 min → 30 sec
Before:
Navigate UniProt FTP, find correct directory structure, write wget script, download files, verify manually
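For comparison, a hand-written download script of the kind described above might look like this sketch. It swaps the FTP navigation for UniProt's REST endpoint and uses `curl` rather than `wget`; the function names `uniprot_fasta_url` and `fetch_fasta` and the `data/` directory are hypothetical.

```shell
#!/bin/sh
# Sketch of a manual download script; not part of BDP.
# Function names and the output layout are illustrative assumptions.

# Build the UniProt REST URL for one accession's FASTA record.
uniprot_fasta_url() {
    printf 'https://rest.uniprot.org/uniprotkb/%s.fasta\n' "$1"
}

# Download one accession into a target directory; fail on HTTP errors.
# $1 = accession, $2 = output directory
fetch_fasta() {
    mkdir -p "$2"
    curl --fail --silent --show-error --location \
        --output "$2/$1.fasta" "$(uniprot_fasta_url "$1")"
}
```

Calling `fetch_fasta P01308 data` would write `data/P01308.fasta`. Nothing in this script records which release was downloaded, which is the gap a lockfile is meant to close.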
With BDP:
bdp source add uniprot:P01308-fasta@1.0
bdp pull
Step 3: Verify data integrity ~10-15 min → 2 sec
Before:
Download checksums separately, run shasum, compare manually, repeat if mismatch
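The manual check described above can be sketched roughly as follows. This assumes the provider publishes a SHA-256 digest for the file; the helper names `sha256_of` and `verify_sha256` are hypothetical, and the script uses `sha256sum` or `shasum`, whichever is installed.

```shell
#!/bin/sh
# Sketch of a manual integrity check; not part of BDP.

# Print the SHA-256 digest of a file using whichever tool is available.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

# Compare a file's digest with the expected value.
# $1 = file, $2 = expected hex digest; returns non-zero on mismatch.
verify_sha256() {
    actual="$(sha256_of "$1")"
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (expected $2, got $actual)" >&2
        return 1
    fi
}
```

A typical call would be `verify_sha256 P01308.fasta "$(cat P01308.fasta.sha256)"`, repeated by hand for every file and every collaborator.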
With BDP:
bdp audit
# ✓ All sources verified
Step 4: Share with collaborators ~1-3 hours → 1 min
Before:
Upload files to shared server, send download link via email, explain which version/release, collaborator downloads, confirms they have the right version
With BDP:
# You: Commit bdp.yml and bdp.lock to the Git repository
git add bdp.yml bdp.lock
git commit -m "Add insulin data sources"
git push
# Collaborator: Clone the repository, then retrieve all data with one command
git clone <repo>
cd <repo-directory>
bdp pull
Step 5: Six months later - reproduce the analysis ~2-6 hours → 10 sec
Before:
Search through emails and chat history to reconstruct which database versions were used, check whether files still exist on shared storage, chase broken download links, and try to locate archived versions from months ago
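Avoiding that reconstruction by hand means recording provenance at download time. The sketch below shows the general idea of pinning each file to a source, release label, and digest in a manifest. It is an illustration only: the tab-separated layout and the names `record_source` and `check_manifest` are hypothetical and are not the format of `bdp.lock`.

```shell
#!/bin/sh
# Hypothetical manifest helper illustrating version pinning; not bdp.lock.

# Print the SHA-256 digest of a file using whichever tool is available.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

# Append one tab-separated record: path, source URL, release label, digest.
# $1 = file, $2 = source URL, $3 = release label, $4 = manifest path
record_source() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$(sha256_of "$1")" >> "$4"
}

# Re-check every file listed in the manifest.
# $1 = manifest path; returns non-zero if any digest differs.
check_manifest() {
    rc=0
    while IFS="$(printf '\t')" read -r file url release digest; do
        if [ "$(sha256_of "$file")" != "$digest" ]; then
            echo "changed: $file ($release from $url)" >&2
            rc=1
        fi
    done < "$1"
    return "$rc"
}
```

Committing such a manifest next to the analysis code ties each commit to the exact data it used, so a later check fails loudly if a file has changed.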
With BDP:
git checkout <commit-from-6-months-ago>
bdp pull
# Exact same data, guaranteed
Step 6: Write the paper ~45-90 min → 5 sec
Before:
Manually write Data Availability Statement, look up correct citations for UniProt, format BibTeX entries, include version numbers and dates
With BDP:
bdp audit export --format das > data-availability.md
bdp cite --format bibtex > references.bib
## Data Availability Statement
Protein sequence data were obtained from UniProt release 2024_01 (accessed January 15, 2024) using bdp (Bioinformatics Dependencies Platform - http://localhost:3000). Specifically, we used human insulin precursor (UniProt ID: P01308) obtained via `bdp pull` with package identifier uniprot:P01308-fasta@1.0. All data sources are version-pinned in the project repository with cryptographic checksums to ensure reproducibility (https://github.com/lab/project).
@misc{uniprot_P01308_2024,
author = {{The UniProt Consortium}},
title = {UniProt: P01308 - Insulin (INS)},
year = {2024},
note = {UniProt Release 2024\_01, accessed January 15, 2024},
url = {https://www.uniprot.org/uniprotkb/P01308},
version = {2024\_01}
}
Total time: ~4-12 hours → ~15 minutes
*Note: Time estimates are illustrative based on researcher interviews. We're actively collecting data on workflow inefficiencies. Have data or want to share your experience? Contact us or open a discussion.*
Another Tool? We Know.
Tool fatigue is real. But this takes 30 seconds to try—see if it actually solves your problems.
View installation guide