Lockfiles

bdp.lock is generated by bdp pull and records the exact resolved spec, file format, SHA-256 checksum, and size for every source in your project. Commit it. Anyone who clones your repository and runs bdp pull gets the exact same files.

What It Looks Like

bdp.lock
{
  "lockfile_version": 1,
  "generated": "2026-03-29T14:32:01Z",
  "sources": {
    "uniprot:P01308-fasta@1.0": {
      "resolved": "uniprot:P01308-fasta@1.0",
      "format": "fasta",
      "checksum": "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9",
      "size": 4294,
      "external_version": "1.0"
    },
    "hpo:hp-obo@1.0": {
      "resolved": "hpo:hp-obo@1.0",
      "format": "obo",
      "checksum": "a3f2c1d9e84b7f2c0e61d3a5b9c8f7e2d4a1b6c3e9f0d2a8b7c4e1f3d9a5c2b",
      "size": 18874368,
      "external_version": "1.0"
    }
  },
  "tools": {}
}

The format is JSON. It is human-readable by design, but not intended to be hand-edited — bdp pull overwrites it on every run.
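Because the file is plain JSON, any JSON parser can read it. The following is a minimal Python sketch for listing what a lockfile pins, assuming only the field names shown in the example above. `summarize_lockfile` is a hypothetical helper written for illustration, not part of BDP.

```python
import json


def summarize_lockfile(text: str) -> list[tuple[str, str, int]]:
    """Return (spec, format, size) for every source, sorted by spec."""
    lock = json.loads(text)
    # This sketch only understands schema version 1, the version documented here.
    if lock.get("lockfile_version") != 1:
        raise ValueError("unsupported lockfile_version")
    return sorted(
        (spec, entry["format"], entry["size"])
        for spec, entry in lock.get("sources", {}).items()
    )
```

A typical call would pass the file contents, for example `summarize_lockfile(open("bdp.lock").read())`. The function only reads the file, which fits the advice not to hand-edit it.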

Fields

Root

| Field | Type | Description |
| --- | --- | --- |
| lockfile_version | integer | Schema version. Currently 1. |
| generated | ISO 8601 UTC | Timestamp of the last bdp pull that wrote this file. |
| sources | object | Map of source spec → SourceEntry. |
| tools | object | Map of tool spec → ToolEntry. Empty if no tools declared. |

SourceEntry

| Field | Description |
| --- | --- |
| resolved | The full spec as resolved by the registry (e.g. uniprot:P01308-fasta@1.0). |
| format | File format; determines the file extension in cache (fasta, gtf, obo, vcf, …). |
| checksum | SHA-256 of the downloaded file, as 64 lowercase hex characters. No prefix. |
| size | File size in bytes. |
| external_version | Version string from the upstream source. May differ from the BDP @version; see below. |
| dependency_count | Optional. Number of associated sub-resources. Omitted when not set. |
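The constraints in this table are easy to check mechanically. The sketch below is one possible validator. It assumes that every field except dependency_count is required, which the table implies but does not state outright, and `source_entry_problems` is a hypothetical name.

```python
import re

# 64 lowercase hex characters, no prefix, as described for the checksum field.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")

# Assumption: all fields except dependency_count are required.
REQUIRED_FIELDS = ("resolved", "format", "checksum", "size", "external_version")


def source_entry_problems(entry: dict) -> list[str]:
    """Return human-readable problems found in one SourceEntry."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if "checksum" in entry:
        checksum = entry["checksum"]
        if not (isinstance(checksum, str) and SHA256_HEX.fullmatch(checksum)):
            problems.append("checksum is not 64 lowercase hex characters")
    if "size" in entry:
        size = entry["size"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(size, bool) or not isinstance(size, int) or size < 0:
            problems.append("size is not a non-negative integer")
    return problems
```

An empty list means the entry passed every check in this sketch. It does not mean that BDP itself would accept the entry.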

ToolEntry

| Field | Description |
| --- | --- |
| resolved | Full resolved tool spec. |
| version | Tool version string. |
| url | Download URL. |
| checksum | SHA-256, 64 lowercase hex characters. |
| size | File size in bytes. |

BDP Version vs. External Version

Every source in bdp.yml uses a BDP-assigned version — @1.0, @2.0, and so on. This is the version you pin. The external_version in bdp.lock is whatever the upstream organization uses for that release internally (a date string, a build number, a semantic version). They are often different:

| Spec in bdp.yml | external_version in bdp.lock |
| --- | --- |
| uniprot:P01308-fasta@1.0 | 1.0 |
| ensembl:homo-sapiens-gtf@1.0 | 110.0 |

You interact with the BDP version. The external version is recorded in the lockfile for provenance — it appears in audit exports and Data Availability Statements.
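To see which sources carry an upstream version that differs from the pinned BDP version, the two values can be compared per entry. The sketch below assumes that the BDP version is whatever follows the last "@" in the spec. That rule is inferred from the examples on this page rather than from a documented grammar, and both function names are hypothetical.

```python
def bdp_version(spec: str) -> str:
    """Return the text after the last '@' in a spec such as 'hpo:hp-obo@1.0'."""
    name, sep, version = spec.rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"spec has no @version: {spec!r}")
    return version


def versions_differ(entry: dict) -> bool:
    """True when the upstream version string is not the same as the BDP version."""
    return bdp_version(entry["resolved"]) != entry["external_version"]
```

The comparison is a plain string comparison. It makes no attempt to interpret date strings or build numbers.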

The Lifecycle

bdp.lock is only ever written by bdp pull. No other command modifies it:

| Command | Writes lockfile | Reads lockfile |
| --- | --- | --- |
| bdp init | No | No |
| bdp source add | No | No |
| bdp pull | Yes, always | No (writes fresh) |
| bdp status | No | Yes |
| bdp generate | No | Yes |
| bdp clean | No | No |

bdp source add modifies only bdp.yml. The lockfile is not updated until you run bdp pull.

Fresh Clone Workflow

The standard workflow for a new collaborator or a fresh CI environment:

bash
git clone <repo>
cd my-analysis
bdp pull

BDP reads bdp.yml to know which sources to fetch, resolves them against the registry, downloads each one, verifies its SHA-256 against the value in bdp.lock, and exits with an error if there is a mismatch. The result is byte-for-byte identical to what anyone else with the same lockfile gets.

If bdp.lock does not exist (e.g. it was never committed, or was deleted), bdp pull creates it from scratch.

Checksum Verification

Every download is verified before the file is written to cache:

downloaded bytes → SHA-256 → compare against lockfile checksum → error on mismatch

If verification fails, bdp pull aborts with a checksum mismatch error and does not write the file to cache. This catches corrupted downloads, truncated transfers, and files that changed upstream between pulls.
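The verify-before-write order described above can be sketched in a few lines of Python. This is an illustration of the idea, not BDP's implementation. `ChecksumMismatch` and `store_if_verified` are hypothetical names, and the destination path is supplied by the caller because the cache layout is not described here.

```python
import hashlib
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the lockfile checksum."""


def store_if_verified(data: bytes, expected_sha256: str, dest: Path) -> None:
    """Write data to dest only if its SHA-256 equals expected_sha256."""
    actual = hashlib.sha256(data).hexdigest()  # lowercase hex, no prefix
    if actual != expected_sha256:
        # Nothing has been written yet, so a bad download never reaches dest.
        raise ChecksumMismatch(f"expected {expected_sha256}, got {actual}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(data)
```

Hashing before the write is what ensures that a corrupted or truncated download leaves no file behind.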

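The cache lookup checks only whether a file exists, not whether its checksum still matches, so a file that was corrupted on disk after download goes unnoticed. To re-download a file you suspect is corrupt: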

bash
bdp pull --force

--force skips the cache check and re-downloads all sources, verifying each against bdp.lock.
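Before forcing a full re-download, it can be useful to confirm that a file on disk really has drifted from the lockfile. The sketch below re-hashes files and reports the specs that no longer match. It assumes the caller supplies a mapping from spec to file path, since the cache layout is not documented here, and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large sources are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def corrupt_specs(sources: dict, paths: dict[str, Path]) -> list[str]:
    """Return specs whose file exists but does not match the lockfile checksum."""
    return sorted(
        spec
        for spec, path in paths.items()
        if spec in sources
        and path.is_file()
        and file_sha256(path) != sources[spec]["checksum"]
    )
```

Here `sources` is the "sources" object parsed from bdp.lock. A non-empty result suggests that running bdp pull --force is worthwhile.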

Merge Conflicts

bdp.lock is a plain JSON file. Git does not know it is machine-generated. If two branches each add a different source and are merged, Git may produce a conflict.

The cleanest resolution is to accept one side and re-run bdp pull — it will produce a correct lockfile containing all sources from the merged bdp.yml:

bash
# After resolving bdp.yml conflict
git checkout --ours bdp.lock
bdp pull
git add bdp.lock
git commit

Manual editing of bdp.lock is safe as a last resort, but the file will be overwritten by the next bdp pull anyway — so the simplest fix is always to re-pull.

What to Commit

✓ bdp.yml — declares which sources your project uses
✓ bdp.lock — pins exact versions, checksums, sizes
✗ .bdp/ — cache, audit db, config — gitignored by bdp init

bdp init adds .bdp/ to .gitignore automatically. Never commit the cache.