Post-Pull Hooks#
Downloading a file is rarely the last step. A FASTA file needs to be indexed before BLAST can use it. A compressed archive needs to be decompressed. A GTF annotation needs to be converted. A downstream pipeline needs to be triggered.
Without hooks, you'd write a shell script that wraps bdp pull and then runs the preparation steps manually — and that script lives outside your manifest, outside version control, and breaks the moment someone on your team forgets to run it.
Post-pull hooks put that logic inside bdp.yml. They run automatically after every successful pull, in the same reproducible way, for everyone.
What to Version and What to Ignore#
The guiding principle: version the process, not the output.
Biological data files are large — often gigabytes. You don't commit them to git. You commit the specification of what to download and how to prepare it, and anyone with that specification can reproduce the exact same state from scratch.
bdp init handles this automatically. It writes a single entry to your .gitignore:
# BDP cache and runtime files
.bdp/
That one line covers everything: downloaded source files, the audit database, and all hook-generated artifacts — as long as you write hook outputs into .bdp/. You never need to add *.fai, *.nhr, *.nsq, or any other derived-file patterns manually.
What you do commit:
- bdp.yml: declares which sources to pull and which hooks to run
- bdp.lock: pins exact versions and checksums for every source
- scripts/: the hook scripts themselves
The data is large and changes on an upstream schedule. The specification is small and changes when you decide. Git tracks the specification; bdp pull restores everything else.
Convention: Write Hook Outputs to .bdp/#
Because .bdp/ is already gitignored, it is the natural home for anything generated by hooks. The convention is to use subdirectories that mirror what was built:
| Artifact type | Convention |
|---|---|
| BLAST databases | .bdp/blast/ |
| Sequence indexes (.fai, .bwt, …) | .bdp/index/ |
| Decompressed files | .bdp/derived/ |
| Custom processed outputs | .bdp/derived/ |
This is a convention, not enforced by BDP — hooks run arbitrary shell commands and can write anywhere. But writing outside .bdp/ means you need to maintain extra .gitignore patterns yourself. Write inside .bdp/ and git ignores it for free.
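A Python hook script can follow the convention with a small helper. This is a minimal sketch: `artifact_dir` is a hypothetical name, not part of BDP, and it assumes `BDP_PROJECT_ROOT` is set as described under Environment Variables (falling back to the current directory otherwise).

```python
import os
from pathlib import Path


def artifact_dir(kind: str) -> Path:
    """Return <project root>/.bdp/<kind>, creating it if needed.

    `kind` is a subdirectory name from the convention table,
    for example "blast", "index", or "derived".
    """
    # Assumption: BDP injects BDP_PROJECT_ROOT; fall back to the current directory.
    root = Path(os.environ.get("BDP_PROJECT_ROOT", "."))
    target = root / ".bdp" / kind
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling `artifact_dir("blast")` from a hook gives you a directory that is already covered by the `.bdp/` gitignore entry.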
A BLAST database built by a hook is an artifact of running bdp pull: you rebuild it on demand, you don't store it. bdp.lock pins the exact version and checksum of every source, so each rebuild starts from byte-for-byte identical inputs and, as long as the hook commands themselves are deterministic, produces the same result.
A Quick Example#
You've added a UniProt protein sequence to your project. After pulling it, you want BLAST to index it immediately:
project:
  name: my-analysis
  version: 1.0.0
sources:
  - uniprot:P01308-fasta@1.0
hooks:
  post_pull:
    - mkdir -p .bdp/blast && makeblastdb -in "$BDP_UNIPROT_P01308_FASTA" -dbtype prot -out .bdp/blast/p01308
Run bdp pull once. The file is downloaded, verified, and the BLAST database is built — automatically, every time, for every member of your team. The database lives in .bdp/blast/ — gitignored, reproducible on demand.
Configuration#
Hooks are defined at the project level in bdp.yml under hooks.post_pull. They are a list of shell commands and run once after the entire pull is done — all sources downloaded, checksums verified, bdp.lock written.
hooks:
  post_pull:
    - python scripts/decompress.py
    - python scripts/build_index.py
    - nextflow run pipeline.nf
Commands run sequentially in order. If you have multiple sources and multiple preparation steps, list them all — BDP runs each one after the other.
When Hooks Run#
Post-pull hooks fire after:
- All sources have been downloaded
- All checksums have been verified
- bdp.lock has been written to disk
They do not run in dry-run mode (bdp pull --dry-run). If a source is already cached and up to date, the pull still runs hooks — the preparation step is always executed to ensure your working state is consistent.
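Because hooks run on every pull, an expensive preparation step may want to skip work when its output is already current. The check below is one possible approach, not something BDP provides: `needs_rebuild` is a hypothetical helper that compares file modification times.

```python
from pathlib import Path


def needs_rebuild(source: Path, output: Path) -> bool:
    """Return True when `output` is missing or older than `source`."""
    if not output.exists():
        return True
    # Rebuild only if the source was modified after the output was written.
    return output.stat().st_mtime < source.stat().st_mtime
```

A hook script could call `needs_rebuild` with the path from its `BDP_...` variable and the expected artifact path, and exit early with status 0 when nothing has changed.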
Environment Variables#
Every hook runs with environment variables injected for each locked source, so you never have to hardcode paths:
- BDP_PROJECT_ROOT: absolute path to the project directory (where bdp.yml lives)
- BDP_<ORG>_<NAME>_<FORMAT>: absolute path to each locked source file on disk
The variable name is derived from the source spec: the @version suffix is dropped, the rest is uppercased, every non-alphanumeric character is replaced with an underscore, and BDP_ is prepended:
| Source spec | Environment variable |
|---|---|
| uniprot:P01308-fasta@1.0 | BDP_UNIPROT_P01308_FASTA |
| hpo:hp-obo@1.0 | BDP_HPO_HP_OBO |
| ensembl:homo-sapiens-gtf@1.0 | BDP_ENSEMBL_HOMO_SAPIENS_GTF |
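The mapping in the table can be sketched in a few lines of Python. This is an illustration inferred from the three examples above, not BDP's own code: `env_var_name` is a hypothetical function, and dropping everything after `@` is an assumption based on the table.

```python
import re


def env_var_name(source_spec: str) -> str:
    """Map a source spec such as 'uniprot:P01308-fasta@1.0' to a variable name."""
    # Assumption: the version suffix after '@' does not appear in the name.
    name = source_spec.split("@", 1)[0]
    # Replace each non-alphanumeric character with '_' and uppercase the rest.
    return "BDP_" + re.sub(r"[^A-Za-z0-9]", "_", name).upper()


print(env_var_name("uniprot:P01308-fasta@1.0"))  # BDP_UNIPROT_P01308_FASTA
```

This is handy when a script needs to look up the path for a source whose spec it only knows at runtime, for example `os.environ[env_var_name(spec)]`.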
This means your hook scripts are portable — they don't need to know where BDP caches files, and they work correctly regardless of who is running the project or where.
Execution Environment#
- Shell: sh -c on Unix/Linux/macOS, cmd /C on Windows
- Working directory: project root (where bdp.yml is located)
- Output: streamed directly to the terminal; both stdout and stderr are visible
- Timeout: none — hooks can run as long as they need
Error Handling#
Hook failures are non-fatal. If a hook exits with a non-zero status, bdp pull prints a warning and moves on. The pull itself exits with code 0.
→ Running post-pull hooks...
→ python scripts/build_index.py
✓ python scripts/build_index.py
→ bash scripts/validate.sh
⚠ Hook 'bash scripts/validate.sh' exited with status 1

All sources remain cached. Fix the script and re-run bdp pull: sources already in cache are not re-downloaded, only the hooks run again.
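To make the behavior concrete, here is a rough Python model of a sequential, non-fatal hook runner. It is a sketch of the semantics described on this page, not BDP's implementation: `run_hooks` is a hypothetical name, and the returned tuples simply collect each command, its exit code, and its duration in milliseconds.

```python
import subprocess
import sys
import time


def run_hooks(commands, cwd="."):
    """Run shell commands in order; warn on failure but keep going."""
    results = []
    for command in commands:
        start = time.monotonic()
        # shell=True runs the command via /bin/sh -c on Unix and cmd.exe on Windows.
        completed = subprocess.run(command, shell=True, cwd=cwd)
        duration_ms = int((time.monotonic() - start) * 1000)
        if completed.returncode != 0:
            # Non-fatal: report the failure and continue with the next hook.
            print(
                f"warning: hook {command!r} exited with status {completed.returncode}",
                file=sys.stderr,
            )
        results.append((command, completed.returncode, duration_ms))
    return results
```

Note the consequence of this model: a later hook still runs after an earlier one fails, so a script that depends on a previous step's output should check that the output exists before using it.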
More Examples#
Build a BLAST database#
mkdir -p .bdp/blast && makeblastdb -in "$BDP_UNIPROT_P01308_FASTA" -dbtype prot -out .bdp/blast/p01308

Decompress and index a FASTA#
mkdir -p .bdp/derived && gzip -dc "$BDP_UNIPROT_P01308_FASTA" > .bdp/derived/p01308.fasta && samtools faidx .bdp/derived/p01308.fasta

Run a Python preparation script#
import os
import pathlib

# Paths injected by BDP for this hook run
fasta = os.environ["BDP_UNIPROT_P01308_FASTA"]
project = os.environ["BDP_PROJECT_ROOT"]

# Build a simple sequence index; write to .bdp/derived/ (already gitignored)
derived = pathlib.Path(project) / ".bdp" / "derived"
derived.mkdir(parents=True, exist_ok=True)

# Collect every FASTA header line (lines starting with ">")
with open(fasta) as f:
    headers = [line.strip() for line in f if line.startswith(">")]

with open(derived / "index.txt", "w") as out:
    out.write("\n".join(headers))

Trigger a Nextflow pipeline#
nextflow run pipeline.nf --fasta "$BDP_UNIPROT_P01308_FASTA" --outdir results/

Audit Logging#
Every hook execution is recorded in the local audit log (.bdp/bdp.db):
- Command text
- Exit code
- Duration (milliseconds)
Run bdp audit to inspect the full execution history, including hook runs. This is useful for debugging flaky preparation steps and for compliance workflows that require a record of all data processing steps.