Cache Management#
BDP caches every downloaded file locally so that bdp pull is fast after the first run — already-present files are skipped without a network request. The cache is project-local by default and can be redirected to a shared network path for team use.
Where Files Are Stored#
The default cache directory is .bdp/data/ inside your project root — the same .bdp/ directory that bdp init creates and gitignores. Files are stored with a deterministic path:
```
.bdp/data/sources/{org}/{identifier}/{version}/{identifier}_{version}.{format}
```

For example, `uniprot:P01308-fasta@1.0` is cached at:

```
.bdp/data/sources/uniprot/P01308/1.0/P01308_1.0.fasta
```

This layout means every source has a stable, predictable location. You can reference files directly by path if needed, but using the `BDP_*` environment variables injected by hooks is more portable.
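The path template above can be expressed as a small helper. This is an illustrative sketch, not BDP's internal code; the function name and parameters are assumptions:

```python
from pathlib import Path

def cache_path(root: Path, org: str, identifier: str, version: str, fmt: str) -> Path:
    """Build the deterministic cache location for a source.

    Mirrors the documented template:
    .bdp/data/sources/{org}/{identifier}/{version}/{identifier}_{version}.{format}
    """
    return (root / ".bdp" / "data" / "sources" / org / identifier / version
            / f"{identifier}_{version}.{fmt}")

# uniprot:P01308-fasta@1.0 maps to the path shown above
print(cache_path(Path("."), "uniprot", "P01308", "1.0", "fasta"))
# → .bdp/data/sources/uniprot/P01308/1.0/P01308_1.0.fasta
```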
How Cache Lookup Works#
Before downloading, bdp pull checks whether the file already exists at its expected path. If it does, the download is skipped entirely — no checksum comparison, no network request. The lockfile is still updated with the source's metadata.
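The lookup rule can be sketched in a few lines; `needs_download` is a hypothetical helper name, not part of BDP's API:

```python
import tempfile
from pathlib import Path

def needs_download(path: Path) -> bool:
    # Pure existence check, as described above: no checksum
    # comparison and no network request for files already on disk.
    return not path.exists()

with tempfile.TemporaryDirectory() as d:
    cached = Path(d) / "P01308_1.0.fasta"
    print(needs_download(cached))   # → True (file absent, download needed)
    cached.write_text(">sp|P01308|INS_HUMAN\n")
    print(needs_download(cached))   # → False (file present, skipped)
```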
```
bdp pull
# ✓ uniprot:P01308-fasta@1.0 (cached, skipped)
# ↓ hpo:hp-obo@1.0 (downloading...)
# ✓ hpo:hp-obo@1.0 (verified)
```

To force a re-download of all sources regardless of cache state:
```
bdp pull --force
```

This is useful if you suspect a file has been corrupted on disk. BDP will re-download and verify the checksum against `bdp.lock`.
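The verification step can be sketched as a plain hash comparison. The choice of SHA-256 and the helper name are assumptions for illustration; the lockfile's exact checksum format is not shown here:

```python
import hashlib
from pathlib import Path

def verify_checksum(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against a checksum recorded in a lockfile.

    Streams the file in 1 MiB chunks so large source files are not
    loaded into memory at once.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```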
Cache Commands#
```
# Show cache location, resolved path, and total disk usage
bdp cache show

# Change the cache directory (writes to .bdp/.config)
bdp cache set /path/to/cache

# Reset cache path back to the default (.bdp/data)
bdp cache reset
```

`bdp cache show` output:

```
Cache Configuration:
  Configured path: .bdp/data
  Resolved path:   /home/user/project/.bdp/data
  Project root:    /home/user/project
  Cache size:      284 MB
```

Checking What's Cached#
bdp status shows which sources from bdp.lock are present on disk and which are missing:
```
Locked Sources:
  ✓ uniprot:P01308-fasta@1.0      [cached]   4.2 MB
  ✓ hpo:hp-obo@1.0                [cached]   18.7 MB
  ○ ensembl:homo-sapiens-gtf@1.0  [missing]

Summary:
  2 cached · 1 missing · 22.9 MB on disk
  Cache dir: /home/user/project/.bdp/data

Run 'bdp pull' to download missing sources.
```

Cleaning the Cache#
```
# Remove all cached source files
bdp clean --all --yes

# Clear only the search results cache (separate database)
bdp clean --search-cache --yes
```

Without `--yes`, BDP will prompt for confirmation. Without `--all`, it shows the current cache size and exits.
After cleaning, bdp pull re-downloads everything from scratch on the next run.
Cache Configuration#
Cache settings live in .bdp/.config — a TOML file inside your project's .bdp/ directory. bdp cache set writes to this file; you can also edit it directly.
```toml
[cache]
path = ".bdp/data"
```
The path can be relative (resolved from the project root) or absolute. This is the only current cache configuration option.
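The resolution rule can be sketched as follows; `resolve_cache_path` is an illustrative name, not BDP's API:

```python
from pathlib import Path

def resolve_cache_path(project_root: Path, configured: str) -> Path:
    """Resolve the [cache] path from .bdp/.config.

    Relative paths resolve against the project root; absolute
    paths are used as-is.
    """
    p = Path(configured)
    return p if p.is_absolute() else project_root / p

print(resolve_cache_path(Path("/home/user/project"), ".bdp/data"))
# → /home/user/project/.bdp/data
print(resolve_cache_path(Path("/home/user/project"), "/mnt/shared/bdp-cache"))
# → /mnt/shared/bdp-cache
```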
Shared Team Cache#
If your team works on the same server or has a shared network filesystem, point everyone's cache to the same directory. Each member runs once and the rest reuse the result.
```
# Run this in each team member's project directory
bdp cache set /mnt/shared/bdp-cache
```

This writes the absolute path to `.bdp/.config`. Commit `.bdp/.config` to your repository so the shared path is automatically used by everyone who clones the project.
```toml
[cache]
path = "/mnt/shared/bdp-cache"
```
Multiple projects can point to the same cache directory — files are stored by content-addressed path, so there is no collision between projects using the same source.
HPC and Cluster Usage#
On HPC clusters, set the cache to a shared scratch or project storage path so compute nodes don't each download their own copy:
```bash
#!/bin/bash
#SBATCH --job-name=analysis
#SBATCH --nodes=4

# Set shared cache before running — only one node will download
bdp cache set /scratch/project/shared/bdp-cache
bdp pull

# All nodes now have access to the same cached files
srun python scripts/analysis.py
```
If jobs run concurrently across nodes accessing the same cache directory, BDP writes are atomic at the file level — a file is either fully written or absent, never partially written. There is no distributed lock, so concurrent pulls of the same source may result in duplicate downloads, but not corruption.
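One common way to get this file-level atomicity is to download into a temporary file in the destination directory and then rename it into place. A minimal sketch, assuming BDP does something similar (its actual implementation is not shown here):

```python
import os
import tempfile
from pathlib import Path

def atomic_write(dest: Path, data: bytes) -> None:
    """Write to a temp file beside the destination, then rename into place.

    os.replace is an atomic rename on POSIX filesystems, so a concurrent
    reader sees either the complete file or no file, never a partial one.
    The temp file lives in the same directory to keep the rename on one
    filesystem.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dest.parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, dest)  # atomic rename into the final path
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file on failure
        raise
```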
Search Cache#
bdp search results are cached separately in a SQLite database at your OS cache directory (~/.cache/bdp/bdp.db on Linux/macOS). Entries expire after 5 minutes. This is independent of the data cache and is not project-specific.
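A 5-minute TTL cache over SQLite can be sketched as below. The table name and schema are assumptions for illustration, not the actual layout of `bdp.db`:

```python
import sqlite3
import time

TTL = 300  # seconds: entries older than 5 minutes are treated as expired

conn = sqlite3.connect(":memory:")  # bdp.db in practice; in-memory for the sketch
conn.execute(
    "CREATE TABLE search_cache (query TEXT PRIMARY KEY, results TEXT, ts REAL)"
)

def put(query: str, results: str) -> None:
    # Upsert the result with the current timestamp
    conn.execute("INSERT OR REPLACE INTO search_cache VALUES (?, ?, ?)",
                 (query, results, time.time()))

def get(query: str):
    row = conn.execute("SELECT results, ts FROM search_cache WHERE query = ?",
                       (query,)).fetchone()
    if row is None or time.time() - row[1] > TTL:
        return None  # missing or expired: caller re-runs the real search
    return row[0]

put("insulin", "uniprot:P01308-fasta")
print(get("insulin"))  # → uniprot:P01308-fasta
```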
```
# Clear stale search results
bdp clean --search-cache --yes
```

Lockfile and Cache Relationship#
bdp.lock records the checksum and metadata for every pulled source, but it does not manage the cache directly. The cache is purely a filesystem store — BDP checks for file existence, not lockfile state, when deciding whether to skip a download.
This means:
- Deleting `.bdp/data/` and re-running `bdp pull` always produces the same result — the lockfile guarantees the same versions and checksums are fetched.
- Moving the cache (`bdp cache set`) does not invalidate the lockfile — BDP will re-download to the new location on the next `bdp pull`.
- If you share a lockfile without sharing the cache, collaborators download the exact same files independently.