Snakemake Integration#
Use bdp generate snakemake to emit a YAML config block with resolved file paths, ready to include in your Snakemake workflow.
Generate Config#
bash
bdp generate snakemake
Output (config/bdp_data.yaml):
yaml
# BDP data paths - auto-generated by 'bdp generate snakemake'. Do not edit.
# Include in Snakefile: configfile: "config/bdp_data.yaml"
bdp:
  clinvar_variants_vcf: ".bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf"
  ensembl_homo_sapiens_gtf: ".bdp/data/sources/ensembl/homo_sapiens/110/homo_sapiens_110.gtf"
  uniprot_p01308_fasta: ".bdp/data/sources/uniprot/P01308/2024.03/P01308_2024.03.fasta"
Use in a Snakefile#
Include the generated config and reference paths via config["bdp"]:
python
# Snakefile
configfile: "config/bdp_data.yaml"

rule filter_pathogenic:
    input:
        vcf=config["bdp"]["clinvar_variants_vcf"],
    output:
        vcf="results/pathogenic.vcf",
    shell:
        'bcftools view -i \'INFO/CLNSIG~"Pathogenic"\' {input.vcf} > {output.vcf}'
Example: Variant Annotation Pipeline#
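A three-rule pipeline that filters ClinVar variants, intersects them with gene annotations, and joins the result with disease association scores. It reads config["bdp"]["opentargets_associations_tsv"], so it assumes an Open Targets associations source has been added in addition to the sources listed in the sample config above: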
python
configfile: "config/bdp_data.yaml"

rule all:
    input:
        "results/annotated_variants.tsv"

rule filter_pathogenic:
    input:
        vcf=config["bdp"]["clinvar_variants_vcf"],
    output:
        vcf="results/pathogenic.vcf",
    shell:
        'bcftools view -i \'INFO/CLNSIG~"Pathogenic"\' {input.vcf} > {output.vcf}'

rule annotate_with_genes:
    input:
        vcf="results/pathogenic.vcf",
        gtf=config["bdp"]["ensembl_homo_sapiens_gtf"],
    output:
        tsv="results/variant_genes.tsv",
    shell:
        "bedtools intersect -a {input.vcf} -b {input.gtf} -wa -wb > {output.tsv}"

rule join_with_associations:
    input:
        variants="results/variant_genes.tsv",
        associations=config["bdp"]["opentargets_associations_tsv"],
    output:
        tsv="results/annotated_variants.tsv",
    run:
        import pandas as pd

        # bedtools intersect writes no header row; with -wb the last column
        # holds the GTF attributes, which carry the Ensembl gene_id.
        vg = pd.read_csv(input.variants, sep="\t", header=None)
        vg["gene_id"] = vg.iloc[:, -1].str.extract(r'gene_id "([^"]+)"', expand=False)
        assoc = pd.read_csv(input.associations, sep="\t")
        # Assumes targetId holds Ensembl gene IDs, as in Open Targets data.
        result = vg.merge(
            assoc[assoc["overallAssociationScore"] > 0.5],
            left_on="gene_id",
            right_on="targetId",
            how="inner",
        )
        result.to_csv(output.tsv, sep="\t", index=False)
Workflow#
- Add sources:
  bdp source add clinvar:variants-vcf@2024.01
- Pull data:
  bdp pull
- Generate config:
  bdp generate snakemake
- Reference in Snakefile:
  configfile: "config/bdp_data.yaml"
- Run:
  snakemake --cores 4
- Commit bdp.yml, bdp.lock, and config/bdp_data.yaml
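If a rule references a key that was never generated (for example, because a source was not added before running bdp generate snakemake), Snakemake fails with a bare KeyError. The sketch below shows one way to fail earlier with a hint. The helper name require_bdp_keys is hypothetical and not part of BDP or Snakemake; it only assumes the config has the shape shown in the sample output, a top-level bdp mapping of key to path.

```python
def require_bdp_keys(config, keys):
    """Return {key: path} for `keys`, or raise if any key is missing.

    Hypothetical helper: `config` is the dict Snakemake builds from
    config/bdp_data.yaml, assumed to hold a top-level "bdp" mapping.
    """
    bdp = config.get("bdp", {})
    missing = [key for key in keys if key not in bdp]
    if missing:
        raise KeyError(
            "Missing BDP config keys: " + ", ".join(missing)
            + ". Add the sources, then rerun 'bdp pull' and 'bdp generate snakemake'."
        )
    return {key: bdp[key] for key in keys}


# Example with a config dict shaped like the generated YAML.
example_config = {
    "bdp": {
        "clinvar_variants_vcf": ".bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf",
    }
}
paths = require_bdp_keys(example_config, ["clinvar_variants_vcf"])
```

In a Snakefile, a call such as require_bdp_keys(config, ["clinvar_variants_vcf"]) placed right after the configfile line would run before any rule is evaluated.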
Reproducibility#
Because config/bdp_data.yaml is generated from bdp.lock, the same file paths are used on every run: on your machine, on a collaborator's machine, and in CI.
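Identical paths only help if the files behind them have actually been pulled. As a rough sketch, a check like the following could run in CI or at the top of a Snakefile to list config entries whose files are absent. The function name missing_bdp_files and the demo keys a_vcf and b_gtf are made up for illustration; the only assumption about BDP is the bdp mapping of key to relative path shown in the sample output.

```python
import tempfile
from pathlib import Path


def missing_bdp_files(config, root="."):
    """Return {key: path} for entries under config["bdp"] that are not files.

    Hypothetical helper: paths are resolved relative to `root`, which is
    assumed to be the project directory containing .bdp/.
    """
    root = Path(root)
    return {
        key: path
        for key, path in config.get("bdp", {}).items()
        if not (root / path).is_file()
    }


# Demo in a throwaway directory: one file is present, one is not.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / ".bdp/data/a.vcf"
    present.parent.mkdir(parents=True)
    present.touch()
    demo_config = {"bdp": {"a_vcf": ".bdp/data/a.vcf", "b_gtf": ".bdp/data/b.gtf"}}
    result = missing_bdp_files(demo_config, root=tmp)
```

A non-empty result usually means bdp pull has not been run in that checkout yet.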
Full Example#
See the complete Snakemake pipeline example on Codeberg, which includes a 3-rule variant annotation workflow using bcftools, bedtools, and pandas.