Python Integration
Use bdp generate python to get a Python module with pathlib.Path variables for every source in your bdp.lock, ready to import in any script or notebook.
Generate Paths
bash
bdp generate python
Output (bdp_data.py):
python
"""BDP data paths - auto-generated by 'bdp generate python'. Do not edit."""
from pathlib import Path

_BASE = Path(__file__).parent

# Source: uniprot:P01308-fasta@2024.03
UNIPROT_P01308_FASTA = _BASE / ".bdp" / "data" / "sources" / "uniprot" / "P01308" / "2024.03" / "P01308_2024.03.fasta"

# Source: clinvar:variants-vcf@2024.01
CLINVAR_VARIANTS_VCF = _BASE / ".bdp" / "data" / "sources" / "clinvar" / "variants" / "2024.01" / "variants_2024.01.vcf"

Use in a Script
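To import a path in a script you need its constant name. Judging only from the two entries in the generated module above, the name looks like the source identifier with the version dropped, punctuation turned into underscores, and everything upper-cased. The sketch below encodes that guess. `guess_constant_name` is a hypothetical helper, not part of bdp, and the convention is an assumption inferred from the examples, so treat the generated bdp_data.py as the authority.

```python
import re


def guess_constant_name(source_id: str) -> str:
    """Guess the bdp_data constant for a source id (assumed convention, not bdp's API)."""
    # Drop the "@version" suffix, e.g. "uniprot:P01308-fasta@2024.03" -> "uniprot:P01308-fasta"
    name = source_id.split("@", 1)[0]
    # Collapse every run of non-alphanumeric characters into one underscore, then upper-case
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").upper()


print(guess_constant_name("uniprot:P01308-fasta@2024.03"))  # UNIPROT_P01308_FASTA
print(guess_constant_name("clinvar:variants-vcf@2024.01"))  # CLINVAR_VARIANTS_VCF
```

With the name known, the import itself is a single line: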
python
from bdp_data import UNIPROT_P01308_FASTA, CLINVAR_VARIANTS_VCF
from Bio import SeqIO

records = list(SeqIO.parse(str(UNIPROT_P01308_FASTA), "fasta"))

Example: Gene-Disease Variant Analysis
python
from bdp_data import CLINVAR_VARIANTS_VCF, OPENTARGETS_ASSOCIATIONS_TSV
from cyvcf2 import VCF
import pandas as pd

# Filter pathogenic ClinVar variants
variants = []
for v in VCF(str(CLINVAR_VARIANTS_VCF)):
    clnsig = v.INFO.get("CLNSIG", "")
    gene = v.INFO.get("GENEINFO", "")
    if "Pathogenic" in str(clnsig) and gene:
        variants.append({
            # GENEINFO has the form "SYMBOL:ID"; keep the first gene symbol
            "gene": gene.split(":")[0],
            "chrom": v.CHROM,
            "pos": v.POS,
            "disease": v.INFO.get("CLNDN", "not_provided"),
        })
pathogenic = pd.DataFrame(variants)

# Join with Open Targets association scores
# (the join assumes targetId uses the same gene identifiers as the "gene" column)
associations = pd.read_csv(
    OPENTARGETS_ASSOCIATIONS_TSV,
    sep="\t",
    usecols=["targetId", "diseaseId", "overallAssociationScore"],
)
result = pathogenic.merge(
    associations[associations["overallAssociationScore"] > 0.5],
    left_on="gene",
    right_on="targetId",
    how="inner",
)

Use in a Notebook
python
# Jupyter / Quarto notebook
from bdp_data import UNIPROT_P01308_FASTA, ENSEMBL_HOMO_SAPIENS_GTF
from Bio import SeqIO

proteins = {r.id: len(r.seq) for r in SeqIO.parse(str(UNIPROT_P01308_FASTA), "fasta")}

Workflow
- Add sources: bdp source add uniprot:P01308-fasta@2024.03
- Pull data: bdp pull
- Generate paths: bdp generate python
- Import in your code: from bdp_data import UNIPROT_P01308_FASTA
- Commit bdp.yml, bdp.lock, and bdp_data.py
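Because bdp_data.py is committed while the data itself is fetched by bdp pull, a fresh checkout can hold paths that point at files not yet on disk. A minimal sketch of a sanity check follows. `missing_paths` is a hypothetical helper, not something bdp provides, and it assumes the generated module exposes its sources as upper-case pathlib.Path constants, as in the output shown earlier.

```python
from pathlib import Path
from types import ModuleType


def missing_paths(module: ModuleType) -> dict[str, Path]:
    """Return the public upper-case Path constants in `module` whose files do not exist."""
    return {
        name: value
        for name, value in vars(module).items()
        if name.isupper()
        and not name.startswith("_")  # skip private helpers such as _BASE
        and isinstance(value, Path)
        and not value.exists()
    }


# Usage (assumes bdp_data.py has been generated next to your script):
#
#   import bdp_data
#   missing = missing_paths(bdp_data)
#   if missing:
#       raise SystemExit(f"Run 'bdp pull' first; missing: {sorted(missing)}")
```

Running a check like this at the top of a script or notebook turns a late file-not-found error inside a parser into an early message that names the missing sources.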
Regenerating
Re-run bdp generate python after any bdp pull that changes versions. Add it as a post-pull hook to automate this:
yaml
# bdp.yml
hooks:
  post_pull:
    - bdp generate python

Full Example
See the complete Python variant analysis example on Codeberg, which includes bdp_data.py (generated) and analysis.py (BioPython + cyvcf2 + pandas).