
Nextflow Integration

Use bdp generate nextflow to emit a Nextflow config file with resolved file paths as params.bdp.* variables, ready to include in your DSL2 pipeline.

Generate Config

```bash
bdp generate nextflow
```

Output (conf/bdp_data.config):

```groovy
// BDP data paths - auto-generated by 'bdp generate nextflow'. Do not edit.
// Include in nextflow.config: includeConfig 'conf/bdp_data.config'
params {
    bdp {
        clinvar_variants_vcf = "${projectDir}/.bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf"
        ensembl_homo_sapiens_gtf = "${projectDir}/.bdp/data/sources/ensembl/homo_sapiens/110/homo_sapiens_110.gtf"
        uniprot_p01308_fasta = "${projectDir}/.bdp/data/sources/uniprot/P01308/2024.03/P01308_2024.03.fasta"
    }
}
```
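Because the generated file is committed (see the workflow below) and should never be edited by hand, a CI job can regenerate it and fail when the committed copy no longer matches. This is a minimal bash sketch. It assumes that bdp generate nextflow produces identical output for an unchanged bdp.lock and that the job runs at the root of a Git checkout. The name bdp_config_is_current is hypothetical and is not part of bdp.

```bash
# Hypothetical CI check (not a bdp command): fail if the committed config is stale.
# Assumes `bdp generate nextflow` is deterministic and rewrites the file in place.
bdp_config_is_current() {
  local config=conf/bdp_data.config
  # Refuse to pass if the config was never committed
  git ls-files --error-unmatch -- "$config" >/dev/null || return 1
  bdp generate nextflow || return 1
  # --exit-code: return 1 if the regenerated file differs from the copy in the index
  # (on a fresh checkout, the index matches the last commit)
  git diff --exit-code -- "$config"
}
```

A CI step can then run bdp_config_is_current and fail the build on a non-zero exit. The diff it prints shows which paths changed.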

Include in nextflow.config

```groovy
// nextflow.config
includeConfig 'conf/bdp_data.config'
```

Example: Variant Annotation Pipeline (DSL2)

A two-process pipeline that runs each tool in a pinned BioContainers image for reproducibility:

```groovy
// Recent Nextflow releases default to DSL2; this line only matters for older versions
nextflow.enable.dsl = 2

process FILTER_PATHOGENIC {
    container 'biocontainers/bcftools:1.19--h8b25389_1'

    input:
    path vcf

    output:
    path "pathogenic.vcf", emit: vcf

    // Keep records whose CLNSIG matches "Pathogenic". The regex is case-sensitive, so
    // "Pathogenic/Likely_pathogenic" is kept but "Likely_pathogenic" alone is not
    script:
    """
    bcftools view -i 'INFO/CLNSIG~"Pathogenic"' ${vcf} > pathogenic.vcf
    """
}

process ANNOTATE_GENES {
    container 'biocontainers/bedtools:2.31.1--hf5e1c6e_1'

    input:
    path vcf
    path gtf

    output:
    path "variant_genes.tsv", emit: tsv

    // -wa -wb: write each overlapping variant record followed by the GTF feature it overlaps
    script:
    """
    bedtools intersect -a ${vcf} -b ${gtf} -wa -wb > variant_genes.tsv
    """
}

workflow {
    ch_vcf = Channel.fromPath(params.bdp.clinvar_variants_vcf, checkIfExists: true)
    ch_gtf = Channel.fromPath(params.bdp.ensembl_homo_sapiens_gtf, checkIfExists: true)

    FILTER_PATHOGENIC(ch_vcf)
    ANNOTATE_GENES(FILTER_PATHOGENIC.out.vcf, ch_gtf)
}
```

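Run with Docker enabled so the container directives take effect. Without a container engine, Nextflow ignores them and runs the tools from your local PATH:

```bash
nextflow run main.nf -with-docker
```

The image names omit the registry, following the nf-core convention. They resolve to the BioContainers images on quay.io when docker.registry = 'quay.io' is set in nextflow.config.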

Workflow

  1. Add sources: bdp source add clinvar:variants-vcf@2024.01
  2. Pull data: bdp pull
  3. Generate config: bdp generate nextflow
  4. Include in pipeline: add includeConfig 'conf/bdp_data.config' to nextflow.config
  5. Reference files in the pipeline as params.bdp.clinvar_variants_vcf, params.bdp.ensembl_homo_sapiens_gtf, and so on
  6. Commit bdp.yml, bdp.lock, and conf/bdp_data.config
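For first-time setup, steps 1 to 4 can be wrapped in a small bash function. This is a sketch, not a bdp command. The name bdp_setup is hypothetical, and the function assumes bdp is on PATH and runs from the project root. Step 5 happens in the pipeline code. For step 6, the function only prints the files to commit rather than committing them.

```bash
# Hypothetical helper (not a bdp command): first-time setup for one or more sources.
# Assumes bdp is on PATH and the function runs from the pipeline's project root.
bdp_setup() {
  local spec
  for spec in "$@"; do
    bdp source add "$spec" || return 1   # 1. e.g. clinvar:variants-vcf@2024.01
  done
  bdp pull || return 1                   # 2. download the pinned data
  bdp generate nextflow || return 1      # 3. write conf/bdp_data.config
  # 4. the pipeline only sees params.bdp.* if nextflow.config includes the generated file
  if ! grep -qsF "includeConfig 'conf/bdp_data.config'" nextflow.config; then
    echo "add includeConfig 'conf/bdp_data.config' to nextflow.config" >&2
    return 1
  fi
  # 6. remind which files belong in version control
  echo "commit: bdp.yml bdp.lock conf/bdp_data.config"
}
```

For example: bdp_setup clinvar:variants-vcf@2024.01. Each step returns early on failure, so a failed bdp pull never goes on to regenerate the config.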

DSL2 Patterns

Access BDP data through channels:

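```groovy
workflow {
    // Create channels from BDP-managed files
    ch_vcf = Channel.fromPath(params.bdp.clinvar_variants_vcf, checkIfExists: true)
    // Single reference files work well as value channels: every task can reuse them,
    // whereas a queue channel holding one file is consumed by the first task
    ch_gtf = Channel.value(file(params.bdp.ensembl_homo_sapiens_gtf, checkIfExists: true))
    ch_fasta = Channel.value(file(params.bdp.uniprot_p01308_fasta, checkIfExists: true))

    // Wire into processes (ch_fasta is available to any process that needs the protein sequence)
    FILTER_PATHOGENIC(ch_vcf)
    ANNOTATE_GENES(FILTER_PATHOGENIC.out.vcf, ch_gtf)
}
```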

Passing checkIfExists: true makes the run fail fast with a missing-file error if bdp pull hasn't been run.
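checkIfExists: true only reports the problem after Nextflow has started. A preflight check can catch missing data earlier by reading the generated config directly. This is a bash sketch under stated assumptions. The name bdp_check_paths is hypothetical. The sketch assumes the config keeps one name = "path" assignment per line, as in the output shown earlier. It also takes ${projectDir} to be the current directory, which you can override with the second argument; this matches Nextflow's projectDir when main.nf sits in the directory you run from.

```bash
# Hypothetical preflight check (not a bdp command): confirm every path in the
# generated config exists before launching Nextflow.
# Assumes one name = "path" assignment per line, as in conf/bdp_data.config.
bdp_check_paths() {
  local config="${1:-conf/bdp_data.config}"
  local project_dir="${2:-$PWD}"
  # Literal placeholder written into the config (single quotes: no expansion here)
  local prefix='${projectDir}'
  local paths raw path missing=0

  if [ ! -f "$config" ]; then
    echo "missing config: $config (run bdp generate nextflow)" >&2
    return 1
  fi

  # Extract the quoted value from each key = "value" line; comments and braces are skipped
  paths=$(sed -nE 's/^[[:space:]]*[A-Za-z0-9_]+[[:space:]]*=[[:space:]]*"([^"]*)".*/\1/p' "$config")

  while IFS= read -r raw; do
    [ -n "$raw" ] || continue
    # Replace a leading ${projectDir} with the project directory; keep other paths as-is
    case $raw in
      "$prefix"*) path=$project_dir${raw#"$prefix"} ;;
      *) path=$raw ;;
    esac
    if [ ! -f "$path" ]; then
      echo "missing: $path" >&2
      missing=$((missing + 1))
    fi
  done <<EOF
$paths
EOF

  [ "$missing" -eq 0 ]
}
```

For example: bdp_check_paths && nextflow run main.nf -with-docker. Every missing file is listed before the function returns non-zero, so one run reports all gaps at once.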

Full Example

See the complete Nextflow pipeline example on Codeberg, which includes a DSL2 variant annotation pipeline with BioContainers.