CWL Integration#
Use bdp generate cwl to produce a CWL v1.2 inputs file with typed class: File entries and EDAM ontology format annotations, ready to use with any CWL runner.
Generate Inputs#
bash
bdp generate cwl

Output (cwl/bdp-inputs.yml):
yaml
# BDP data inputs - auto-generated by 'bdp generate cwl'. Do not edit.
# Usage: cwltool my-workflow.cwl cwl/bdp-inputs.yml
# CWL v1.2 - https://www.commonwl.org/v1.2/

clinvar_variants_vcf:
  class: File
  format: edam:format_3016
  path: ../.bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf

ensembl_homo_sapiens_gtf:
  class: File
  format: edam:format_2306
  path: ../.bdp/data/sources/ensembl/homo_sapiens/110/homo_sapiens_110.gtf

uniprot_p01308_fasta:
  class: File
  format: edam:format_1929
  path: ../.bdp/data/sources/uniprot/P01308/2024.03/P01308_2024.03.fasta

Every entry includes:
- class: File, the standard CWL file type
- format:, the EDAM ontology URI for known formats (omitted for unknown formats)
- path:, the relative path to the BDP cache
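Because each entry carries a relative path into the BDP cache, it can be useful to confirm that those files exist before starting a run. The following is a minimal sketch, not part of BDP: the function names list_input_paths and check_input_paths are hypothetical, and the parsing is line-based rather than a real YAML parser, so it assumes unquoted, single-line path: values like the ones shown in the generated file. Relative paths are resolved against the directory that holds the inputs file, which is how CWL runners resolve them.

```shell
# list_input_paths FILE: print the value of every "path:" line.
# Line-based sketch: assumes unquoted, single-line values.
list_input_paths() {
    sed -n 's/^[[:space:]]*path:[[:space:]]*//p' "$1"
}

# check_input_paths FILE: report every path that does not point at an
# existing file. Returns 0 if all paths exist, 1 otherwise.
check_input_paths() {
    inputs_file=$1
    base_dir=$(dirname "$inputs_file")
    status=0
    while IFS= read -r p; do
        # Skip the empty line produced when the file has no paths.
        [ -n "$p" ] || continue
        case $p in
            /*) resolved=$p ;;             # absolute path: use as-is
            *)  resolved=$base_dir/$p ;;   # relative to the inputs file
        esac
        if [ ! -f "$resolved" ]; then
            echo "missing: $p" >&2
            status=1
        fi
    done <<EOF
$(list_input_paths "$inputs_file")
EOF
    return "$status"
}
```

After sourcing these functions, a call such as check_input_paths cwl/bdp-inputs.yml prints one "missing:" line per unresolved path and returns a non-zero status if any were found.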
EDAM Format Mappings#
BDP automatically maps file formats to EDAM ontology URIs:
| BDP Format | EDAM URI | Description |
|---|---|---|
| fasta | edam:format_1929 | FASTA sequence format |
| vcf | edam:format_3016 | VCF variant call format |
| gtf | edam:format_2306 | GTF gene transfer format |
| gff | edam:format_2305 | GFF general feature format |
| obo | edam:format_2549 | OBO ontology format |
| tsv | edam:format_3475 | Tab-separated values |
| csv | edam:format_3752 | Comma-separated values |
| json | edam:format_3464 | JSON format |
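The same mapping can be written as a small lookup, for example when hand-writing format: fields in your own tool definitions. This is only an illustration of the table above; the function name edam_uri_for is hypothetical and is not a BDP command.

```shell
# edam_uri_for FORMAT: print the EDAM URI for a BDP format name, using the
# mappings listed in the table. For a format that is not in the table it
# prints nothing and returns 1, mirroring how the format field is omitted
# for unknown formats.
edam_uri_for() {
    case $1 in
        fasta) echo "edam:format_1929" ;;
        vcf)   echo "edam:format_3016" ;;
        gtf)   echo "edam:format_2306" ;;
        gff)   echo "edam:format_2305" ;;
        obo)   echo "edam:format_2549" ;;
        tsv)   echo "edam:format_3475" ;;
        csv)   echo "edam:format_3752" ;;
        json)  echo "edam:format_3464" ;;
        *)     return 1 ;;
    esac
}
```

For example, edam_uri_for gtf prints edam:format_2306.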
Example: Variant Analysis Workflow#
Define CWL tools as separate files (best practice), then compose in a workflow:
Filter tool (filter-pathogenic.cwl):
yaml
cwlVersion: v1.2
class: CommandLineTool
label: Filter pathogenic variants from ClinVar VCF

requirements:
  DockerRequirement:
    dockerPull: biocontainers/bcftools:1.19--h8b25389_1

baseCommand: [bcftools, view]
arguments:
  - prefix: -i
    valueFrom: 'INFO/CLNSIG~"Pathogenic"'

inputs:
  vcf:
    type: File
    format: edam:format_3016
    inputBinding:
      position: 1

stdout: pathogenic.vcf

outputs:
  filtered_vcf:
    type: stdout
    format: edam:format_3016

$namespaces:
  edam: http://edamontology.org/

Workflow (variant-analysis.cwl):
yaml
cwlVersion: v1.2
class: Workflow
label: Gene-disease variant analysis

inputs:
  clinvar_variants_vcf:
    type: File
    format: edam:format_3016
  ensembl_homo_sapiens_gtf:
    type: File
    format: edam:format_2306

steps:
  filter:
    run: filter-pathogenic.cwl
    in:
      vcf: clinvar_variants_vcf
    out: [filtered_vcf]
  annotate:
    run: annotate-genes.cwl
    in:
      vcf: filter/filtered_vcf
      gtf: ensembl_homo_sapiens_gtf
    out: [variant_genes]

outputs:
  annotated_variants:
    type: File
    outputSource: annotate/variant_genes

$namespaces:
  edam: http://edamontology.org/

Run with BDP inputs:
bash
cwltool variant-analysis.cwl cwl/bdp-inputs.yml

The input names in bdp-inputs.yml match the workflow input names directly, so no mapping is needed.
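Since the run depends on the names lining up, a quick pre-flight check can list any workflow input that has no matching entry in the inputs file. The sketch below is an assumption-laden illustration, not a BDP or cwltool feature: the function names are hypothetical, and the parsing is line-based, so it assumes block-style YAML with two-space indentation as in the files shown here. It also does not know about optional inputs or defaults, so it may flag inputs that a runner would accept as absent.

```shell
# top_level_keys FILE: print keys that start in column 0 ("name:").
top_level_keys() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):.*$/\1/p' "$1"
}

# workflow_input_names FILE: print the names declared directly under the
# top-level "inputs:" key (two-space indentation assumed).
workflow_input_names() {
    awk '
        /^inputs:/ { inside = 1; next }
        /^[^ #]/   { inside = 0 }
        inside && /^  [A-Za-z_][A-Za-z0-9_]*:/ {
            name = $1
            sub(/:.*$/, "", name)
            print name
        }
    ' "$1"
}

# missing_workflow_inputs WORKFLOW INPUTS: print every workflow input name
# that has no top-level key of the same name in the inputs file.
missing_workflow_inputs() {
    keys=$(top_level_keys "$2")
    workflow_input_names "$1" | while IFS= read -r name; do
        printf '%s\n' "$keys" | grep -qxF "$name" || printf '%s\n' "$name"
    done
}
```

With the files from this example, missing_workflow_inputs variant-analysis.cwl cwl/bdp-inputs.yml should print nothing, because both workflow inputs have entries of the same name.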
Workflow#
- Add sources:
  bdp source add clinvar:variants-vcf@2024.01
- Pull data:
  bdp pull
- Generate inputs:
  bdp generate cwl
- Write your CWL tool definitions and workflow
- Run:
  cwltool my-workflow.cwl cwl/bdp-inputs.yml
- Commit bdp.yml, bdp.lock, and cwl/bdp-inputs.yml
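The pull, generate, and run steps from the list above can be chained in a small wrapper so that a failure in one step stops the rest. This is a sketch only: the function names run_step and bdp_cwl_pipeline and the DRY_RUN variable are conventions invented for this example, not bdp or cwltool options. It uses only the commands already shown, and it leaves out the one-time bdp source add step.

```shell
# run_step CMD...: print the command, then execute it unless DRY_RUN=1.
# DRY_RUN is a convention of this sketch, not a bdp or cwltool option.
run_step() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# bdp_cwl_pipeline [WORKFLOW]: pull data, regenerate the CWL inputs file,
# then run the workflow. Stops at the first step that fails.
bdp_cwl_pipeline() {
    workflow=${1:-my-workflow.cwl}
    run_step bdp pull &&
    run_step bdp generate cwl &&
    run_step cwltool "$workflow" cwl/bdp-inputs.yml
}
```

Calling DRY_RUN=1 bdp_cwl_pipeline variant-analysis.cwl only prints the three commands, which is a cheap way to review them before a real run.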
CWL Runners#
BDP-generated inputs work with any CWL v1.2-compatible runner:
- cwltool — reference implementation
- Toil — scalable HPC/cloud runner
- Arvados — cloud-scale platform with native CWL support
- CWL-Airflow — Apache Airflow integration
Full Example#
See the complete CWL workflow example on Codeberg. It includes bdp-inputs.yml (generated), variant-analysis.cwl (workflow), and separate tool definitions.