CWL Integration

Use bdp generate cwl to produce a CWL v1.2 inputs file with typed class: File entries and EDAM ontology format annotations, ready to use with any CWL runner.

Generate Inputs

bash
bdp generate cwl

Output (cwl/bdp-inputs.yml):

yaml
# BDP data inputs - auto-generated by 'bdp generate cwl'. Do not edit.
# Usage: cwltool my-workflow.cwl cwl/bdp-inputs.yml
# CWL v1.2 - https://www.commonwl.org/v1.2/
clinvar_variants_vcf:
  class: File
  format: edam:format_3016
  path: ../.bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf
ensembl_homo_sapiens_gtf:
  class: File
  format: edam:format_2306
  path: ../.bdp/data/sources/ensembl/homo_sapiens/110/homo_sapiens_110.gtf
uniprot_p01308_fasta:
  class: File
  format: edam:format_1929
  path: ../.bdp/data/sources/uniprot/P01308/2024.03/P01308_2024.03.fasta

Every entry includes:

  • class: File (the standard CWL file type)
  • format: EDAM ontology URI for known formats (omitted for unknown formats)
  • path: location of the file in the BDP cache, relative to the inputs file
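
The three fields listed above can be checked mechanically before the file is handed to a runner. The following is a minimal sketch, not part of BDP: check_entries is a hypothetical helper, and it assumes the inputs file has already been parsed into a dictionary (for example with PyYAML's yaml.safe_load) and that paths are resolved relative to the directory holding the inputs file.

```python
from pathlib import Path


def check_entries(inputs: dict, base_dir: str = "cwl") -> list:
    """Return a list of problems found in a parsed bdp-inputs.yml mapping.

    Hypothetical helper, not part of BDP. `inputs` is the parsed file;
    `base_dir` is the directory that contains the inputs file.
    """
    problems = []
    for name, entry in inputs.items():
        if entry.get("class") != "File":
            problems.append(f"{name}: class must be 'File'")
        if "path" not in entry:
            problems.append(f"{name}: missing path")
        elif not (Path(base_dir) / entry["path"]).is_file():
            problems.append(f"{name}: file not found (run 'bdp pull'?)")
        # 'format' is optional: it is omitted for unknown formats.
    return problems


# One entry copied from the generated file, plus a deliberately broken one.
example = {
    "clinvar_variants_vcf": {
        "class": "File",
        "format": "edam:format_3016",
        "path": "../.bdp/data/sources/clinvar/variants/2024.01/variants_2024.01.vcf",
    },
    "broken_entry": {"class": "Directory"},
}
for problem in check_entries(example):
    print(problem)
```

An empty result means every entry has the expected shape and points at a file that exists locally.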

EDAM Format Mappings

BDP automatically maps file formats to EDAM ontology URIs:

BDP Format   EDAM URI           Description
fasta        edam:format_1929   FASTA sequence format
vcf          edam:format_3016   VCF variant call format
gtf          edam:format_2306   GTF gene transfer format
gff          edam:format_2305   GFF general feature format
obo          edam:format_2549   OBO ontology format
tsv          edam:format_3475   Tab-separated values
csv          edam:format_3752   Comma-separated values
json         edam:format_3464   JSON format
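
To make the mapping concrete, here is a small sketch that reproduces the table as a lookup and expands a prefixed identifier to its full IRI. It is illustrative only: edam_format and expand are hypothetical names, and keying the lookup on the file extension is an assumption, since BDP's own detection logic is not described here. The namespace is the one declared under $namespaces in the CWL files in this section.

```python
from pathlib import Path
from typing import Optional

# Namespace bound to the 'edam' prefix under $namespaces in the CWL documents.
EDAM_NAMESPACE = "http://edamontology.org/"

# Copy of the mapping table (BDP format -> EDAM identifier).
EDAM_FORMATS = {
    "fasta": "edam:format_1929",
    "vcf": "edam:format_3016",
    "gtf": "edam:format_2306",
    "gff": "edam:format_2305",
    "obo": "edam:format_2549",
    "tsv": "edam:format_3475",
    "csv": "edam:format_3752",
    "json": "edam:format_3464",
}


def edam_format(filename: str) -> Optional[str]:
    """Look up the EDAM identifier by file extension (assumption, see lead-in)."""
    extension = Path(filename).suffix.lstrip(".").lower()
    return EDAM_FORMATS.get(extension)


def expand(curie: str) -> str:
    """Expand an 'edam:'-prefixed identifier to its full IRI."""
    prefix, _, local = curie.partition(":")
    if prefix != "edam":
        raise ValueError(f"unknown prefix: {prefix}")
    return EDAM_NAMESPACE + local


print(edam_format("variants_2024.01.vcf"))  # edam:format_3016
print(expand("edam:format_3016"))           # http://edamontology.org/format_3016
print(edam_format("notes.txt"))             # None (unknown format, so no annotation)
```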

Example: Variant Analysis Workflow

Define CWL tools as separate files (best practice), then compose them in a workflow:

Filter tool (filter-pathogenic.cwl):

yaml
cwlVersion: v1.2
class: CommandLineTool
label: Filter pathogenic variants from ClinVar VCF
requirements:
  DockerRequirement:
    dockerPull: biocontainers/bcftools:1.19--h8b25389_1
baseCommand: [bcftools, view]
arguments:
  - prefix: -i
    valueFrom: 'INFO/CLNSIG~"Pathogenic"'
inputs:
  vcf:
    type: File
    format: edam:format_3016
    inputBinding:
      position: 1
stdout: pathogenic.vcf
outputs:
  filtered_vcf:
    type: stdout
    format: edam:format_3016
$namespaces:
  edam: http://edamontology.org/
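
To illustrate what the filter expression in this tool selects, the sketch below approximates it in plain Python. It is a rough stand-in for bcftools, not a replacement: filter_pathogenic is a hypothetical function, the sample records are made up for illustration, and only the case-sensitive regex match on the CLNSIG key of the INFO column is modelled.

```python
import re


def filter_pathogenic(vcf_lines):
    """Yield header lines and records whose INFO/CLNSIG matches 'Pathogenic'.

    Rough approximation of: bcftools view -i 'INFO/CLNSIG~"Pathogenic"'
    """
    pattern = re.compile("Pathogenic")  # case-sensitive, like the ~ operator
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the VCF header
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        # Parse key=value pairs; flag-style entries without '=' are skipped.
        fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if pattern.search(fields.get("CLNSIG", "")):
            yield line


# Made-up records, for illustration only.
records = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\t.\tCLNSIG=Pathogenic\n",
    "1\t200\t.\tC\tT\t.\t.\tCLNSIG=Benign\n",
]
kept = list(filter_pathogenic(records))
print(len(kept))  # 3: two header lines plus the pathogenic record
```

Because the match is case-sensitive, a value such as Likely_pathogenic on its own does not match this pattern.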

Workflow (variant-analysis.cwl):

yaml
cwlVersion: v1.2
class: Workflow
label: Gene-disease variant analysis
inputs:
  clinvar_variants_vcf:
    type: File
    format: edam:format_3016
  ensembl_homo_sapiens_gtf:
    type: File
    format: edam:format_2306
steps:
  filter:
    run: filter-pathogenic.cwl
    in:
      vcf: clinvar_variants_vcf
    out: [filtered_vcf]
  annotate:
    run: annotate-genes.cwl
    in:
      vcf: filter/filtered_vcf
      gtf: ensembl_homo_sapiens_gtf
    out: [variant_genes]
outputs:
  annotated_variants:
    type: File
    outputSource: annotate/variant_genes
$namespaces:
  edam: http://edamontology.org/

Run with BDP inputs:

bash
cwltool variant-analysis.cwl cwl/bdp-inputs.yml

The input names in bdp-inputs.yml match the workflow input names directly — no mapping needed.
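
The name matching can be checked with a few lines of code. This is a minimal sketch using the names from the example above, hard-coded as lists; unmatched_inputs is a hypothetical helper, and in practice the two lists would come from parsing the workflow and the inputs file.

```python
def unmatched_inputs(workflow_inputs, job_inputs):
    """Return workflow input names that have no entry in the inputs file."""
    return sorted(set(workflow_inputs) - set(job_inputs))


# Input names declared in variant-analysis.cwl.
workflow_inputs = ["clinvar_variants_vcf", "ensembl_homo_sapiens_gtf"]
# Top-level keys of cwl/bdp-inputs.yml.
job_inputs = [
    "clinvar_variants_vcf",
    "ensembl_homo_sapiens_gtf",
    "uniprot_p01308_fasta",
]

print(unmatched_inputs(workflow_inputs, job_inputs))  # []
```

The sketch only confirms that every workflow input is covered by the inputs file; it does not look at types or formats.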

Workflow

  1. Add sources: bdp source add clinvar:variants-vcf@2024.01
  2. Pull data: bdp pull
  3. Generate inputs: bdp generate cwl
  4. Write your CWL tool definitions and workflow
  5. Run: cwltool my-workflow.cwl cwl/bdp-inputs.yml
  6. Commit bdp.yml, bdp.lock, and cwl/bdp-inputs.yml
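
Steps 1 to 3 and 5 above can be scripted. The following is a sketch under stated assumptions: it only chains the commands shown in this section, pipeline_commands and run_pipeline are hypothetical names, and it defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def pipeline_commands(source, workflow, inputs="cwl/bdp-inputs.yml"):
    """Build the command lines for the add, pull, generate, and run steps."""
    return [
        ["bdp", "source", "add", source],
        ["bdp", "pull"],
        ["bdp", "generate", "cwl"],
        ["cwltool", workflow, inputs],
    ]


def run_pipeline(source, workflow, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for command in pipeline_commands(source, workflow):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # stop at the first failure


# Dry run: prints the four commands without calling bdp or cwltool.
run_pipeline("clinvar:variants-vcf@2024.01", "my-workflow.cwl")
```

Writing the CWL files (step 4) and committing the results (step 6) are left as manual steps.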

CWL Runners

BDP-generated inputs work with any CWL v1.2-compatible runner:

  • cwltool — reference implementation
  • Toil — scalable HPC/cloud runner
  • Arvados — cloud-scale platform with native CWL support
  • CWL-Airflow — Apache Airflow integration

Full Example

See the complete CWL workflow example on Codeberg. It includes bdp-inputs.yml (generated), variant-analysis.cwl (workflow), and separate tool definitions.