nf-cmgg/germline pipeline parameters

A Nextflow pipeline for calling and annotating small germline variants from short DNA reads in WES and WGS data.

Input/output options

Define where the pipeline should find input data and save output data.

Parameter Description Type Default Required Hidden
input Path to comma-separated file containing information about the samples in the experiment.
Help: Before running the pipeline, you need to create a design file with information about the samples in your experiment. Use this parameter to specify its location. It must be a comma-separated file containing the samples and a header row. See the usage docs.
string True
outdir The output directory where the results will be saved. You must use absolute paths to storage on cloud infrastructure. string True
watchdir A folder to watch for the creation of files that are prefixed with watch: in the samplesheet. string
email Email address for completion summary.
Help: Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
string
ped Path to a pedigree file for all samples in the run. All relational data will be fetched from this file. string
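The help text for input only fixes the overall shape of the samplesheet (comma-separated, with a header row); the real columns are described in the usage docs. The Python sketch below is not the pipeline's own parser: it checks that shape and lists the values that use the watch: prefix mentioned for watchdir. The column names in the usage example (sample, cram, roi) are hypothetical placeholders, and the sketch assumes that the text after watch: is a file name relative to --watchdir.

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

WATCH_PREFIX = "watch:"


def read_samplesheet(text: str) -> List[Dict[str, str]]:
    """Parse a comma-separated samplesheet whose first row is the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header:
        raise ValueError("samplesheet is empty or has no header row")
    rows = []
    for row_number, fields in enumerate(reader, start=2):
        if not any(fields):
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(
                f"row {row_number}: expected {len(header)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(header, fields)))
    return rows


def watched_files(rows: List[Dict[str, str]], watchdir: str) -> List[Tuple[str, str]]:
    """Return (column, path) pairs for every value that starts with 'watch:'.

    Assumption: the text after 'watch:' is a file name relative to --watchdir.
    """
    found = []
    for row in rows:
        for column, value in row.items():
            if value.startswith(WATCH_PREFIX):
                name = value[len(WATCH_PREFIX):]
                found.append((column, str(PurePosixPath(watchdir) / name)))
    return found


# Hypothetical columns; see the usage docs for the real samplesheet layout.
sheet = "sample,cram,roi\nNA12878,watch:NA12878.cram,\nNA12879,/data/NA12879.cram,/data/roi.bed\n"
rows = read_samplesheet(sheet)
print(watched_files(rows, "/watch"))  # [('cram', '/watch/NA12878.cram')]
```

Rows with a different number of fields than the header are rejected early, which is usually easier to debug than a failure deep inside a run.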

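The pedigree file given with --ped is assumed here to use the standard six-column PED layout (family ID, individual ID, paternal ID, maternal ID, sex, phenotype, with 0 marking an unknown parent). A minimal sketch of grouping its individuals per family, the kind of relational data the pipeline reads from this file; the class and function names are illustrative only and do not come from the pipeline:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PedRecord:
    family_id: str
    individual_id: str
    paternal_id: Optional[str]  # None when the father is unknown ("0")
    maternal_id: Optional[str]  # None when the mother is unknown ("0")
    sex: str  # 1 = male, 2 = female, anything else = unknown
    phenotype: str


def parse_ped(text: str) -> Dict[str, List[PedRecord]]:
    """Group the individuals of a PED file by family ID."""
    families: Dict[str, List[PedRecord]] = defaultdict(list)
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()  # accepts tab- or space-delimited files
        if len(fields) < 6:
            raise ValueError(f"line {line_number}: expected 6 columns, got {len(fields)}")
        fam, ind, pat, mat, sex, pheno = fields[:6]
        families[fam].append(PedRecord(
            fam, ind,
            None if pat == "0" else pat,
            None if mat == "0" else mat,
            sex, pheno,
        ))
    return dict(families)


ped = """#family individual father mother sex phenotype
FAM1 child father mother 1 2
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 1
"""
for fam, members in parse_ped(ped).items():
    print(fam, [m.individual_id for m in members])  # FAM1 ['child', 'father', 'mother']
```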
Reference genome options

Reference genome related files and options required for the workflow.

Parameter Description Type Default Required Hidden
genome Reference genome build. Used to fetch the right reference files.
Help: Requires a Genome Reference Consortium reference ID (e.g. GRCh38).
string GRCh38
fasta Path to FASTA genome file.
Help: This parameter is mandatory if --genome is not specified. The path to the reference genome FASTA.
string True
fai Path to FASTA genome index file. string
dict Path to the sequence dictionary generated from the FASTA reference. This is only used when haplotypecaller is one of the specified callers. string
strtablefile Path to the STR table file generated from the FASTA reference. This is only used when --dragstr has been given. string
sdf Path to the SDF folder generated from the reference FASTA file. This is only required when using --validate. string
elfasta Path to the ELFASTA genome file. This is used when elprep is part of the callers and will be automatically generated when missing. string
elsites Path to the elsites file. This is used when elprep is part of the callers. string
genomes Object containing the genome reference definitions. object True
genomes_base Base directory of the CMGG reference store (used when --genomes_ignore false is specified). string /references/
cmgg_config_base The base directory for the local config files. string /conf/ True
genomes_ignore Do not load the local references from the path specified with --genomes_base. boolean True
igenomes_base Directory / URL base for iGenomes references. string True
igenomes_ignore Do not load the iGenomes reference config.
Help: Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.
boolean True

Pipeline specific parameters

Parameters that define how the pipeline works.

Parameter Description Type Default Required Hidden
scatter_count The number of scatter chunks to create per sample.
Help: Increasing this number speeds up the pipeline at the cost of more I/O and disk space. The actual scatter count can differ from this value in some cases (especially with smaller files).
This affects HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.
integer 40
merge_distance The merge distance for family BED files.
Help: Increase this parameter if GenomicsDBImport is running slowly. It defines the maximum distance between intervals that should be merged. The fewer intervals GenomicsDBImport gets, the faster it runs.
integer 100000
dragstr Create DragSTR models to be used with HaplotypeCaller.
Help: This can currently only run on a single core per sample, which makes the process very slow while giving only small improvements to the analysis.
boolean
validate Validate the found variants. boolean
filter Filter the found variants. boolean
annotate Annotate the found variants using Ensembl VEP. boolean
add_ped Add PED INFO header lines to the final VCFs. boolean
gemini Create Gemini databases from the final VCFs. boolean
mosdepth_slow Don't run mosdepth in fast mode.
Help: This is advised if you need exact coverage BED files as output.
boolean
roi Path to the default ROI (regions of interest) BED file to be used for WES analysis.
Help: This will be used for all samples that do not have a specific ROI file supplied through the samplesheet. Do not supply any ROI file to run the analysis as WGS.
string
dbsnp Path to the dbSNP VCF file. This will be used to set the variant IDs. string
dbsnp_tbi Path to the index of the dbSNP VCF file. string
somalier_sites Path to the VCF file with sites for Somalier to use. string https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz
only_call Only call the variants without doing any post-processing. boolean
only_merge Only run the pipeline until the creation of the genomicsdbs and output them. boolean
output_genomicsdb Output the genomicsDB together with the joint-genotyped VCF. boolean
callers A comma-delimited string of the callers to use. Current options are haplotypecaller and vardict. string haplotypecaller
vardict_min_af The minimum allele frequency for VarDict when no vardict_min_af is supplied in the samplesheet. number 0.1
normalize Normalize the variants in the final VCFs. boolean
only_pass Filter out all VarDict variants that don't have the PASS filter. This only works when --filter is also given. boolean
keep_alt_contigs Keep all additional contigs for calling instead of filtering them out beforehand. boolean
updio Run UPDio analysis on the final VCFs. boolean
updio_common_cnvs A TSV file containing common CNVs to be used by UPDio. string
automap Run AutoMap analysis on the final VCFs. boolean
automap_repeats BED file with repeat regions in the genome.
Help: This file will be automatically generated for hg38/GRCh38 and hg19/GRCh37 when this parameter has not been given.
string
automap_panel TXT file with gene panel regions to be used by AutoMap.
Help: By default, the CMGG gene panel list will be used.
string
automap_panel_name The panel name of the panel given with --automap_panel. string cmgg_bio
hc_phasing Perform phasing with HaplotypeCaller. boolean
min_callable_coverage The minimum coverage a region needs to be considered callable. integer 5
unique_out Don't change this value. string True
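merge_distance and scatter_count both act on genomic intervals. As a rough illustration only, and not the pipeline's actual implementation, the sketch below merges intervals on the same contig whose gap is at most merge_distance, then spreads whole intervals over at most scatter_count groups of similar total length using a longest-first greedy assignment (a technique chosen here for simplicity). Because intervals are never split in this sketch, a small input yields fewer groups than requested, which mirrors the note that the actual scatter count can differ from the requested one.

```python
import heapq
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), BED-style half-open


def merge_intervals(intervals: List[Interval], merge_distance: int) -> List[Interval]:
    """Merge intervals on the same contig that are at most merge_distance bp apart."""
    merged: List[Interval] = []
    for contig, start, end in sorted(intervals):  # groups contigs, then orders by start
        if merged and merged[-1][0] == contig and start - merged[-1][2] <= merge_distance:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


def scatter(intervals: List[Interval], scatter_count: int) -> List[List[Interval]]:
    """Spread whole intervals over at most scatter_count groups of similar total size."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    n_groups = min(scatter_count, len(intervals))  # never more groups than intervals
    heap = [(0, i) for i in range(n_groups)]  # (total bp so far, group index)
    groups: List[List[Interval]] = [[] for _ in range(n_groups)]
    # Longest-first greedy: each interval goes to the currently smallest group.
    for interval in sorted(intervals, key=lambda iv: iv[2] - iv[1], reverse=True):
        total, index = heapq.heappop(heap)
        groups[index].append(interval)
        heapq.heappush(heap, (total + interval[2] - interval[1], index))
    return groups


bed = [("chr1", 0, 100), ("chr1", 150, 300), ("chr1", 1000, 1100), ("chr2", 0, 50)]
merged = merge_intervals(bed, merge_distance=100)
print(merged)  # [('chr1', 0, 300), ('chr1', 1000, 1100), ('chr2', 0, 50)]
print(len(scatter(merged, scatter_count=40)))  # 3, not 40
```

A larger merge_distance produces fewer, longer intervals, which is the trade-off the merge_distance help text describes for GenomicsDBImport.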

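min_callable_coverage sets the depth threshold for callable regions. A minimal sketch of that thresholding, assuming per-base depths for a single contig are already available (how the pipeline computes coverage is not covered here); consecutive positions at or above the threshold become one half-open BED interval. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def callable_regions(
    contig: str, depths: Sequence[int], min_callable_coverage: int = 5
) -> List[Tuple[str, int, int]]:
    """Return half-open BED intervals where depth >= min_callable_coverage.

    depths[i] is the read depth at 0-based position i of the contig.
    """
    regions: List[Tuple[str, int, int]] = []
    run_start: Optional[int] = None  # start of the current callable run, if any
    for pos, depth in enumerate(depths):
        if depth >= min_callable_coverage:
            if run_start is None:
                run_start = pos
        elif run_start is not None:
            regions.append((contig, run_start, pos))  # run ended just before pos
            run_start = None
    if run_start is not None:
        regions.append((contig, run_start, len(depths)))  # run reaches the contig end
    return regions


print(callable_regions("chr1", [0, 5, 7, 4, 5, 6]))  # [('chr1', 1, 3), ('chr1', 4, 6)]
```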
Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.

Parameter Description Type Default Required Hidden
custom_config_version Git commit ID for institutional configs. string master True
custom_config_base Base directory for institutional configs.
Help: If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
string https://raw.githubusercontent.com/nf-core/configs/master True
config_profile_name Institutional config name. string True
config_profile_description Institutional config description. string True
config_profile_contact Institutional config contact information. string True
config_profile_url Institutional config URL link. string True

Generic options

Less common options for the pipeline, typically set in a config file.

Parameter Description Type Default Required Hidden
version Display version and exit. boolean
publish_dir_mode Method used to save pipeline results to output directory.
Help: The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See the Nextflow docs for details.
string copy
email_on_fail Email address for completion summary, only when pipeline fails.
Help: An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
string True
plaintext_email Send plain-text email instead of HTML. boolean True
max_multiqc_email_size File size limit when attaching MultiQC reports to summary emails. string 25.MB True
monochrome_logs Do not use coloured log outputs. boolean True
hook_url Incoming hook URL for messaging service.
Help: Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
string
multiqc_title MultiQC report title. Printed as page header, used for filename if not otherwise specified. string
multiqc_config Custom config file to supply to MultiQC. string
multiqc_logo Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file. string
multiqc_methods_description Custom MultiQC YAML file containing HTML including a methods description. string
validate_params Whether to validate parameters against the schema at runtime. boolean True True
pipelines_testdata_base_path Base URL or local path to location of pipeline test dataset files. string https://raw.githubusercontent.com/nf-core/test-datasets/ True
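max_multiqc_email_size takes a Nextflow-style size string (25.MB by default). A small sketch of turning such a string into bytes and comparing a report size against it; it assumes 1024-based multiples (1 MB = 1,048,576 bytes) and only the B, KB, MB, GB and TB units, so treat it as an illustration of the limit rather than Nextflow's own parser. How the pipeline itself applies the limit is not shown, and the function names are hypothetical.

```python
import re

# Assumption: 1024-based multiples for each unit.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\.?\s*([KMGT]?B)\s*$", re.IGNORECASE)


def parse_size(text: str) -> int:
    """Convert a size string such as '25.MB', '25 MB' or '1.5GB' into bytes."""
    match = _SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit.upper()])


def can_attach(report_bytes: int, limit: str = "25.MB") -> bool:
    """True when a report of report_bytes fits under the configured limit."""
    return report_bytes <= parse_size(limit)


print(parse_size("25.MB"))  # 26214400
print(can_attach(30 * 1024 * 1024))  # False
```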

Annotation parameters

Parameters to configure Ensembl VEP and VCFanno.

Parameter Description Type Default Required Hidden
vep_chunk_size The number of sites per split VCF used as input to VEP. integer 50000
species The species of the samples.
Help: Must be lowercase, with underscores instead of spaces.
string homo_sapiens
vep_merged Specify if the VEP cache is a merged cache. boolean True
vep_cache The path to the VEP cache. string
vep_dbnsfp Use the dbNSFP plugin with Ensembl VEP.
Help: The '--dbnsfp' and '--dbnsfp_tbi' parameters must be specified when using this parameter.
boolean
vep_spliceai Use the SpliceAI plugin with Ensembl VEP.
Help: The '--spliceai_indel', '--spliceai_indel_tbi', '--spliceai_snv' and '--spliceai_snv_tbi' parameters must be specified when using this parameter.
boolean
vep_spliceregion Use the SpliceRegion plugin with Ensembl VEP. boolean
vep_mastermind Use the Mastermind plugin with Ensembl VEP.
Help: The '--mastermind' and '--mastermind_tbi' parameters must be specified when using this parameter.
boolean
vep_maxentscan Use the MaxEntScan plugin with Ensembl VEP.
Help: The '--maxentscan' parameter must be specified when using this parameter.
boolean
vep_eog Use the custom EOG annotation with Ensembl VEP.
Help: The '--eog' and '--eog_tbi' parameters must be specified when using this parameter.
boolean
vep_alphamissense Use the AlphaMissense plugin with Ensembl VEP.
Help: The '--alphamissense' and '--alphamissense_tbi' parameters must be specified when using this parameter.
boolean
vep_version The version of the VEP tool to be used. number 105.0
vep_cache_version The version of the VEP cache to be used. integer 105
dbnsfp Path to the dbNSFP file. string
dbnsfp_tbi Path to the index of the dbNSFP file. string
spliceai_indel Path to the VCF containing indels for SpliceAI. string
spliceai_indel_tbi Path to the index of the VCF containing indels for SpliceAI. string
spliceai_snv Path to the VCF containing SNVs for SpliceAI. string
spliceai_snv_tbi Path to the index of the VCF containing SNVs for SpliceAI. string
mastermind Path to the VCF for Mastermind. string
mastermind_tbi Path to the index of the VCF for Mastermind. string
alphamissense Path to the TSV for AlphaMissense. string
alphamissense_tbi Path to the index of the TSV for AlphaMissense. string
eog Path to the VCF containing EOG annotations. string
eog_tbi Path to the index of the VCF containing EOG annotations. string
vcfanno Run annotations with vcfanno. boolean
vcfanno_config The path to the VCFanno config TOML. string
vcfanno_lua The path to a Lua script to be used in VCFanno. string
vcfanno_resources A semicolon-separated list of resource files for VCFanno. Supply their indices through this parameter as well. string
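Several of the VEP plugin flags above only work together with their companion file parameters, and vcfanno_resources packs data files and their indices into one semicolon-separated string. A hedged Python sketch of both checks, not the pipeline's own validation: the requirement table is copied from the help texts in this section, while treating files that end in .tbi or .csi as indices is an assumption, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Companion parameters required by each VEP plugin flag, taken from the help texts above.
PLUGIN_REQUIREMENTS: Dict[str, List[str]] = {
    "vep_dbnsfp": ["dbnsfp", "dbnsfp_tbi"],
    "vep_spliceai": ["spliceai_indel", "spliceai_indel_tbi", "spliceai_snv", "spliceai_snv_tbi"],
    "vep_mastermind": ["mastermind", "mastermind_tbi"],
    "vep_maxentscan": ["maxentscan"],
    "vep_eog": ["eog", "eog_tbi"],
    "vep_alphamissense": ["alphamissense", "alphamissense_tbi"],
}

# Assumption: resource indices are tabix (.tbi) or CSI (.csi) files.
INDEX_SUFFIXES = (".tbi", ".csi")


def missing_plugin_params(params: Dict[str, object]) -> List[str]:
    """Report companion parameters that are unset for enabled VEP plugins."""
    problems: List[str] = []
    for flag, required in PLUGIN_REQUIREMENTS.items():
        if params.get(flag):  # plugin switched on
            problems += [f"--{flag} requires --{name}" for name in required if not params.get(name)]
    return problems


def split_vcfanno_resources(value: str) -> Tuple[List[str], List[str]]:
    """Split a semicolon-separated resource list into (data files, index files)."""
    entries = [item.strip() for item in value.split(";") if item.strip()]
    indices = [e for e in entries if e.endswith(INDEX_SUFFIXES)]
    files = [e for e in entries if not e.endswith(INDEX_SUFFIXES)]
    return files, indices


def unindexed(files: List[str], indices: List[str]) -> List[str]:
    """Return data files without a matching '<file>.tbi' or '<file>.csi' entry."""
    present = set(indices)
    return [f for f in files if not any(f + s in present for s in INDEX_SUFFIXES)]


print(missing_plugin_params({"vep_dbnsfp": True, "dbnsfp": "/refs/dbnsfp.txt.gz"}))
# ['--vep_dbnsfp requires --dbnsfp_tbi']
files, indices = split_vcfanno_resources("gnomad.vcf.gz;gnomad.vcf.gz.tbi;clinvar.vcf.gz")
print(unindexed(files, indices))  # ['clinvar.vcf.gz']
```

Running checks like these before launching a run surfaces a missing index or plugin file immediately instead of partway through annotation.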