nf-cmgg/germline pipeline parameters

A Nextflow pipeline for calling and annotating small germline variants from short DNA reads for WES and WGS data.

Input/output options

Define where the pipeline should find input data and save output data.

Parameter	Description	Type	Required
`input`	Path to a comma-separated file containing information about the samples in the experiment. You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with samples and a header row. See usage docs.	`string`	True
`outdir`	The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.	`string`	True
`watchdir`	A folder to watch for the creation of files that start with `watch:` in the samplesheet.	`string`
`email`	Email address for completion summary. Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.	`string`
`ped`	Path to a pedigree file for all samples in the run. All relational data will be fetched from this file.	`string`
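
The required parameters above can also be collected in a parameters file instead of being typed on the command line. The following is a minimal sketch, assuming a JSON parameters file; the helper name `build_params_json` and all paths are hypothetical and not part of the pipeline.

```python
import json

# Parameters marked as required in the table above.
REQUIRED = ("input", "outdir")


def build_params_json(params: dict) -> str:
    """Check that the required parameters are set and return them as JSON text."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return json.dumps(params, indent=2)


# Hypothetical example values; replace them with your own paths.
example = build_params_json(
    {
        "input": "samplesheet.csv",
        "outdir": "/data/results",
        "ped": "family.ped",
    }
)
```

If the returned text is saved as, for example, `params.json`, it can be handed to Nextflow with its `-params-file` option.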

Reference genome options

Reference genome related files and options required for the workflow.

Parameter	Description	Type	Default	Required	Hidden
`genome`	Reference genome build. Used to fetch the right reference files. Requires a Genome Reference Consortium reference ID (e.g. GRCh38).	`string`	GRCh38
`fasta`	Path to FASTA genome file. This parameter is mandatory if `--genome` is not specified.	`string`		True
`fai`	Path to FASTA genome index file.	`string`
`dict`	Path to the sequence dictionary generated from the FASTA reference. This is only used when `haplotypecaller` is one of the specified callers.	`string`
`strtablefile`	Path to the STR table file generated from the FASTA reference. This is only used when `--dragstr` has been given.	`string`
`sdf`	Path to the SDF folder generated from the reference FASTA file. This is only required when using `--validate`.	`string`
`elfasta`	Path to the ELFASTA genome file. This is used when `elprep` is part of the callers and will be automatically generated when missing.	`string`
`elsites`	Path to the elsites file. This is used when `elprep` is part of the callers.	`string`
`genomes`	Object for genomes	`object`			True
`genomes_base`	Directory base for CMGG reference store (used when `--genomes_ignore false` is specified)	`string`	/references/
`cmgg_config_base`	The base directory for the local config files	`string`	/conf/		True
`genomes_ignore`	Do not load the local references from the path specified with `--genomes_base`	`boolean`			True
`igenomes_base`	Directory / URL base for iGenomes references.	`string`			True
`igenomes_ignore`	Do not load the iGenomes reference config. Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`.	`boolean`			True
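
Several reference files in this table only matter for certain options. As a rough aid, the sketch below lists which reference parameters are relevant for a given combination of callers and flags, based only on the descriptions above. The function name is hypothetical, and the pipeline may generate some of these files itself when they are missing (the table states this for `elfasta`).

```python
def relevant_reference_params(callers: str, dragstr: bool = False, validate: bool = False) -> list:
    """Return the reference parameters that the table describes as used for these options."""
    relevant = ["fasta"]  # marked as required in the table
    caller_set = {name.strip() for name in callers.split(",") if name.strip()}
    if "haplotypecaller" in caller_set:
        relevant.append("dict")  # sequence dictionary, used by haplotypecaller
    if "elprep" in caller_set:
        relevant.extend(["elfasta", "elsites"])  # used when elprep is a caller
    if dragstr:
        relevant.append("strtablefile")  # used when --dragstr is given
    if validate:
        relevant.append("sdf")  # required when using --validate
    return relevant
```

For example, `relevant_reference_params("haplotypecaller", validate=True)` lists `fasta`, `dict` and `sdf`.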

Pipeline specific parameters

Parameters that define how the pipeline works

Parameter	Description	Type	Default	Hidden
`scatter_count`	The amount of scattering that should happen per sample. Increase this number to speed up the pipeline, at the cost of more IO and disk space. This can differ from the actual scatter count in some cases (especially with smaller files). This has an effect on HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs.	`integer`	40
`merge_distance`	The merge distance for family BED files. Increase this parameter if GenomicsDBImport is running slowly. This defines the maximum distance between intervals that should be merged. The fewer intervals GenomicsDBImport gets, the faster it will run.	`integer`	100000
`dragstr`	Create DragSTR models to be used with HaplotypeCaller. This can currently only run single-core per sample, so the process is very slow and gives only very small improvements to the analysis.	`boolean`
`validate`	Validate the found variants	`boolean`
`filter`	Filter the found variants.	`boolean`
`annotate`	Annotate the found variants using Ensembl VEP.	`boolean`
`add_ped`	Add PED INFO header lines to the final VCFs.	`boolean`
`gemini`	Create a Gemini database from the final VCFs.	`boolean`
`mosdepth_slow`	Don't run mosdepth in fast-mode. This is advised if you need exact coverage BED files as output.	`boolean`
`roi`	Path to the default ROI (regions of interest) BED file to be used for WES analysis. This will be used for all samples that do not have a specific ROI file supplied to them through the samplesheet. Don't supply an ROI file to run the analysis as WGS.	`string`
`dbsnp`	Path to the dbSNP VCF file. This will be used to set the variant IDs.	`string`
`dbsnp_tbi`	Path to the index of the dbSNP VCF file.	`string`
`somalier_sites`	Path to the VCF file with sites for Somalier to use.	`string`	https://github.com/brentp/somalier/files/3412456/sites.hg38.vcf.gz
`only_call`	Only call the variants without doing any post-processing.	`boolean`
`only_merge`	Only run the pipeline until the creation of the genomicsdbs and output them.	`boolean`
`output_genomicsdb`	Output the genomicsDB together with the joint-genotyped VCF.	`boolean`
`callers`	A comma delimited string of the available callers. Current options are: `haplotypecaller` and `vardict`.	`string`	haplotypecaller
`vardict_min_af`	The minimum allele frequency for VarDict when no `vardict_min_af` is supplied in the samplesheet.	`number`	0.1
`normalize`	Normalize the variants in the final VCFs.	`boolean`
`only_pass`	Filter out all variants that don't have the PASS filter for vardict. This only works when `--filter` is also given.	`boolean`
`keep_alt_contigs`	Keep all additional contigs for calling instead of filtering them out beforehand.	`boolean`
`updio`	Run UPDio analysis on the final VCFs.	`boolean`
`updio_common_cnvs`	A TSV file containing common CNVs to be used by UPDio.	`string`
`automap`	Run AutoMap analysis on the final VCFs.	`boolean`
`automap_repeats`	BED file with repeat regions in the genome. This file will be automatically generated for hg38/GRCh38 and hg19/GRCh37 when this parameter has not been given.	`string`
`automap_panel`	TXT file with gene panel regions to be used by AutoMap. By default the CMGG gene panel list will be used.	`string`
`automap_panel_name`	The panel name of the panel given with `--automap_panel`.	`string`	cmgg_bio
`hc_phasing`	Perform phasing with HaplotypeCaller.	`boolean`
`min_callable_coverage`	The lowest callable coverage to determine callable regions.	`integer`	5
`unique_out`	Don't change this value	`string`		True
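
To illustrate what `merge_distance` controls, the sketch below merges intervals on the same contig whenever the gap between them is at most the given distance. This is only an illustration of the idea described above: the function name is hypothetical, and the tool the pipeline actually uses, and whether it treats the boundary as inclusive, are assumptions.

```python
def merge_intervals(intervals, merge_distance=100_000):
    """Merge (contig, start, end) intervals whose gap is at most merge_distance."""
    merged = []
    for contig, start, end in sorted(intervals):
        previous = merged[-1] if merged else None
        if previous and previous[0] == contig and start - previous[2] <= merge_distance:
            # Close enough to the previous interval: extend it instead of adding a new one.
            merged[-1] = (contig, previous[1], max(previous[2], end))
        else:
            merged.append((contig, start, end))
    return merged


regions = [("chr1", 100, 200), ("chr1", 250, 300), ("chr1", 1000, 1100), ("chr2", 10, 20)]
small = merge_intervals(regions, merge_distance=100)   # 3 intervals remain
large = merge_intervals(regions, merge_distance=1000)  # 2 intervals remain
```

A larger distance leaves fewer intervals, which matches the advice to raise `merge_distance` when GenomicsDBImport is slow.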

Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.

Parameter	Description	Type	Default	Hidden
`custom_config_version`	Git commit id for Institutional configs.	`string`	master	True
`custom_config_base`	Base directory for Institutional configs. If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.	`string`	https://raw.githubusercontent.com/nf-core/configs/master	True
`config_profile_name`	Institutional config name.	`string`		True
`config_profile_description`	Institutional config description.	`string`		True
`config_profile_contact`	Institutional config contact information.	`string`		True
`config_profile_url`	Institutional config URL link.	`string`		True

Generic options

Less common options for the pipeline, typically set in a config file.

Parameter	Description	Type	Default	Hidden
`version`	Display version and exit.	`boolean`
`publish_dir_mode`	Method used to save pipeline results to output directory. The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.	`string`	copy
`email_on_fail`	Email address for completion summary, only when pipeline fails. The summary email is sent to this address ONLY if the pipeline does not exit successfully.	`string`		True
`plaintext_email`	Send plain-text email instead of HTML.	`boolean`		True
`max_multiqc_email_size`	File size limit when attaching MultiQC reports to summary emails.	`string`	25.MB	True
`monochrome_logs`	Do not use coloured log outputs.	`boolean`		True
`hook_url`	Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.	`string`
`multiqc_title`	MultiQC report title. Printed as page header, used for filename if not otherwise specified.	`string`
`multiqc_config`	Custom config file to supply to MultiQC.	`string`
`multiqc_logo`	Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.	`string`
`multiqc_methods_description`	Custom MultiQC yaml file containing HTML including a methods description.	`string`
`validate_params`	Boolean whether to validate parameters against the schema at runtime	`boolean`	True	True
`pipelines_testdata_base_path`	Base URL or local path to location of pipeline test dataset files	`string`	https://raw.githubusercontent.com/nf-core/test-datasets/	True

Annotation parameters

Parameters to configure Ensembl VEP and VCFanno

Parameter	Description	Type	Default
`vep_chunk_size`	The number of sites per split VCF as input to VEP.	`integer`	50000
`species`	The species of the samples. Must be lower case and use underscores instead of spaces.	`string`	homo_sapiens
`vep_merged`	Specify if the VEP cache is a merged cache.	`boolean`	True
`vep_cache`	The path to the VEP cache.	`string`
`vep_dbnsfp`	Use the dbNSFP plugin with Ensembl VEP. The `--dbnsfp` and `--dbnsfp_tbi` parameters need to be specified when using this parameter.	`boolean`
`vep_spliceai`	Use the SpliceAI plugin with Ensembl VEP. The `--spliceai_indel`, `--spliceai_indel_tbi`, `--spliceai_snv` and `--spliceai_snv_tbi` parameters need to be specified when using this parameter.	`boolean`
`vep_spliceregion`	Use the SpliceRegion plugin with Ensembl VEP.	`boolean`
`vep_mastermind`	Use the Mastermind plugin with Ensembl VEP. The `--mastermind` and `--mastermind_tbi` parameters need to be specified when using this parameter.	`boolean`
`vep_maxentscan`	Use the MaxEntScan plugin with Ensembl VEP. The `--maxentscan` parameter needs to be specified when using this parameter.	`boolean`
`vep_eog`	Use the custom EOG annotation with Ensembl VEP. The `--eog` and `--eog_tbi` parameters need to be specified when using this parameter.	`boolean`
`vep_alphamissense`	Use the AlphaMissense plugin with Ensembl VEP. The `--alphamissense` and `--alphamissense_tbi` parameters need to be specified when using this parameter.	`boolean`
`vep_version`	The version of the VEP tool to be used.	`number`	105.0
`vep_cache_version`	The version of the VEP cache to be used.	`integer`	105
`dbnsfp`	Path to the dbNSFP file.	`string`
`dbnsfp_tbi`	Path to the index of the dbNSFP file.	`string`
`spliceai_indel`	Path to the VCF containing indels for spliceAI.	`string`
`spliceai_indel_tbi`	Path to the index of the VCF containing indels for spliceAI.	`string`
`spliceai_snv`	Path to the VCF containing SNVs for spliceAI.	`string`
`spliceai_snv_tbi`	Path to the index of the VCF containing SNVs for spliceAI.	`string`
`mastermind`	Path to the VCF for Mastermind.	`string`
`mastermind_tbi`	Path to the index of the VCF for Mastermind.	`string`
`alphamissense`	Path to the TSV for AlphaMissense.	`string`
`alphamissense_tbi`	Path to the index of the TSV for AlphaMissense.	`string`
`eog`	Path to the VCF containing EOG annotations.	`string`
`eog_tbi`	Path to the index of the VCF containing EOG annotations.	`string`
`vcfanno`	Run annotations with vcfanno.	`boolean`
`vcfanno_config`	The path to the VCFanno config TOML.	`string`
`vcfanno_lua`	The path to a Lua script to be used in VCFanno.	`string`
`vcfanno_resources`	A semicolon-separated list of resource files for VCFanno. Please also supply their indices using this parameter.	`string`
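
Several of the VEP switches above only work when their companion file parameters are also set. The following sketch checks a parameter dictionary against the dependencies stated in the table. The names `VEP_PLUGIN_REQUIREMENTS` and `missing_annotation_params` are hypothetical, and the pipeline's own validation may behave differently.

```python
# Switch -> file parameters that the table says must accompany it.
VEP_PLUGIN_REQUIREMENTS = {
    "vep_dbnsfp": ("dbnsfp", "dbnsfp_tbi"),
    "vep_spliceai": ("spliceai_indel", "spliceai_indel_tbi", "spliceai_snv", "spliceai_snv_tbi"),
    "vep_mastermind": ("mastermind", "mastermind_tbi"),
    "vep_maxentscan": ("maxentscan",),
    "vep_eog": ("eog", "eog_tbi"),
    "vep_alphamissense": ("alphamissense", "alphamissense_tbi"),
}


def missing_annotation_params(params: dict) -> dict:
    """Map each enabled switch to the companion parameters that are not set."""
    problems = {}
    for switch, needed in VEP_PLUGIN_REQUIREMENTS.items():
        if params.get(switch):
            missing = [name for name in needed if not params.get(name)]
            if missing:
                problems[switch] = missing
    return problems
```

An empty result means every enabled switch has its companion files. For instance, enabling `vep_dbnsfp` with only `dbnsfp` set reports `dbnsfp_tbi` as missing.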