nf-cmgg/germline: Output
Introduction
This page describes the output produced by the pipeline.
The output directory is structured so that you can pass the same directory to every pipeline run. The pipeline adds its files to that directory in a traceable way without overwriting existing files. This makes it easy to store data from multiple sequencing runs in the same root directory.
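As an illustration of how runs accumulate side by side, here is a minimal Python sketch that lists the run folders stored for one family. It assumes the layout shown in the example on this page, where each run writes to a folder named `output_<pipeline_version>_<date>`. The helper name `list_family_runs` and the version and date values in the demo are hypothetical; the exact date format used by the pipeline is not specified here.

```python
import tempfile
from pathlib import Path


def list_family_runs(outdir: Path, family: str) -> list[str]:
    """Return the names of all `output_*` run folders stored for one family."""
    family_dir = outdir / family
    if not family_dir.is_dir():
        return []
    # Each pipeline run is expected to add its own `output_<version>_<date>` folder.
    return sorted(p.name for p in family_dir.glob("output_*") if p.is_dir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Two hypothetical runs that wrote to the same root directory.
        for suffix in ("1.0.0_2024_01_01", "1.1.0_2024_02_01"):
            (root / "family1" / f"output_{suffix}").mkdir(parents=True)
        print(list_family_runs(root, "family1"))
```

Because the folder name carries the version and date, a second run adds a new folder next to the first one instead of replacing it.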
To explain the structure of the output directory, a simple example run consisting of two families is used. The first family (`family1`) is a trio (son, father and mother) and the second family (`family2`) consists of a single sample.
<outdir> #(1)!
├── family1 #(2)!
│   ├── output_<pipeline_version>_<date> #(3)!
│   │   ├── automap #(4)!
│   │   │   └── <caller> #(5)!
│   │   │       ├── sample1 #(6)!
│   │   │       │   ├── sample1.HomRegions.<panel>.tsv
│   │   │       │   ├── sample1.HomRegions.pdf
│   │   │       │   ├── sample1.HomRegions.strict.<panel>.tsv
│   │   │       │   └── sample1.HomRegions.tsv
│   │   │       ├── sample2
│   │   │       └── sample3
│   │   ├── family1.<caller>.bed #(7)!
│   │   ├── family1.<caller>.db #(8)!
│   │   ├── family1.<caller>.ped #(9)!
│   │   ├── family1.<caller>.vcf.gz #(10)!
│   │   └── family1.<caller>.vcf.gz.tbi #(11)!
│   ├── qc_<pipeline_version>_<date> #(12)!
│   │   ├── family1.<caller>.bcftools_stats.txt #(13)!
│   │   └── family1.<caller>.html #(14)!
│   ├── sample1_<pipeline_version>_<date> #(15)!
│   │   ├── sample1.bed #(16)!
│   │   ├── sample1.<caller>.bcftools_stats.txt #(17)!
│   │   ├── sample1.<caller>.g.vcf.gz #(18)!
│   │   ├── sample1.<caller>.g.vcf.gz.tbi #(19)!
│   │   └── validation #(20)!
│   │       └── <caller> #(21)!
│   │           ├── ... #(22)!
│   │           └── sample1.summary.txt #(23)!
│   ├── sample2_<pipeline_version>_<date>
│   └── sample3_<pipeline_version>_<date>
├── family2
│   ├── output_<pipeline_version>_<date>
│   ├── qc_<pipeline_version>_<date>
│   └── sample4_<pipeline_version>_<date>
└── <pipeline_version>_<date> #(24)!
    ├── execution_report_<date>_<hour>-<minutes>-<seconds>.html #(25)!
    ├── execution_timeline_<date>_<hour>-<minutes>-<seconds>.html #(26)!
    ├── execution_trace_<date>_<hour>-<minutes>-<seconds>.html #(27)!
    ├── multiqc_report.html #(28)!
    ├── params_2024-11-18_15-41-14.json #(29)!
    ├── pipeline_dag_<date>_<hour>-<minutes>-<seconds>.html #(30)!
    ├── pipeline_software_mqc_versions.yml #(31)!
    └── samplesheet.<extension> #(32)!
1. The output directory specified with `--outdir`.
2. The first family name specified in the `family` field of the samplesheet.
3. This folder contains all major outputs of the current family.
4. This folder is only created when the `--automap` parameter has been used. It contains all output files from the automap process.
5. A folder containing the postprocessing output generated for the caller used. One such folder is created for each caller provided to the `--callers` parameter.
6. This folder contains the files for the specified sample.
7. The BED file used to create the VCF file in this folder using the caller specified in the filename.
8. The Gemini DB file generated from the output VCF and the PED file. This file is only created when `--gemini` has been used.
9. The PED file for the current family. When an input PED file is given, this file contains the corresponding samples from it. When none is given, the pipeline tries to infer a PED file automatically. This inference is not perfect and can have issues, so providing a PED file is the recommended way of passing relational data to the pipeline.
10. The final VCF file created using the caller specified in the filename. All required postprocessing steps have been applied to this file.
11. The index of the final VCF file.
12. This folder contains all quality metrics for the family.
13. The statistics calculated by `bcftools stats`.
14. The relational report created by `somalier relate`.
15. The folder containing sample-specific files.
16. The BED file used to create the GVCF files for the sample.
17. The statistics of the GVCF file, calculated by `bcftools stats`.
18. The GVCF file generated by the specified caller.
19. The index of the GVCF file.
20. This folder contains the validation metrics of this specific sample in the final VCF.
21. This folder contains the validation metrics for the final VCF generated using the specified caller.
22. Additional files were removed from this example. They are several VCF files and images for deeper analysis of the validation.
23. This file contains a summary of the validation metrics.
24. This folder contains pipeline metrics and other files specific to the pipeline run.
25. An HTML file that summarizes metrics of the pipeline run (CPU usage, memory usage, walltime...).
26. An HTML file that visualizes the timeline of the pipeline run.
27. An HTML file that visualizes the trace of the pipeline run.
28. The MultiQC report containing all main statistics of the output data and tool versions.
29. A JSON file containing the parameters used for this pipeline run.
30. An HTML file that visualizes the DAG of the pipeline run.
31. This file contains a list of all tools used in the pipeline and their versions.
32. The samplesheet used for this pipeline run.
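
The layout above lends itself to simple scripted lookups. The following Python sketch collects the final family VCFs from an output directory, assuming the path pattern `<outdir>/<family>/output_<pipeline_version>_<date>/<family>.<caller>.vcf.gz` from the example tree. The function name `find_final_vcfs`, the record fields, and the caller name `callerA` in the demo are hypothetical and not part of the pipeline.

```python
import tempfile
from pathlib import Path


def find_final_vcfs(outdir: Path) -> list[dict]:
    """Collect the final family VCFs found under `outdir`.

    Expected path: <outdir>/<family>/output_<version>_<date>/<family>.<caller>.vcf.gz
    """
    records = []
    suffix = ".vcf.gz"
    for vcf in sorted(outdir.glob("*/output_*/*.vcf.gz")):
        family = vcf.parents[1].name  # folder two levels above the VCF
        prefix = f"{family}."
        if not vcf.name.startswith(prefix):
            continue  # skip files that do not follow the naming scheme
        records.append({
            "family": family,
            "run": vcf.parent.name,
            "caller": vcf.name[len(prefix):-len(suffix)],
            "vcf": vcf,
            # The index is expected to sit next to the VCF as <name>.vcf.gz.tbi
            "has_index": vcf.with_name(vcf.name + ".tbi").exists(),
        })
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny hypothetical output directory to run the lookup on.
        run_dir = Path(tmp) / "family1" / "output_1.0.0_2024_01_01"
        run_dir.mkdir(parents=True)
        (run_dir / "family1.callerA.vcf.gz").touch()
        (run_dir / "family1.callerA.vcf.gz.tbi").touch()
        for record in find_final_vcfs(Path(tmp)):
            print(record["family"], record["caller"], record["has_index"])
```

The glob only looks inside `output_*` folders, so the per-sample GVCF files and the run-level `<pipeline_version>_<date>` folder are not picked up.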