nf-cmgg/germline: Output

Introduction

This page describes the output produced by the pipeline.

The output directory is structured so that the same directory can be passed to every pipeline run. Each run adds its files in a traceable way without overwriting existing files, which makes it easy to store data from multiple sequencing runs in the same root directory.

To explain the structure of the output directory, a simple example run with two families is used. The first family (family1) is a trio (son, father and mother) and the second family (family2) consists of a single sample.

<outdir> #(1)!
├── family1 #(2)!
│   ├── output_<pipeline_version>_<date> #(3)!
│   │   ├── automap #(4)!
│   │   │   └── <caller> #(5)!
│   │   │       ├── sample1 #(6)!
│   │   │       │   ├── sample1.HomRegions.<panel>.tsv
│   │   │       │   ├── sample1.HomRegions.pdf
│   │   │       │   ├── sample1.HomRegions.strict.<panel>.tsv
│   │   │       │   └── sample1.HomRegions.tsv
│   │   │       ├── sample2
│   │   │       └── sample3
│   │   ├── family1.<caller>.bed #(7)!
│   │   ├── family1.<caller>.db #(8)!
│   │   ├── family1.<caller>.ped #(9)!
│   │   ├── family1.<caller>.vcf.gz #(10)!
│   │   └── family1.<caller>.vcf.gz.tbi #(11)!
│   ├── qc_<pipeline_version>_<date> #(12)!
│   │   ├── family1.<caller>.bcftools_stats.txt #(13)!
│   │   └── family1.<caller>.html #(14)!
│   ├── sample1_<pipeline_version>_<date> #(15)!
│   │   ├── sample1.bed #(16)!
│   │   ├── sample1.<caller>.bcftools_stats.txt #(17)!
│   │   ├── sample1.<caller>.g.vcf.gz #(18)!
│   │   ├── sample1.<caller>.g.vcf.gz.tbi #(19)!
│   │   ├── sample1.per-base.bed.gz #(33)!
│   │   ├── sample1.per-base.bed.gz.csi #(34)!
│   │   └── validation #(20)!
│   │       └── <caller> #(21)!
│   │           ├── ... #(22)!
│   │           └── sample1.summary.txt #(23)!
│   ├── sample2_<pipeline_version>_<date>
│   └── sample3_<pipeline_version>_<date>
├── family2
│   ├── output_<pipeline_version>_<date>
│   ├── qc_<pipeline_version>_<date>
│   └── sample4_<pipeline_version>_<date>
└── <pipeline_version>_<date> #(24)!
    ├── execution_report_<date>_<hour>-<minutes>-<seconds>.html #(25)!
    ├── execution_timeline_<date>_<hour>-<minutes>-<seconds>.html #(26)!
    ├── execution_trace_<date>_<hour>-<minutes>-<seconds>.html #(27)!
    ├── multiqc_report.html #(28)!
    ├── params_<date>_<hour>-<minutes>-<seconds>.json #(29)!
    ├── pipeline_dag_<date>_<hour>-<minutes>-<seconds>.html #(30)!
    ├── pipeline_software_mqc_versions.yml #(31)!
    └── samplesheet.<extension> #(32)!

1. The output directory specified with --outdir
2. The first family name, taken from the family field of the samplesheet
3. This folder contains all major outputs of the current family
4. This folder is only created when the --automap parameter has been used. It contains all output files from the automap process
5. A folder containing the postprocessing output generated for one caller. One such folder is created for each caller provided to the --callers parameter
6. This folder contains the files for the specified sample
7. The BED file used to create the VCF file in this folder using the caller specified in the filename
8. The Gemini DB file generated from the output VCF and the PED file. This file is only created when --gemini has been used
9. The PED file for the current family. When an input PED file is given, this file contains the matching samples from it. When none is given, the pipeline tries to infer a PED file automatically. Note that this inference is not perfect and can cause issues, so providing a PED file is the recommended way of passing relational data to the pipeline
10. The final VCF file created using the caller specified in the filename. All required postprocessing methods have been applied to this file
11. The index of the final VCF file
12. This folder contains all quality metrics for the family
13. The statistics calculated by bcftools stats
14. The relational report created by somalier relate
15. The folder containing sample-specific files
16. The BED file used to create the GVCF files for the sample
17. The statistics of the GVCF file, calculated by bcftools stats
18. The GVCF file generated by the specified caller
19. The index of the GVCF file
20. This folder contains the validation metrics of this specific sample in the final VCF
21. This folder contains the validation metrics for the final VCF generated using the specified caller
22. Additional files have been left out of this example. They are several VCF files and images for deeper analysis of the validation
23. This file contains a summary of the validation metrics
24. This folder contains pipeline metrics and other files specific to the pipeline run
25. An HTML file that summarizes metrics of the pipeline run (CPU usage, memory usage, walltime...)
26. An HTML file that visualizes the timeline of the pipeline run
27. An HTML file that visualizes the trace of the pipeline run
28. The MultiQC report containing all main statistics of the output data and tool versions
29. A JSON file containing the parameters used for this pipeline run
30. An HTML file that visualizes the DAG of the pipeline run
31. This file contains a list of all tools used in the pipeline and their versions
32. The samplesheet used for this pipeline run
33. The per-base coverage BED file generated by Mosdepth
34. The index of the per-base coverage BED file generated by Mosdepth