nf-cmgg/germline: Output

Introduction

This page describes the output produced by the pipeline.

The output directory is structured so that the same directory can be passed to --outdir for every pipeline run. The pipeline adds its files to that directory in a traceable way, without overwriting existing files. This makes it easy to store data from multiple sequencing runs in the same root directory.

To explain the structure of the output directory, a simple example run consisting of two families is used. The first family (family1) is a trio (son, father and mother) and the second family (family2) consists of a single sample.

<outdir> #(1)!
├── family1 #(2)!
│   ├── output_<pipeline_version>_<date> #(3)!
│   │   ├── automap #(4)!
│   │   │   └── <caller> #(5)!
│   │   │       ├── sample1 #(6)!
│   │   │       │   ├── sample1.HomRegions.<panel>.tsv
│   │   │       │   ├── sample1.HomRegions.pdf
│   │   │       │   ├── sample1.HomRegions.strict.<panel>.tsv
│   │   │       │   └── sample1.HomRegions.tsv
│   │   │       ├── sample2
│   │   │       └── sample3
│   │   ├── family1.<caller>.bed #(7)!
│   │   ├── family1.<caller>.db #(8)!
│   │   ├── family1.<caller>.ped #(9)!
│   │   ├── family1.<caller>.vcf.gz #(10)!
│   │   └── family1.<caller>.vcf.gz.tbi #(11)!
│   ├── qc_<pipeline_version>_<date> #(12)!
│   │   ├── family1.<caller>.bcftools_stats.txt #(13)!
│   │   └── family1.<caller>.html #(14)!
│   ├── sample1_<pipeline_version>_<date> #(15)!
│   │   ├── sample1.bed #(16)!
│   │   ├── sample1.<caller>.bcftools_stats.txt #(17)!
│   │   ├── sample1.<caller>.g.vcf.gz #(18)!
│   │   ├── sample1.<caller>.g.vcf.gz.tbi #(19)!
│   │   └── validation #(20)!
│   │       └── <caller> #(21)!
│   │           ├── ... #(22)!
│   │           └── sample1.summary.txt #(23)!
│   ├── sample2_<pipeline_version>_<date>
│   └── sample3_<pipeline_version>_<date>
├── family2
│   ├── output_<pipeline_version>_<date>
│   ├── qc_<pipeline_version>_<date>
│   └── sample4_<pipeline_version>_<date>
└── <pipeline_version>_<date> #(24)!
    ├── execution_report_<date>_<hour>-<minutes>-<seconds>.html #(25)!
    ├── execution_timeline_<date>_<hour>-<minutes>-<seconds>.html #(26)!
    ├── execution_trace_<date>_<hour>-<minutes>-<seconds>.html #(27)!
    ├── multiqc_report.html #(28)!
    ├── params_2024-11-18_15-41-14.json #(29)!
    ├── pipeline_dag_<date>_<hour>-<minutes>-<seconds>.html #(30)!
    ├── pipeline_software_mqc_versions.yml #(31)!
    └── samplesheet.<extension> #(32)!
  1. The output directory specified with --outdir

  2. The first family name specified in the samplesheet in the family field

  3. This folder contains all major outputs of the current family

  4. This folder is only created when the --automap parameter has been used. It contains all output files from the automap process

  5. A specific folder containing postprocessing output generated for the caller used. This folder will be created for each caller provided to the --callers parameter

  6. This folder contains the files for the specified sample

  7. The BED file used to create the VCF file in this folder using the caller specified in the filename

  8. The Gemini DB file generated from the output VCF and the PED file. This file will only be created when --gemini has been used

  9. The PED file for the current family. When an input PED file is given, this file contains the samples of this family taken from it. When none is given, the pipeline tries to infer a PED file automatically. This inference is not perfect and can produce errors, so providing a PED file is the recommended way of passing relational data to the pipeline

  10. The final VCF file created using the caller specified in the filename. All required postprocessing methods have been applied to this file

  11. The index of the final VCF file

  12. This folder contains all quality metrics for the family

  13. The statistics calculated by bcftools stats

  14. The relational report created by somalier relate

  15. The folder containing sample specific files

  16. The BED file used to create the GVCF files for the sample

  17. The statistics of the GVCF file, calculated by bcftools stats

  18. The GVCF file generated by the specified caller

  19. The index of the GVCF file

  20. This folder contains the validation metrics of this specific sample in the final VCF

  21. This folder contains the validation metrics for the final VCF generated using the specified caller

  22. Additional files are omitted from this example. They consist of several VCF files and images for deeper analysis of the validation

  23. This file contains a summary of the validation metrics

  24. This folder contains pipeline metrics and other pipeline run specific files

  25. This file is an HTML file that summarizes many metrics of the pipeline run (CPU usage, memory usage, walltime...)

  26. This file is an HTML file that visualizes the timeline of the pipeline run

  27. This file is an HTML file that visualizes the trace of the pipeline run

  28. The MultiQC report containing the main statistics of the output data and the tool versions

  29. A JSON file containing the parameters used for this pipeline run

  30. This file is an HTML file that visualizes the DAG of the pipeline run

  31. This file contains a list of all tools used in the pipeline and their versions

  32. The samplesheet used for this pipeline run
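
Because every run writes into its own output_<pipeline_version>_<date> folder, one root directory can hold the results of many runs. The following is a minimal Python sketch, not part of the pipeline, of how the final family VCFs could be collected from such a directory. It assumes only the layout shown above (<outdir>/<family>/output_<pipeline_version>_<date>/<family>.<caller>.vcf.gz). The function name find_final_vcfs and the record keys are hypothetical and chosen for illustration.

```python
from pathlib import Path

VCF_SUFFIX = ".vcf.gz"


def find_final_vcfs(outdir):
    """Return one record per final family VCF found under `outdir`.

    Assumes the layout described on this page:
    <outdir>/<family>/output_<pipeline_version>_<date>/<family>.<caller>.vcf.gz
    """
    records = []
    # The pattern only matches names ending in .vcf.gz, so index files
    # (.vcf.gz.tbi) are skipped, and sample folders are never visited.
    for vcf in sorted(Path(outdir).glob(f"*/output_*/*{VCF_SUFFIX}")):
        run_dir = vcf.parent
        family = run_dir.parent.name
        prefix = family + "."
        if not vcf.name.startswith(prefix):
            continue  # not named after the family, so not a final family VCF
        # Everything between "<family>." and ".vcf.gz" is taken as the caller.
        caller = vcf.name[len(prefix):-len(VCF_SUFFIX)]
        records.append(
            {"family": family, "run": run_dir.name, "caller": caller, "vcf": vcf}
        )
    return records
```

Calling find_final_vcfs("<outdir>") on a directory that has been reused for several runs returns one record per family, run folder and caller, so results from different pipeline versions or dates stay distinguishable by their run value.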