Danny Antaki edited this page Dec 1, 2017 · 17 revisions

All output is generated in the current working directory. SV2 generates output files for each stage of genotyping.




Preprocessing Output

Preprocessing output is located in sv2_preprocessing/ in the current working directory. Users can skip preprocessing by passing this output directory to the -pre argument. This is useful for genotyping a different set of SVs in samples that have already been processed.

Example Preprocessing Output

ID CHROM COVERAGE_MEDIAN READ_LENGTH_MEDIAN INSERT_SIZE_MEDIAN INSERT_SIZE_MAD BAM_BP_PARSED SNP_COVERAGE_MEDIAN SNP_PARSED
HG00096 chr21 5.2 250.0 449.0 109.711 6323985 45.0 61259
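The preprocessing table can be loaded for quality checks (for example, comparing median coverage or insert size across samples). The sketch below is a minimal illustration, assuming the file is whitespace-delimited with a single header line exactly as in the example; the function name parse_preprocessing is hypothetical, and because the preprocessing file name is not given on this page, it parses a string.

```python
# Hypothetical helper: parse an SV2 preprocessing table (one header line plus
# data rows) into dicts. Assumes whitespace-delimited columns as shown above.

TEXT_COLS = {"ID", "CHROM"}
INT_COLS = {"BAM_BP_PARSED", "SNP_PARSED"}


def parse_preprocessing(text):
    """Return one dict per data row, converting numeric columns."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        record = {}
        for col, val in zip(header, line.split()):
            if col in TEXT_COLS:
                record[col] = val
            elif col in INT_COLS:
                record[col] = int(val)
            else:
                record[col] = float(val)  # medians and MAD are decimals
        rows.append(record)
    return rows


example = """ID CHROM COVERAGE_MEDIAN READ_LENGTH_MEDIAN INSERT_SIZE_MEDIAN INSERT_SIZE_MAD BAM_BP_PARSED SNP_COVERAGE_MEDIAN SNP_PARSED
HG00096 chr21 5.2 250.0 449.0 109.711 6323985 45.0 61259"""

stats = parse_preprocessing(example)
print(stats[0]["COVERAGE_MEDIAN"], stats[0]["INSERT_SIZE_MEDIAN"])  # 5.2 449.0
```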

Feature Output

Feature output is located in the sv2_features/ directory in the current working directory. File names are formatted as `$SAMPLE-IID_sv2_features.txt`.

The feature files contain the raw feature information used for genotyping. These features include the following:

  • coverage_GCcorrected
  • discordant_ratio
    • ratio of discordant paired-ends to concordant paired-ends
  • split_ratio
    • ratio of split-reads to non-split reads
  • heterozygous_allele_ratio
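The two read-based ratios above are simple quotients of read counts. The sketch below only illustrates that arithmetic with hypothetical counts; it does not reproduce how SV2 selects or counts reads around breakpoints, and returning 0.0 when the denominator is zero is an assumption of this sketch rather than documented SV2 behavior.

```python
# Illustrative arithmetic for discordant_ratio and split_ratio.
# Read counts are hypothetical inputs; zero-denominator handling is assumed.

def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def discordant_ratio(discordant_pairs, concordant_pairs):
    # ratio of discordant paired-ends to concordant paired-ends
    return safe_ratio(discordant_pairs, concordant_pairs)


def split_ratio(split_reads, non_split_reads):
    # ratio of split-reads to non-split reads
    return safe_ratio(split_reads, non_split_reads)


print(discordant_ratio(12, 48))  # 0.25
```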

Feature extraction can be skipped by passing the path to a directory containing SV2 feature output to the -feats option.


Genotyping Output

SV2 writes genotypes in both BED and VCF format to the sv2_genotypes/ directory in the current working directory. The output prefix is set with the [-o|-out] OUTPUT_NAME option; the default prefix is sv2_genotypes.

VCF Output

Median Phred-adjusted ALT genotype likelihood scores are located in the QUAL column; median REF genotype likelihood scores are located in the INFO column prefixed by REF_GTL=.
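Both scores can be read from a VCF data line using the standard VCF column order (QUAL is column 6, INFO is column 8). This is a minimal sketch, not part of SV2; the function names are hypothetical and the record below uses placeholder values invented purely for illustration.

```python
# Minimal sketch: read the ALT score (QUAL) and the REF score (INFO REF_GTL=)
# from one tab-delimited VCF data line. Column positions follow the VCF spec.

def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def likelihood_scores(vcf_line):
    """Return (alt_score, ref_score) as floats, or None where missing."""
    cols = vcf_line.rstrip("\n").split("\t")
    alt_score = None if cols[5] == "." else float(cols[5])  # QUAL, column 6
    ref_gtl = parse_info(cols[7]).get("REF_GTL")             # INFO, column 8
    ref_score = float(ref_gtl) if isinstance(ref_gtl, str) else None
    return alt_score, ref_score


# Hypothetical record with placeholder values, for illustration only.
line = "chr6\t20652811\t.\tN\t<DUP>\t45.2\tPASS\tSVTYPE=DUP;REF_GTL=0.1"
print(likelihood_scores(line))  # (45.2, 0.1)
```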

PASS in the FILTER column indicates that a variant passed SV2's standard filters as well as additional filters commonly used in SV analysis, such as excluding variants with greater than 50% overlap to segmental duplications.

Stringent filters for de novo mutation discovery can be found in the INFO column prefixed with DENOVO_FILTER=.
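A typical downstream step is to keep only PASS records, optionally also requiring the stringent de novo filter. The sketch below is hypothetical: it assumes that DENOVO_FILTER=PASS marks records passing the de novo filters, which this page does not state, so check the ##INFO header of your SV2 VCF for the actual values before relying on it.

```python
# Hypothetical filter over VCF text lines. Assumes DENOVO_FILTER=PASS marks
# records that pass the stringent de novo filters (verify in your VCF header).

def info_value(info, key):
    """Return the value of KEY=value in a VCF INFO string, or None."""
    for item in info.split(";"):
        k, sep, v = item.partition("=")
        if k == key and sep:
            return v
    return None


def passes(vcf_line, require_denovo=False):
    cols = vcf_line.rstrip("\n").split("\t")
    if cols[6] != "PASS":  # FILTER, column 7
        return False
    if require_denovo:
        return info_value(cols[7], "DENOVO_FILTER") == "PASS"
    return True


def filter_vcf(lines, require_denovo=False):
    """Yield header lines unchanged plus the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes(line, require_denovo):
            yield line
```

For example, `list(filter_vcf(open(path), require_denovo=True))` would keep the header and the records passing both checks, where path is your VCF file.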

Variants that cannot be genotyped are reported with the missing genotype ./. (no call).
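To see how many samples could not be genotyped at a site, the GT sub-field of each sample column can be checked for ./. using the standard VCF layout (FORMAT in column 9, samples from column 10). This is an illustrative sketch with hypothetical function names and a made-up three-sample record.

```python
# Sketch: list per-sample genotypes from one VCF data line and count no-calls.
# Uses the VCF convention that FORMAT (column 9) names the sample sub-fields.

def sample_genotypes(vcf_line):
    cols = vcf_line.rstrip("\n").split("\t")
    gt_index = cols[8].split(":").index("GT")
    return [sample.split(":")[gt_index] for sample in cols[9:]]


def count_no_calls(vcf_line):
    """Count samples whose genotype is the missing call ./."""
    return sum(gt == "./." for gt in sample_genotypes(vcf_line))


# Hypothetical three-sample record; only the GT sub-field is shown.
line = "chr1\t100\t.\tN\t<DEL>\t30\tPASS\tSVTYPE=DEL\tGT\t0/1\t./.\t0/0"
print(sample_genotypes(line))  # ['0/1', './.', '0/0']
print(count_no_calls(line))    # 1
```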

Variant Annotations

In the INFO column, SV2 reports the percent overlap with genomic elements commonly used as filters in disease studies, such as:

  • Assembly gaps
  • Centromeres
  • RepeatMasker elements
  • Segmental Duplications (Low Copy Repeat)
  • Short Tandem Repeats (STR)
  • Unmappable Regions (hg19 DAC Blacklist)
  • Common SV (hg19/hg38: 1000 Genomes Phase 3)

GENES= lists all overlapping gene elements, delimited by pipes (|). Each element is formatted as TRANSCRIPT_ID,GENE_NAME,ELEMENT.

  • Gene Elements (RefSeq)
    • coding exons (# overlapping / # total coding exons)
    • introns (# overlapping / # total introns)
    • UTR
    • Promoter (1 kb upstream of the TSS, reported as "upstream_1kb")
    • 1 kb downstream of the TSS (reported as "downstream_1kb")

For example, a duplication at chr6:20652811-20771549 (hg19) is annotated as:

GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;
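The GENES= value can be split into per-transcript records. The sketch below is a minimal parser, assuming counted elements always look like exon_2/16 (2 overlapping out of 16 total); elements without counts, such as upstream_1kb, are kept with None counts. The exact element strings SV2 emits for UTRs are not listed on this page, so treat the regex as an assumption.

```python
import re

# exon_2/16 -> kind "exon", 2 overlapping of 16 total. Elements without a
# count (for example upstream_1kb) are returned with None counts.
COUNTED = re.compile(r"^(?P<kind>[A-Za-z0-9]+)_(?P<n>\d+)/(?P<total>\d+)$")


def parse_genes(value):
    """Parse a GENES= value into (transcript, gene, element, n, total) tuples."""
    if value.startswith("GENES="):
        value = value[len("GENES="):]
    records = []
    for entry in value.rstrip(";").split("|"):
        if not entry:
            continue
        transcript, gene, element = entry.split(",", 2)
        match = COUNTED.match(element)
        if match:
            records.append((transcript, gene, match.group("kind"),
                            int(match.group("n")), int(match.group("total"))))
        else:
            records.append((transcript, gene, element, None, None))
    return records


for record in parse_genes("GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;"):
    print(record)
# ('NM_017774', 'CDKAL1', 'exon', 2, 16)
# ('NM_017774', 'CDKAL1', 'intron', 3, 15)
```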

Users can skip annotation with the -no-anno option.

BED Output

BED output does not contain the filtering information or annotations found in the VCF output. The BED and VCF files share the same output name but have different extensions; the BED extension is .txt.
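Scripts that consume SV2 results may need to locate both genotype files from the chosen -o/-out prefix. This is a sketch only: the function name is hypothetical, and the .vcf extension is an assumption, since this page states only that both files share a name, live in sv2_genotypes/, and that the BED file uses .txt.

```python
from pathlib import Path

# Sketch: expected genotype output paths for a given -o/-out prefix.
# The ".vcf" extension is assumed; the ".txt" BED extension is documented.

def genotype_output_paths(prefix="sv2_genotypes", workdir="."):
    out_dir = Path(workdir) / "sv2_genotypes"
    return {
        "vcf": out_dir / f"{prefix}.vcf",
        "bed": out_dir / f"{prefix}.txt",
    }


paths = genotype_output_paths("my_cohort")
print(paths["bed"].as_posix())  # sv2_genotypes/my_cohort.txt
```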

