Output

All output is generated in the current working directory. SV² generates output files for each stage of genotyping.

Preprocessing Output

Preprocessing output is located in sv2_preprocessing/ in the current working directory. Users can skip preprocessing by passing this output directory to the -pre argument. This is useful for users who wish to genotype a different set of SVs in previously processed samples.

Example Preprocessing Output

ID	CHROM	COVERAGE_MEDIAN	READ_LENGTH_MEDIAN	INSERT_SIZE_MEDIAN	INSERT_SIZE_MAD	BAM_BP_PARSED	SNP_COVERAGE_MEDIAN	SNP_PARSED
HG00096	chr21	5.2	250.0	449.0	109.711	6323985	45.0	61259
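
As a rough illustration, the example row above can be loaded with Python's standard csv module. This is a minimal sketch that assumes the preprocessing file is tab-delimited with a single header row, as in the example; the function name read_preprocessing is hypothetical and not part of SV².

```python
import csv
import io

# Text copied from the example above. In practice, open a file from
# sv2_preprocessing/ instead of using an in-memory string.
EXAMPLE = (
    "ID\tCHROM\tCOVERAGE_MEDIAN\tREAD_LENGTH_MEDIAN\tINSERT_SIZE_MEDIAN\t"
    "INSERT_SIZE_MAD\tBAM_BP_PARSED\tSNP_COVERAGE_MEDIAN\tSNP_PARSED\n"
    "HG00096\tchr21\t5.2\t250.0\t449.0\t109.711\t6323985\t45.0\t61259\n"
)

# Columns kept as text; every other column is converted to float.
TEXT_COLUMNS = ("ID", "CHROM")


def read_preprocessing(handle):
    """Return one dict per row (hypothetical helper, assumes tab-delimited input)."""
    rows = []
    for raw in csv.DictReader(handle, delimiter="\t"):
        row = {}
        for key, value in raw.items():
            row[key] = value if key in TEXT_COLUMNS else float(value)
        rows.append(row)
    return rows


rows = read_preprocessing(io.StringIO(EXAMPLE))
print(rows[0]["ID"], rows[0]["CHROM"], rows[0]["COVERAGE_MEDIAN"])
```

Reading the statistics this way makes it easy to check, for example, that coverage and insert-size estimates look sensible before reusing a preprocessing directory with -pre.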

Feature Output

Feature output is located in the sv2_features/ directory in the current working directory. File names are formatted as $SAMPLE-IID_sv2_features.txt.

The feature files contain the raw feature information used for genotyping. These features include the following:

coverage_GCcorrected
discordant_ratio
- ratio of discordant paired-ends to concordant paired-ends
split_ratio
- ratio of split-reads to non-split reads
heterozygous_allele_ratio

Feature extraction can be skipped by passing the path to a directory containing SV² feature output to the -feats option.

Genotyping Output

SV² outputs genotypes in BED format and VCF format in the sv2_genotypes/ directory in the current working directory. The output prefix is set with the [-o|-out] OUTPUT_NAME option; the default output name is sv2_genotypes.

VCF Output

Median Phred-adjusted ALT genotype likelihood scores are located in the QUAL column; median REF genotype likelihood scores are located in the INFO column prefixed by REF_GTL=.

PASS in the FILTER column indicates that a variant passed the SV² standard filters along with additional filters typically used in SV analysis, such as a filter on greater than 50% overlap to segmental duplications.

Stringent filters for de novo mutation discovery can be found in the INFO column prefixed with DENOVO_FILTER=.

Variants that cannot be genotyped are reported as ./..
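
The fields described above can be pulled out of a VCF record with a few lines of Python. This is a minimal sketch, not SV² code: it relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...) and the REF_GTL= and DENOVO_FILTER= prefixes named above. The record itself, including every value in it, is invented for illustration, and parse_info is a hypothetical helper.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; keys without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Hypothetical record: all values below are made up for this sketch.
line = (
    "chr21\t1000\t.\t.\t<DEL>\t25.0\tPASS\t"
    "END=2000;REF_GTL=3.0;DENOVO_FILTER=PASS\tGT\t0/1"
)

cols = line.split("\t")
qual = float(cols[5])        # median ALT genotype likelihood score (QUAL)
filter_status = cols[6]      # standard filter result (FILTER)
info = parse_info(cols[7])   # INFO key/value pairs

print(qual, filter_status, info["REF_GTL"], info["DENOVO_FILTER"])
```

For real files, a dedicated VCF library is usually a better choice than splitting lines by hand; the sketch only shows where each value lives.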

Variant Annotations

In the INFO column, SV² reports the percent overlap to genomic elements commonly used as filters for disease studies, such as:

Assembly gaps
Centromeres
RepeatMasker elements
Segmental Duplications (Low Copy Repeat)
Short Tandem Repeats (STR)
Unmappable Regions (hg19 DAC Blacklist)
Common SV (hg19/hg38: 1000 Genomes Phase 3)

GENES= lists all overlapping gene elements, delimited by pipes (|). Each element is formatted as TRANSCRIPT_ID,GENE NAME,ELEMENT.

Gene Elements (RefSeq)
- coding exons (# overlapping / # total coding exons)
- introns (# overlapping / # total introns)
- UTR
- Promoter (+1kb from TSS "upstream_1kb")
- -1kb TSS ("downstream_1kb")

For example, the duplication chr6:20652811-20771549 (hg19) is annotated as:

GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;

Alternatively, users can skip annotation with the -no-anno option.
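
The GENES= example above can be split into its parts as follows. This is a minimal sketch based only on the format described in this section (pipe-delimited elements, each with three comma-separated fields); parse_genes is a hypothetical helper and not part of SV².

```python
def parse_genes(value):
    """Return (transcript_id, gene_name, element) tuples from a GENES= entry."""
    value = value.rstrip(";")
    if value.startswith("GENES="):
        value = value[len("GENES="):]
    records = []
    for element in value.split("|"):
        # Each element is TRANSCRIPT_ID,GENE NAME,ELEMENT
        transcript_id, gene_name, feature = element.split(",", 2)
        records.append((transcript_id, gene_name, feature))
    return records


genes = parse_genes(
    "GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;"
)
for transcript_id, gene_name, feature in genes:
    print(transcript_id, gene_name, feature)
```

Here the duplication overlaps 2 of 16 coding exons and 3 of 15 introns of CDKAL1 transcript NM_017774.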

BED Output

BED output does not contain filtering information or annotations that are found in the VCF output. Both BED and VCF output are given the same file name, but with different extensions. The BED extension is .txt.
