Output
All output is generated in the current working directory. SV2 generates output files for each stage of genotyping.
Preprocessing output is located in `sv2_preprocessing/` in the current working directory. Users can skip preprocessing by passing this output directory to the `-pre` argument. This is useful for genotyping a different set of SVs in previously processed samples.
ID | CHROM | COVERAGE_MEDIAN | READ_LENGTH_MEDIAN | INSERT_SIZE_MEDIAN | INSERT_SIZE_MAD | BAM_BP_PARSED | SNP_COVERAGE_MEDIAN | SNP_PARSED |
---|---|---|---|---|---|---|---|---|
HG00096 | chr21 | 5.2 | 250.0 | 449.0 | 109.711 | 6323985 | 45.0 | 61259 |
Feature output is located in the `sv2_features/` directory in the current working directory. File names are formatted as `$SAMPLE-IID_sv2_features.txt`.
The feature files contain the raw feature information used for genotyping. These features include the following:
- coverage_GCcorrected
- discordant_ratio
  - ratio of discordant paired-ends to concordant paired-ends
- split_ratio
  - ratio of split-reads to non-split reads
- heterozygous_allele_ratio
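As a worked illustration of the two ratio definitions above, here is a minimal Python sketch. The read counts are made up, the helper name `ratio` is hypothetical, and returning NaN for a zero denominator is an assumption for this example, not necessarily how SV2 handles that case.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    if denominator == 0:
        # Assumption for this sketch only; SV2's own handling is not documented here.
        return float("nan")
    return numerator / denominator


# Hypothetical read counts for a single SV (not real data).
discordant, concordant = 6, 24
split, non_split = 3, 27

# discordant paired-ends relative to concordant paired-ends
discordant_ratio = ratio(discordant, concordant)
# split-reads relative to non-split reads
split_ratio = ratio(split, non_split)

print(discordant_ratio, split_ratio)
```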
Feature extraction can be skipped by passing the path to a directory containing SV2 feature output to the `-feats` option.
SV2 outputs genotypes in BED and VCF format in the `sv2_genotypes/` directory in the current working directory. The output prefix is defined with the `[-o|-out] OUTPUT_NAME` option; the default output name is `sv2_genotypes`.
Median Phred-adjusted ALT genotype likelihood scores are located in the `QUAL` column; median REF genotype likelihood scores are located in the `INFO` column, prefixed by `REF_GTL=`.
`PASS` in the `FILTER` column indicates that a variant passed SV2's standard filters as well as additional filters typically used in SV analysis, such as greater than 50% overlap with segmental duplications.
Stringent filters for de novo mutation discovery can be found in the `INFO` column, prefixed with `DENOVO_FILTER=`.
Variants that cannot be genotyped are reported as `./.`.
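The fields described above can be read from a VCF data line with a few lines of Python. This is a minimal sketch that assumes the standard tab-delimited VCF column order and a `FORMAT` column that starts with `GT`. The function name `parse_sv2_record` and the example record are hypothetical: the values, including `DENOVO_FILTER=FAIL`, are invented for illustration and are not real SV2 output.

```python
def parse_sv2_record(line: str) -> dict:
    """Pull the SV2-specific fields out of one tab-delimited VCF data line."""
    fields = line.rstrip("\n").split("\t")
    # Standard VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE...
    qual, filt, info = fields[5], fields[6], fields[7]

    # INFO is a semicolon-delimited list of KEY=VALUE entries (or bare flags).
    info_map = {}
    for entry in info.split(";"):
        if entry:
            key, _, value = entry.partition("=")
            info_map[key] = value

    # Assumes GT is the first colon-delimited subfield of each sample column.
    genotypes = [sample.split(":")[0] for sample in fields[9:]]

    return {
        "alt_gl": None if qual == "." else float(qual),  # median ALT likelihood (QUAL)
        "ref_gl": info_map.get("REF_GTL"),               # median REF likelihood (INFO)
        "filter": filt,
        "denovo_filter": info_map.get("DENOVO_FILTER"),
        "genotypes": genotypes,
    }


# Invented example record with two samples; the second could not be genotyped.
example_line = (
    "chr6\t20652811\t.\t.\t<DUP>\t18\tPASS\t"
    "END=20771549;SVTYPE=DUP;REF_GTL=2;DENOVO_FILTER=FAIL\tGT\t0/1\t./."
)
record = parse_sv2_record(example_line)
ungenotyped = [gt for gt in record["genotypes"] if gt == "./."]
print(record["filter"], record["alt_gl"], len(ungenotyped))
```

For production use, an established VCF parser is likely a better choice than splitting lines by hand.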
In the `INFO` column, SV2 reports percent overlap with genomic elements commonly used as filters in disease studies, such as:
- Assembly gaps
- Centromeres
- RepeatMasker elements
- Segmental Duplications (Low Copy Repeat)
- Short Tandem Repeats (STR)
- Unmappable Regions (hg19 DAC Blacklist)
- Common SV (hg19/hg38: 1000 Genomes Phase 3)
`GENES=` lists all overlapping gene elements, delimited by pipes (`|`). Each element is formatted as `TRANSCRIPT_ID,GENE NAME,ELEMENT`.
- Gene Elements (RefSeq)
  - coding exons (# overlapping / # total coding exons)
  - introns (# overlapping / # total introns)
  - UTR
  - Promoter (+1kb from TSS "upstream_1kb")
  - -1kb TSS ("downstream_1kb")
For example, `chr6:20652811-20771549 DUP hg19`:
`GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;`
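The `GENES=` value can be split into its elements as follows. This is a minimal Python sketch: the names `GeneElement` and `parse_genes` are hypothetical, and it assumes that gene names contain no commas or pipes.

```python
from typing import List, NamedTuple


class GeneElement(NamedTuple):
    transcript_id: str
    gene_name: str
    element: str


def parse_genes(value: str) -> List[GeneElement]:
    """Split a GENES= annotation into (transcript, gene, element) records."""
    value = value.strip().rstrip(";")
    if value.startswith("GENES="):
        value = value[len("GENES="):]
    elements = []
    for item in value.split("|"):  # elements are pipe-delimited
        if not item:
            continue
        # Each element is TRANSCRIPT_ID,GENE NAME,ELEMENT
        transcript_id, gene_name, element = item.split(",", 2)
        elements.append(GeneElement(transcript_id, gene_name, element))
    return elements


genes = parse_genes(
    "GENES=NM_017774,CDKAL1,exon_2/16|NM_017774,CDKAL1,intron_3/15;"
)
for g in genes:
    print(g.transcript_id, g.gene_name, g.element)
```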
Alternatively, users can skip annotation with the `-no-anno` option.
BED output does not contain the filtering information or annotations found in the VCF output. Both BED and VCF output are given the same file name, but with different extensions. The BED extension is `.txt`.
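To locate the genotype files from a script, the naming rules above can be sketched in Python. The helper `genotype_output_paths` is hypothetical, and the `.vcf` extension for the VCF file is an assumption; only the `.txt` BED extension is stated here.

```python
from pathlib import Path


def genotype_output_paths(output_name: str = "sv2_genotypes", workdir: str = "."):
    """Return the expected (BED, VCF) paths for a given output name."""
    out_dir = Path(workdir) / "sv2_genotypes"
    bed_path = out_dir / f"{output_name}.txt"  # BED output uses the .txt extension
    vcf_path = out_dir / f"{output_name}.vcf"  # assumed extension for VCF output
    return bed_path, vcf_path


# "trio1" is a made-up output name, as would be passed to [-o|-out].
bed, vcf = genotype_output_paths("trio1")
print(bed.as_posix(), vcf.as_posix())
```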