total percentage overlap po_P* #269

NinaGerrekens · 2025-02-03T19:57:37Z

Hi,
I have been using the AnnotSV tool for annotating WGS CNV calls, and it has provided a lot of valuable information. However, I have encountered some issues related to the po_P* output.

Currently, po_P* provides a detailed list of individual percentages representing overlap from different variant calls across the three reference databases. However, there is no final consensus percentage or a largest percentage overlap for a given phenotype.
Since AnnotSV already identifies the phenotype, would it be possible to output either:

A final consensus percentage for that phenotype
The largest percentage overlap associated with that phenotype

This would help in quickly assessing the relevance of the phenotype without manually parsing through all the individual percentages.
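
As a stopgap until such a column exists, the largest value can be pulled out of a po_P_*_percent cell with a few lines of Python. This is only a minimal sketch: it assumes the percentages in a cell are separated by ";" (as the entries in the po_P_*_source columns are) and that each entry is a plain number. The function name `summarize_percents` is hypothetical and not part of AnnotSV.

```python
def summarize_percents(cell, sep=";"):
    """Summarize one po_P_*_percent cell.

    Assumption: the cell is a `sep`-separated list of numbers.
    Returns None when the cell holds no numeric value.
    """
    values = []
    for token in cell.split(sep):
        token = token.strip().rstrip("%")
        if not token:
            continue
        try:
            values.append(float(token))
        except ValueError:
            # Skip anything that is not a number instead of guessing.
            continue
    if not values:
        return None
    return {"max": max(values), "min": min(values), "n": len(values)}
```

Applied to every row of the output file, the "max" entry gives one number per CNV. It does not say which phenotype that number belongs to.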

In some cases, a single CNV entry is associated with 2 or 3 phenotypes, yet the output still includes dozens of detailed individual percentages. This makes it difficult to determine which individual percentages correspond to each phenotype.
Would it be possible to introduce an additional column that provides either:

A consensus percentage or largest overlap for each of the listed phenotypes
A way to group the individual percentages according to the phenotype they belong to
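
To make the second option concrete, here is a rough sketch of the grouping being asked for. It rests on two assumptions that the output described here does not confirm: that the source and percent cells are parallel ";"-separated lists (entry i of one matches entry i of the other), and that a source-to-phenotype mapping is available from elsewhere. Both function names and the mapping are hypothetical.

```python
def pair_sources_with_percents(source_cell, percent_cell, sep=";"):
    """Pair each source with the percentage at the same position.

    Assumption: both cells list their entries in the same order.
    """
    sources = [s.strip() for s in source_cell.split(sep) if s.strip()]
    percents = [float(p.strip()) for p in percent_cell.split(sep) if p.strip()]
    if len(sources) != len(percents):
        raise ValueError("source and percent lists differ in length")
    return list(zip(sources, percents))


def max_percent_per_phenotype(pairs, source_to_phenotype):
    """Largest percentage per phenotype.

    `source_to_phenotype` is a user-supplied dict; sources missing
    from it are collected under "unknown".
    """
    best = {}
    for source, pct in pairs:
        phenotype = source_to_phenotype.get(source, "unknown")
        best[phenotype] = max(pct, best.get(phenotype, pct))
    return best
```

With placeholder identifiers, `max_percent_per_phenotype(pair_sources_with_percents("src_A;src_B;src_C", "10;55.5;30"), {"src_A": "p1", "src_B": "p1", "src_C": "p2"})` reports 55.5 for p1 and 30.0 for p2.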

lgmgeo · 2025-02-04T13:46:35Z

Hi Nina,

To better understand your need, can you give me a detailed example with:

your command line
your SV input file
the annotations currently given by AnnotSV
the annotations you wish to have

Best,
Véronique

NinaGerrekens · 2025-02-07T15:50:27Z

Dear Véronique,

Here is the command line I used:

$ANNOTSV/bin/AnnotSV -SVinputFile 'inputfile.bed' -outputFile 'outputfile.txt' -genomeBuild 'GRCh38' -svtBEDcol 4 \
 -samplesidBEDcol 5 -snvIndelFiles inputsnv.vcf >& AnnotSV.log &

The input file was in BED format and contained the following columns:
#chrom chromStart chromEnd SVTYPE Samples_ID

In the AnnotSV output, I noticed that some CNVs partially overlap with multiple phenotypic annotations for pathogenic gain/loss genomic regions. For example:

In the column po_P_gain_phen:
chr15:24530001-24550000 overlaps with:

15q11.2q13 recurrent (PWS/AS) region (Class 1, BP1-BP3)
15q11.2q13 recurrent (PWS/AS) region (Class 2, BP2-BP3)

chr22:18940001-21440000 overlaps with:

22q11.2 recurrent (DGS/VCFS) region (proximal, A-B) (includes TBX1)
22q11.2 recurrent (DGS/VCFS) region (proximal, A-D) (includes TBX1)

In the column po_P_loss_phen:
chr3:97740001-97790000 overlaps with:

Bardet-Biedl syndrome 3, 600151 (3) AR
Retinitis pigmentosa 55, 613575 (3) AR
Bardet-Biedl syndrome 1, modifier of, 209900 (3) Digenic recessive, AR

For these CNVs, the po_P_gain_source and po_P_loss_source columns indicate multiple origins for the pathogenic gain/loss genomic loci. For example:

chr15:24530001-24550000:

dbVar:nssv15172447;dbVar:nssv15149081;dbVar:nssv15148902;dbVar:nssv15125410;...

chr22:18940001-21440000:

dbVar:nssv15161991;dbVar:nssv16297047;dbVar:nssv15139753;dbVar:nssv15161992;...

chr3:97740001-97790000:

dbVar:nssv15147282;dbVar:nssv15146259;dbVar:nssv18842017;dbVar:nssv16254235;...

Additionally, in the po_P_gain_percent and po_P_loss_percent columns, individual overlap percentages are provided for each source but not an overall percentage of overlap.

My questions are:

Is it possible to display a single overall percentage of overlap instead of individual percentages?
When a CNV overlaps multiple phenotypic regions, how can I determine which sources in po_P_gain_source and po_P_loss_source, and which percentages, correspond to each phenotype?
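
For the first question, one possible reading of "overall percentage" is the share of the CNV covered by the union of all overlapping pathogenic regions. The sketch below shows that calculation under stated assumptions: the region coordinates are not in the columns discussed here and would have to be looked up separately, intervals are treated as half-open [start, end), and `union_coverage_percent` is a hypothetical helper, not an AnnotSV function. Whether AnnotSV's own percentages are measured relative to the CNV or to the region is not established in this thread.

```python
def union_coverage_percent(sv_start, sv_end, regions):
    """Percent of [sv_start, sv_end) covered by the union of `regions`.

    `regions` is an iterable of (start, end) pairs on the same chromosome.
    """
    sv_len = sv_end - sv_start
    if sv_len <= 0:
        raise ValueError("sv_end must be greater than sv_start")

    # Clip each region to the SV and drop those that do not overlap it.
    clipped = sorted(
        (max(start, sv_start), min(end, sv_end))
        for start, end in regions
        if min(end, sv_end) > max(start, sv_start)
    )

    covered = 0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start

    return 100.0 * covered / sv_len
```

Merging before summing matters because overlapping regions would otherwise be counted twice, so adding up the individual percentages can exceed 100.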

NinaGerrekens · 2025-02-11T20:52:19Z

If it is helpful, I can also send the input file and output file of one sample via e-mail.

lgmgeo · 2025-02-13T07:04:41Z

No, it's OK. Thanks for your information.
I'll get back to you ASAP (next week, I hope).
