total percentage overlap po_P* #269

Open
NinaGerrekens opened this issue Feb 3, 2025 · 4 comments

@NinaGerrekens

Hi,
I have been using the AnnotSV tool for annotating WGS CNV calls, and it has provided a lot of valuable information. However, I have encountered some issues related to the po_P* output.

  1. Currently, po_P* provides a detailed list of individual percentages representing overlap from different variant calls across the three reference databases. However, there is no final consensus percentage or a largest percentage overlap for a given phenotype.
    Since AnnotSV already identifies the phenotype, would it be possible to output either:
  • A final consensus percentage for that phenotype
  • The largest percentage overlap associated with that phenotype
    This would help in quickly assessing the relevance of the phenotype without manually parsing through all the individual percentages.
  2. In some cases, a single CNV entry is associated with 2 or 3 phenotypes, yet the output still includes dozens of detailed individual percentages. This makes it difficult to determine which individual percentages correspond to each phenotype.
    Would it be possible to introduce an additional column that provides either:
  • A consensus percentage or largest overlap for each of the listed phenotypes
  • A way to group the individual percentages according to the phenotype they belong to
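
For the "largest percentage overlap" option, a minimal post-processing sketch in Python could serve as a stopgap. It assumes, without confirmation from this thread, that a po_P_gain_percent or po_P_loss_percent cell is a semicolon-separated list of numbers. `largest_overlap` is a hypothetical helper name, not part of AnnotSV.

```python
def largest_overlap(percent_field, sep=";"):
    """Return the largest overlap percentage in one po_P_*_percent cell.

    ASSUMPTION: the cell is a `sep`-delimited list of numbers
    (for example "12.5;100;48"). Returns None for an empty cell.
    """
    values = []
    for token in percent_field.split(sep):
        token = token.strip().rstrip("%")
        if not token:
            continue
        try:
            values.append(float(token))
        except ValueError:
            # Skip anything that is not a number (for example a truncated "...").
            continue
    return max(values) if values else None
```

This only summarises one cell. It cannot tell which phenotype a given percentage belongs to, so it addresses the first point but not the second.
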
@lgmgeo
Owner

lgmgeo commented Feb 4, 2025

Hi Nina,

To better understand your need, can you give me a detailed example with:

  • your command line
  • your SV input file
  • the annotations currently given by AnnotSV
  • the annotations you wish to have

Best,
Véronique

@NinaGerrekens
Author

NinaGerrekens commented Feb 7, 2025

Dear Véronique,

Here is the command line I used:

$ANNOTSV/bin/AnnotSV -SVinputFile 'inputfile.bed' -outputFile 'outputfile.txt' -genomeBuild 'GRCh38' -svtBEDcol 4 \
    -samplesidBEDcol 5 -snvIndelFiles inputsnv.vcf >& AnnotSV.log &

The input file was in BED format and contained the following columns:
#chrom chromStart chromEnd SVTYPE Samples_ID

In the AnnotSV output, I noticed that some CNVs partially overlap with multiple phenotypic annotations for pathogenic gain/loss genomic regions. For example:

In the column po_P_gain_phen:
chr15:24530001-24550000 overlaps with:

  • 15q11.2q13 recurrent (PWS/AS) region (Class 1, BP1-BP3)

  • 15q11.2q13 recurrent (PWS/AS) region (Class 2, BP2-BP3)

chr22:18940001-21440000 overlaps with:

  • 22q11.2 recurrent (DGS/VCFS) region (proximal, A-B) (includes TBX1)

  • 22q11.2 recurrent (DGS/VCFS) region (proximal, A-D) (includes TBX1)

In the column po_P_loss_phen:
chr3:97740001-97790000 overlaps with:

  • Bardet-Biedl syndrome 3, 600151 (3) AR

  • Retinitis pigmentosa 55, 613575 (3) AR

  • Bardet-Biedl syndrome 1, modifier of, 209900 (3) Digenic recessive, AR

For these CNVs, the po_P_gain_source and po_P_loss_source columns indicate multiple origins for the pathogenic gain/loss genomic loci. For example:

chr15:24530001-24550000:

  • dbVar:nssv15172447;dbVar:nssv15149081;dbVar:nssv15148902;dbVar:nssv15125410;...

chr22:18940001-21440000:

  • dbVar:nssv15161991;dbVar:nssv16297047;dbVar:nssv15139753;dbVar:nssv15161992;...

chr3:97740001-97790000:

  • dbVar:nssv15147282;dbVar:nssv15146259;dbVar:nssv18842017;dbVar:nssv16254235;...

Additionally, in the po_P_gain_percent and po_P_loss_percent columns, individual overlap percentages are provided for each source but not an overall percentage of overlap.

My questions are:

  1. Is it possible to display a single overall percentage of overlap instead of individual percentages?

  2. How can I determine which sources in po_P_gain_source and po_P_loss_source and which percentages correspond to specific phenotypes when a CNV overlaps multiple phenotypic regions?
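
For question 2, one possible workaround is to pair each source with its percentage by position. This is only a sketch: it assumes, which nothing in this thread confirms, that the i-th value in po_P_gain_percent (or po_P_loss_percent) belongs to the i-th entry in po_P_gain_source (or po_P_loss_source), and that both cells are semicolon-separated. `pair_sources_with_percents` is a hypothetical helper name, and `row` is one output line read as a dict keyed by column name (for example via `csv.DictReader` with a tab delimiter).

```python
def pair_sources_with_percents(row, kind="gain", sep=";"):
    """Pair each po_P_<kind>_source entry with a po_P_<kind>_percent value.

    ASSUMPTION: both cells list their items in the same order, so the
    i-th percentage describes the i-th source. Raises ValueError if the
    two lists differ in length, since the pairing would be unreliable.
    """
    sources = [s.strip() for s in row.get(f"po_P_{kind}_source", "").split(sep) if s.strip()]
    percents = [p.strip() for p in row.get(f"po_P_{kind}_percent", "").split(sep) if p.strip()]
    if len(sources) != len(percents):
        raise ValueError(
            f"{len(sources)} sources but {len(percents)} percentages; "
            "positional pairing is not safe for this row"
        )
    return [(src, float(pct)) for src, pct in zip(sources, percents)]
```

The result is a list of (source, percentage) tuples for one CNV. It still does not say which phenotype each source belongs to; that link would have to come from the source records themselves or from AnnotSV.
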

@NinaGerrekens
Author

If it is helpful, I can also send the input file and output file of one sample via e-mail.

@lgmgeo
Owner

lgmgeo commented Feb 13, 2025

No, it's OK. Thanks for your information.
I'll get back to you asap (next week I hope)
