Disease prioritization results benchmarking from different disease ontologies #349

AO33 · 2024-08-09T23:56:58Z

Attached in the screenshot are the top hits for a single sample run through exomiser's phenotype-only mode, followed by pheval runner post-processing.

The top two hits (ORPHA:25 and OMIM:231670) are actually the same disease: https://monarchinitiative.org/MONDO:0009281.
The diagnosis reported in the phenopacket for this patient is OMIM:231670, which is ranked second in the results file.
I ran the "pheval-utils benchmark-comparison" command for this single sample, and the result was in the top 3 rather than the top 1.
Pheval's current implementation requires the analysis-specific tool to output results in the same ontology namespace as the one used in the input phenopacket; otherwise no match is found during benchmarking. In this instance, that causes the rank to be reported incorrectly.
My current thought on how to remedy this is to use the mondo.sssom.tsv file (https://data.monarchinitiative.org/mappings/latest/mondo.sssom.tsv) to map every disease ID to a Mondo ID during benchmarking. This is also how I have been handling the issue in my own analysis.
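A minimal sketch of the mapping step, assuming the usual SSSOM TSV layout (`#`-prefixed metadata lines, then a header row with `subject_id`, `predicate_id` and `object_id` columns) and that Mondo is the subject of each `skos:exactMatch` row. `load_mondo_map`, `to_mondo` and `PREFIX_ALIASES` are hypothetical names, not part of pheval. The alias table is there because the tool output above uses `ORPHA:` while the mapping file may spell the Orphanet prefix differently; that should be checked against the real file. The sample rows are an illustrative excerpt built from the IDs in this issue, not lines copied from mondo.sssom.tsv.

```python
import csv

# Hypothetical alias table (assumption): a tool may write "ORPHA:25" while the
# mapping file may use a different prefix such as "Orphanet:25".
PREFIX_ALIASES = {"ORPHA": "Orphanet"}


def load_mondo_map(sssom_text: str) -> dict[str, str]:
    """Map source disease IDs (e.g. OMIM:231670) to their exactly matching Mondo ID."""
    # Skip the "#"-prefixed metadata block and any blank lines.
    lines = [line for line in sssom_text.splitlines() if line and not line.startswith("#")]
    mapping: dict[str, str] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        # Only exact matches count as "the same disease"; broad/narrow mappings are ignored.
        if row["predicate_id"] == "skos:exactMatch" and row["subject_id"].startswith("MONDO:"):
            mapping[row["object_id"]] = row["subject_id"]
    return mapping


def to_mondo(disease_id: str, mapping: dict[str, str]) -> str:
    """Return the Mondo ID for disease_id, or disease_id unchanged if it is unmapped."""
    if disease_id.startswith("MONDO:"):
        return disease_id
    prefix, _, local_id = disease_id.partition(":")
    candidates = [disease_id]
    if prefix in PREFIX_ALIASES:
        candidates.append(f"{PREFIX_ALIASES[prefix]}:{local_id}")
    for candidate in candidates:
        if candidate in mapping:
            return mapping[candidate]
    return disease_id


# Illustrative excerpt (not copied from mondo.sssom.tsv).
SAMPLE = (
    "# (metadata block)\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "MONDO:0009281\tskos:exactMatch\tOMIM:231670\n"
    "MONDO:0009281\tskos:exactMatch\tOrphanet:25\n"
)
mondo_map = load_mondo_map(SAMPLE)
print(to_mondo("OMIM:231670", mondo_map), to_mondo("ORPHA:25", mondo_map))
# MONDO:0009281 MONDO:0009281
```

With the real file, something like `load_mondo_map(pathlib.Path("mondo.sssom.tsv").read_text())` would build the full map once, so it can be reused across all samples in a benchmark run.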
Then, in https://github.com/monarch-initiative/pheval/blob/main/src/pheval/analyse/disease_prioritisation_analysis.py:
- Load the SSSOM file into a map of the form {"disease_id": "MONDO:123"} and store it as an attribute of the AssessDiseasePrioritisation class.
- The assess_disease_prioritisation method could then use the map to convert both IDs it pulls (the result ID and the diagnosis ID) to Mondo and check for a match that way.
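A sketch of the comparison described in the second bullet, written as a standalone function rather than as pheval's actual `AssessDiseasePrioritisation` code (whose internals are not reproduced here). `rank_of_diagnosis` is a hypothetical name, and the map is assumed to already contain keys in the prefixes the tool writes. It returns the 1-based position of the first result that normalises to the same Mondo ID as the diagnosis.

```python
from typing import Optional


def rank_of_diagnosis(
    ranked_ids: list[str], diagnosis_id: str, mondo_map: dict[str, str]
) -> Optional[int]:
    """1-based rank of the first result matching the diagnosis after Mondo normalisation."""

    def normalise(disease_id: str) -> str:
        # Unmapped IDs fall back to themselves, so plain string matches still work.
        return mondo_map.get(disease_id, disease_id)

    target = normalise(diagnosis_id)
    for rank, disease_id in enumerate(ranked_ids, start=1):
        if normalise(disease_id) == target:
            return rank
    return None  # the diagnosis does not appear among the results


results = ["ORPHA:25", "OMIM:231670"]  # the top two hits from the screenshot
mondo_map = {"ORPHA:25": "MONDO:0009281", "OMIM:231670": "MONDO:0009281"}
print(rank_of_diagnosis(results, "OMIM:231670", {}))         # 2 (string match only)
print(rank_of_diagnosis(results, "OMIM:231670", mondo_map))  # 1 (match via Mondo)
```

This only fixes the match itself; it does not collapse duplicate entries for other diseases ranked above the diagnosis, so those can still push the rank down.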

matentzn · 2024-08-10T06:29:35Z

Awesome, thank you @AO33 for digging into this and writing it up. This is very important! @yaseminbridges, this ticket is perhaps a bit involved (I think the solution may require more than just injecting the mapping into the analysis code; it may mean rewiring the diagnosis in the phenopacket itself), but it is of critical importance. In the same way we normalise genes, we should normalise diagnoses to Mondo. Of course, I imagine this will elicit some pushback from Jules, so we need to debate it (don't start this without talking to him), but I feel that, from a Monarch-wide standpoint, this is the right move!

yaseminbridges · 2024-08-10T15:22:57Z

Agreed - this is important. I think the best place for this is outside the analysis code, perhaps as a utility function: just as we have one for converting gene identifiers, we should have the same for disease identifiers.
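A rough sketch of such a utility, assuming the Phenopacket v2 JSON layout in which diseases appear under `diseases[].term` and `interpretations[].diagnosis.disease`. `normalise_phenopacket_diseases` is a hypothetical name, not an existing pheval function, and `mondo_map` is assumed to be a source-ID to Mondo-ID dictionary built from mondo.sssom.tsv. Only `id` is rewritten; any existing label is left untouched.

```python
import copy


def normalise_phenopacket_diseases(phenopacket: dict, mondo_map: dict[str, str]) -> dict:
    """Return a copy of a parsed phenopacket with disease IDs mapped to Mondo where possible."""
    normalised = copy.deepcopy(phenopacket)  # never mutate the caller's phenopacket
    terms = [d.get("term", {}) for d in normalised.get("diseases", [])]
    terms += [
        i.get("diagnosis", {}).get("disease", {})
        for i in normalised.get("interpretations", [])
    ]
    for term in terms:
        # Edits the nested dicts of the copy; IDs without a Mondo mapping stay as they are.
        if term.get("id") in mondo_map:
            term["id"] = mondo_map[term["id"]]
    return normalised


phenopacket = {
    "diseases": [{"term": {"id": "OMIM:231670"}}],
    "interpretations": [{"diagnosis": {"disease": {"id": "OMIM:231670"}}}],
}
result = normalise_phenopacket_diseases(phenopacket, {"OMIM:231670": "MONDO:0009281"})
print(result["interpretations"][0]["diagnosis"]["disease"]["id"])  # MONDO:0009281
```

Rewriting the phenopacket up front, rather than inside the analysis code, means every downstream comparison sees Mondo IDs on the diagnosis side without further changes.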

cmungall · 2024-08-12T16:35:10Z

We discussed this at the Berkeley hackathon; we also need to account for the case where the Orpha disease is a superclass of the OMIM one.

matentzn · 2024-08-13T10:27:41Z

@yaseminbridges

> We discussed this at the Berkeley hackathon, we also need to account for the case where the orpha is a superclass of the omim

This makes it a bit harder. If you like, I can create two tables for you:

- the SSSOM Mondo mappings, so you know which OMIM -> Orpha pairs are identical
- a subsumption table, so you know which Mondo disease is a subclass of another

However, before we go there:

@cmungall, are you sure that your

> orpha is a superclass of the omim

is not a bit too general? If someone predicts an Orpha disease that is much higher in the hierarchy, then IMO the diagnosis is not "as specific as it could be". So I would not say the higher Orpha diagnosis should be treated as if it were correct; it is not "wrong", but it is too unspecific. Otherwise you could predict MONDO:disease and always be right!
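With a subsumption table like the one offered above, the distinction being debated could be made explicit by classifying each prediction instead of scoring it as simply right or wrong. A minimal sketch, assuming a child -> direct-parents table of Mondo IDs; the function names and the intermediate and root IDs in the example (`MONDO:parent_group`, `MONDO:disease`) are hypothetical placeholders, not real Mondo terms.

```python
def ancestors(term: str, parents: dict[str, set[str]]) -> set[str]:
    """All transitive superclasses of term, given a child -> direct parents table."""
    seen: set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def classify_match(predicted: str, diagnosis: str, parents: dict[str, set[str]]) -> str:
    """Classify a predicted Mondo ID relative to the diagnosed Mondo ID."""
    if predicted == diagnosis:
        return "exact"
    if predicted in ancestors(diagnosis, parents):
        return "broader"  # not wrong, but less specific than it could be
    if diagnosis in ancestors(predicted, parents):
        return "narrower"
    return "unrelated"


# Hypothetical hierarchy: MONDO:0009281 -> MONDO:parent_group -> MONDO:disease
parents = {
    "MONDO:0009281": {"MONDO:parent_group"},
    "MONDO:parent_group": {"MONDO:disease"},
}
print(classify_match("MONDO:disease", "MONDO:0009281", parents))  # broader
```

A benchmark could then count only `exact` as a hit and report `broader` separately, so that predicting a very general class is never rewarded as if it were the diagnosis.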

AO33 · 2024-08-19T23:18:44Z

Here are the before-and-after benchmark results for exomiser with the disease_id "mondo mapping" applied. There were 71/385 exomiser results where pheval was under-reporting the rank of the patient's disease.

This isn't just a pheval issue (although pheval is the best place to solve it), because exomiser itself reports the same disease more than once under different identifiers.

How often this "could" be a problem depends on how each disease prioritization tool reports its results, i.e. how often the tool reports the same disease under different ontology namespaces.
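One way to measure this for a given tool is to group its ranked results by Mondo ID and count the groups that contain more than one source ID. A minimal sketch with hypothetical function names, again assuming a source-ID -> Mondo-ID dictionary; `collapse_to_mondo` also shows re-ranking with one entry per Mondo disease, kept at its best-ranked position. `HYPOTHETICAL:1` is a placeholder, not a real identifier.

```python
from collections import defaultdict


def duplicate_groups(ranked_ids: list[str], mondo_map: dict[str, str]) -> dict[str, list[str]]:
    """Mondo IDs that the tool reported more than once under different source IDs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for disease_id in ranked_ids:
        groups[mondo_map.get(disease_id, disease_id)].append(disease_id)
    return {mondo_id: ids for mondo_id, ids in groups.items() if len(ids) > 1}


def collapse_to_mondo(ranked_ids: list[str], mondo_map: dict[str, str]) -> list[str]:
    """One entry per Mondo disease, kept at its best (first) position."""
    seen: set[str] = set()
    collapsed: list[str] = []
    for disease_id in ranked_ids:
        mondo_id = mondo_map.get(disease_id, disease_id)
        if mondo_id not in seen:
            seen.add(mondo_id)
            collapsed.append(mondo_id)
    return collapsed


results = ["ORPHA:25", "OMIM:231670", "HYPOTHETICAL:1"]
mondo_map = {"ORPHA:25": "MONDO:0009281", "OMIM:231670": "MONDO:0009281"}
print(duplicate_groups(results, mondo_map))   # {'MONDO:0009281': ['ORPHA:25', 'OMIM:231670']}
print(collapse_to_mondo(results, mondo_map))  # ['MONDO:0009281', 'HYPOTHETICAL:1']
```

Summing `len(duplicate_groups(...))` over all samples would give a per-tool count of how often this namespace duplication occurs.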

matentzn assigned yaseminbridges Aug 11, 2024
