Inconsistent Similarity Calculation in Distance Module #41
Comments
This is based on my PhD; see section "12.2.3 The subsumption link is not a unitary length" (page 371 of https://theses.hal.science/tel-00378201). Happy to have a work session on that. The first thing to do is to draw the sub-hierarchy on which you are testing, with all its paths. |
Here is the schema of the sub-hierarchy on which the test is performed. I used this query to obtain the image:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT {
  <http://www.wikidata.org/entity/Q8928> rdfs:subClassOf ?parent .
  ?parent rdfs:subClassOf ?ancestor .
  <http://www.wikidata.org/entity/Q8928> owl:equivalentClass ?equivalentClass .
  <http://www.wikidata.org/entity/Q8928> owl:sameAs ?sameAs .
}
WHERE {
  {
    <http://www.wikidata.org/entity/Q8928> rdfs:subClassOf ?parent .
    OPTIONAL {
      ?parent rdfs:subClassOf* ?ancestor .
      FILTER (isIRI(?ancestor) && ?ancestor != <http://www.wikidata.org/entity/Q8928>)
    }
  }
  OPTIONAL {
    <http://www.wikidata.org/entity/Q8928> owl:equivalentClass ?equivalentClass .
  }
  OPTIONAL {
    <http://www.wikidata.org/entity/Q8928> owl:sameAs ?sameAs .
  }
}
|
It's not really a hierarchy, since ns2:Constellation and ns1:Q8928 are set as equivalent, i.e. they are the same class, so the length of the path between them is 0.
Then we have just one path: owl:Thing < ns2:CelestialBody < (ns2:Constellation | ns1:Q8928).
Typically, the link owl:Thing < ns2:CelestialBody should have length 1/2 and the link ns2:CelestialBody < (ns2:Constellation | ns1:Q8928) should have length 1/4.
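The two lengths quoted above are consistent with a link whose length halves at each level of depth. A minimal Python sketch of that reading, assuming owl:Thing sits at depth 0 and that a link ending at a class of depth d has length 1/2^d (the function names and the depth table are illustrative, not taken from Corese):

```python
from fractions import Fraction

def edge_length(child_depth: int) -> Fraction:
    """Length of the subsumption link that ends at a class of the given depth.

    Assumption: the length halves at every level, i.e. 1 / 2**depth.
    """
    return Fraction(1, 2 ** child_depth)

# Depths along the single path of the example, with owl:Thing at depth 0.
# Equivalent classes share one depth because they are the same class.
depth = {
    "owl:Thing": 0,
    "ns2:CelestialBody": 1,
    "ns2:Constellation": 2,
    "ns1:Q8928": 2,
}

def path_length(path):
    """Sum the link lengths along a top-down path of class names."""
    return sum((edge_length(depth[c]) for c in path[1:]), Fraction(0))

path = ["owl:Thing", "ns2:CelestialBody", "ns2:Constellation"]
print(edge_length(depth["ns2:CelestialBody"]))  # 1/2
print(edge_length(depth["ns2:Constellation"]))  # 1/4
print(path_length(path))                        # 3/4
```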
|
Dr Fabien, you are perfectly right, and that's why I reported the bug. When two entities are equivalent, the similarity calculation is inconsistent. The problem is not with your theory or algorithm but with the implementation of this edge case. I fixed it in my Python implementation and reported it to Remi; it becomes a little more complex the deeper we go through those equivalent nodes. I dealt with it by keeping a cache of equivalent classes and looking it up whenever I calculate a distance or try to find common ancestors. |
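The cache of equivalent classes described above could look roughly like the following. This is only a sketch of the idea, not the code of the Python package mentioned in the comment: the class `EquivalenceCache`, the wrapper `distance`, and the stand-in `base_distance` are all hypothetical names.

```python
class EquivalenceCache:
    """Maps every class to one canonical representative (union-find)."""

    def __init__(self):
        self._parent = {}

    def find(self, cls):
        """Return the canonical representative of cls."""
        self._parent.setdefault(cls, cls)
        root = cls
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[cls] != root:
            self._parent[cls], cls = root, self._parent[cls]
        return root

    def add_equivalence(self, a, b):
        """Record that a and b are equivalent classes."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same(self, a, b):
        return self.find(a) == self.find(b)


def distance(a, b, cache, base_distance):
    """Distance that treats equivalent classes as a single class."""
    if cache.same(a, b):
        return 0.0
    # Always query the underlying measure with canonical representatives.
    return base_distance(cache.find(a), cache.find(b))


Q8928 = "http://www.wikidata.org/entity/Q8928"
CONSTELLATION = "http://dbpedia.org/ontology/Constellation"

cache = EquivalenceCache()
cache.add_equivalence(Q8928, CONSTELLATION)

# Stand-in for an underlying measure that ignores equivalence.
naive = lambda a, b: 0.125

print(distance(Q8928, CONSTELLATION, cache, naive))  # 0.0
print(distance(CONSTELLATION, Q8928, cache, naive))  # 0.0
```

Looking up the representative before searching for common ancestors keeps the result independent of which of the equivalent IRIs is passed in.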
Could you please report the distances you are getting from CORESE on that example and the one you would expect to document the issue?
|
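Yes, for sure.
Expected distance (the classes have an equivalence relation):
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0
Actual distance given by Corese:
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0.12500023
|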
Yes, we agree. So the fix is to link all equivalent classes with rdfs:subClassOf links of length 0 in both directions.
NB: the current implementation was done for RDFS, not for OWL.
Fabien Gandon, http://fabien.info
-------- Original message --------
From: ali-ballout ***@***.***>
Date: 02/08/2023 22:43 (GMT+01:00)
To: Wimmics/corese ***@***.***>
Cc: Fabien Gandon ***@***.***>, Comment ***@***.***>
Subject: Re: [Wimmics/corese] Inconsistent Similarity Calculation in Distance Module (Issue corese-stack/corese-ark#25)
yes for sure:
expected distance (classes have an equivalence relation):
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0
actual distance given by Corese:
dist(http://www.wikidata.org/entity/Q8928 and http://dbpedia.org/ontology/Constellation) = 0.12500023
|
Makes perfect sense, that's exactly what I did for my package. |
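The fix agreed on in this exchange (links of length 0 in both directions between equivalent classes) can be illustrated with a small weighted graph. The sketch below is a simplification under stated assumptions: it uses a plain shortest-path search instead of Corese's own distance computation, the link lengths 1/2 and 1/4 are the ones quoted earlier in the thread, and `shortest_distance` is a hypothetical helper.

```python
import heapq

def shortest_distance(edges, source, target):
    """Dijkstra over links that can be traversed in both directions."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, []).append((b, w))
        graph.setdefault(b, []).append((a, w))
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < best.get(nxt, float("inf")):
                best[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")

edges = [
    ("ns2:CelestialBody", "owl:Thing", 0.5),           # link ending at depth 1
    ("ns2:Constellation", "ns2:CelestialBody", 0.25),  # link ending at depth 2
    ("ns1:Q8928", "ns2:Constellation", 0.0),           # equivalence: length 0
]

print(shortest_distance(edges, "ns1:Q8928", "ns2:Constellation"))  # 0.0
print(shortest_distance(edges, "ns2:Constellation", "ns1:Q8928"))  # 0.0
print(shortest_distance(edges, "ns1:Q8928", "ns2:CelestialBody"))  # 0.25
```

Because the equivalence link costs nothing, either IRI reaches the shared parent at the same distance, and the distance between the two equivalent classes is 0 in both orders.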
Issue Description:
A bug has been identified in the distance module of the software. This issue has been reported by user Ali Ballout.
Bug Details:
The problem arises when dealing with classes that are considered equivalent and have subclass relations between each other. For instance, let's consider the classes http://dbpedia.org/ontology/Constellation and http://www.wikidata.org/entity/Q8928.
The software Corese assigns depths to these classes based on their order of appearance. As a result, one of the classes might receive a depth of 2, while the other gets a depth of 3. The issue manifests during the calculation of distance and similarity between these classes.
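To see how an order-dependent depth assignment can arise, here is a toy traversal in Python. It is not Corese's code: `assign_depths` is a hypothetical depth-first walk in which the first visit fixes a class's depth, and the two equivalent classes are modelled as subclasses of each other, as described above.

```python
def assign_depths(children, root):
    """Depth-first walk; the first visit of a class fixes its depth."""
    depth = {}

    def visit(node, d):
        if node in depth:  # already seen: keep the first depth
            return
        depth[node] = d
        for child in children.get(node, []):
            visit(child, d + 1)

    visit(root, 0)
    return depth

def hierarchy(first, second):
    """Same classes and links; only the listing order of the two differs."""
    return {
        "owl:Thing": ["ns2:CelestialBody"],
        "ns2:CelestialBody": [first, second],
        first: [second],   # equivalent classes are mutual subclasses
        second: [first],
    }

a = assign_depths(hierarchy("ns2:Constellation", "ns1:Q8928"), "owl:Thing")
b = assign_depths(hierarchy("ns1:Q8928", "ns2:Constellation"), "owl:Thing")

print(a["ns2:Constellation"], a["ns1:Q8928"])  # 2 3
print(b["ns2:Constellation"], b["ns1:Q8928"])  # 3 2
```

Under the link lengths quoted in the comments (1/2 at depth 1, 1/4 at depth 2), a link ending at depth 3 would have length 1/8 = 0.125, which is close to the reported 0.12500023. Whether this is the actual cause inside Corese is an assumption.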
Steps to Reproduce:
Load mainDbpedia.owl into the application.
Expected Behavior:
The similarity calculation should be consistent regardless of the order of the classes.
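This expectation can be turned into a small regression check. The helper below is a sketch: `asymmetric_pairs` is a hypothetical name, and `distance` stands for any callable that takes two class IRIs and returns a number.

```python
from itertools import combinations

def asymmetric_pairs(distance, classes, tol=1e-9):
    """Return the pairs whose distance depends on argument order."""
    return [
        (a, b)
        for a, b in combinations(classes, 2)
        if abs(distance(a, b) - distance(b, a)) > tol
    ]

# Toy measure that is deliberately order-dependent, for demonstration only.
def lopsided(a, b):
    return 0.0 if a < b else 0.125

print(asymmetric_pairs(lopsided, ["a", "b"]))            # [('a', 'b')]
print(asymmetric_pairs(lambda a, b: 0.25, ["a", "b"]))   # []
```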
Actual Behavior:
The similarity calculation yields different results depending on the position of the classes. For instance:
In this case, it seems that one of the classes is erroneously used as a common ancestor because it is found in the list of ancestors of the other class and has a higher depth.
However, in this scenario, no common ancestor is being used, as both classes are listed as each other's parents in the list of ancestors.
Note to Developers:
This inconsistency in the similarity calculation, which depends on class order and subclass relations, could lead to incorrect results and impact the accuracy of the distance module. Further investigation and debugging are required to resolve the issue.
Screenshots/Attachments:
mainDbpedia.zip