RDF Schema contains erroneous rdfs:domain triples #276
Comments
That's explicit in schema salad: if two classes use the same field name, it is intended that the field name is semantically the same, with the same predicate. However, when I designed the RDFS export, I don't think I was aware of the entailment rules. I think I interpreted

I believe the correct way to fix this in the CWL schema specifically would be to find the situations where different classes use the same field, and extract that field to a parent type. So the presence of "basename" would entail that the object was "the class of objects that have basename (of which File and Directory are subtypes)" and not "must be both a File and a Directory". For example the "Documented" type that implies a field called

Do you have a particular project where you are using RDFS entailment rules with CWL? While linked data underlies the CWL definition, applications that actually map a CWL document to triples have been fairly niche, so I'd love to hear more.
Totally agree with that; it seems like the most straightforward interpretation.
You mean something like this, where
In this case the parent type has to be defined in the SALAD schema itself, too, by defining File and Directory as extensions of FileSystemRessource:
Which leads to subclass relationships in RDF:
In this case the RDFS-only modeling seems fine, as the entailment that every file and every directory is a filesystem resource is correct (rules 'rdfs2' + 'rdfs9' in RDF 1.1 Semantics). However, I would like to point out the following: This approach doesn't work in every case, e.g.:

If one wants to avoid building a class hierarchy altogether (like in the 'parentOf' case with a 'Parent' class), one could use OWL 2 instead, with the union of class expressions (could be suitable for Apache Avro unions used in fields, which still need to be translated to a correct
But OWL 2 introduces a much more complex set of possibilities for modeling semantics.
In RDF, this is currently stated as:
Which entails that every entity, which has a
I am working on my master's thesis "Automated transformation processing for dynamic RDF information integration and XML document generation" and would like to use CWL for the general transformation processing. As the primary data management takes place in RDF, all Workflows and CommandLineTools have to be transformed into RDF to be persisted (which is why I noticed this mistake). Ultimately, the idea is to be able to generate workflows (partially) automatically using entailment rules and additional annotations on the CWL constructs (which are only visible within the RDF database to generate jobs, and are removed before the job is actually submitted to a CWL runner).
Maybe a completely different approach works better here: instead of fiddling around with the open world assumption and the global scope of the RDFS and OWL approaches, one could use SHACL to define shapes for every record type. At the moment I am trying to do this by deriving SHACL shapes from the RDF triples of the Common Workflow Language using SHACL rules.
I am aware of SHACL but haven't actually studied it, so I don't know exactly how it compares to RDFS. My impression is that SHACL is a much better fit than RDFS for modeling nested data structures such as are found in CWL, so a schema salad-to-SHACL translation is probably a productive line of research. In that case, the problematic RDFS definitions could probably be dropped in favor of SHACL definitions?
Description
The turtle file with RDF(S) of the CWL SALAD Schema retrievable under http://commonwl.org/v1.2/cwl.ttl (or alternatively by running schema-salad-tool --print-rdfs with the full schema CommonWorkflowLanguage.yml in the cwl-v1.2 repository) contains the following statement:
Which states (by inference, see RDF 1.1 Semantics entailment pattern 'rdfs2') that every subject IRI of a triple with the predicate https://w3id.org/cwl/cwl#basename is an instance of the class @base:Directory and an instance of the class @base:File. This is clearly not the case, as not every File is a Directory (and they even have different fields in CWL too).
Analysis
Perhaps this error is due to the fact that the semantics of the object-oriented model and of RDFS do not match:
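The effect of rule 'rdfs2' on a property with two domains can be reproduced in a few lines of plain Python. This is only a sketch: the prefixed names (`cwl:File`, `ex:input1`, and so on) are hypothetical stand-ins for full IRIs, the triples are simplified and not copied from cwl.ttl, and `apply_rdfs2` is an illustrative helper, not part of schema salad or any RDF library.

```python
# Hypothetical, simplified triples; prefixed names stand in for full IRIs.
RDFS_DOMAIN = "rdfs:domain"
RDF_TYPE = "rdf:type"

# Schema: one property declared with two separate rdfs:domain statements.
schema = {
    ("cwl:basename", RDFS_DOMAIN, "cwl:File"),
    ("cwl:basename", RDFS_DOMAIN, "cwl:Directory"),
}

# Data: a single resource that merely uses the property.
data = {("ex:input1", "cwl:basename", "reads.fastq")}


def apply_rdfs2(schema, data):
    """Return the rdf:type triples entailed by rule rdfs2.

    rdfs2: if (p rdfs:domain C) and (s p o), then (s rdf:type C).
    """
    inferred = set()
    for prop, pred, cls in schema:
        if pred != RDFS_DOMAIN:
            continue
        for subj, used_prop, _obj in data:
            if used_prop == prop:
                inferred.add((subj, RDF_TYPE, cls))
    return inferred


inferred = apply_rdfs2(schema, data)
for triple in sorted(inferred):
    print(triple)
```

Each domain statement fires independently, so the resource ends up typed as both classes. In RDFS, multiple domains therefore behave like an intersection, whereas the object-oriented reading ("this field may appear on either record") is closer to a union.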
Solution ideas
schema-salad-tool infers only the most specific type for the domain/range of a property that is valid globally, in all contexts in which the property is used. Unlike the current behaviour, which emits every most specific type, this means following the class hierarchy upwards until a common class for all domain/range usages of the property is found.
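A minimal sketch of this idea, assuming a single-inheritance, acyclic class hierarchy given as a child-to-parent mapping. The function names and the `parents` mapping are hypothetical and do not reflect how schema-salad-tool is implemented; the parent class name is borrowed from the discussion in the comments.

```python
def ancestors(cls, parents):
    """Return cls followed by its superclasses, nearest first.

    Assumes single inheritance and no cycles in `parents`.
    """
    chain = [cls]
    while cls in parents:
        cls = parents[cls]
        chain.append(cls)
    return chain


def common_domain(classes, parents):
    """Return the most specific class that is an ancestor-or-self of
    every class in `classes`, or None if no such class exists."""
    classes = list(classes)
    if not classes:
        return None
    shared = set(ancestors(classes[0], parents))
    for cls in classes[1:]:
        shared &= set(ancestors(cls, parents))
    # Walk upwards from the first class; the first shared hit is the
    # most specific common class.
    for candidate in ancestors(classes[0], parents):
        if candidate in shared:
            return candidate
    return None


# Hypothetical hierarchy, with the parent type suggested in the comments.
parents = {
    "File": "FileSystemRessource",
    "Directory": "FileSystemRessource",
}

print(common_domain(["File", "Directory"], parents))
```

When no common class exists, the sketch returns None; in that case an exporter would presumably have to omit the rdfs:domain statement or fall back to another modeling approach.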