Proposal: Triple Tokens Entailing Reified Statements
An alternative to the baseline proposed by Enrico, stealing a lot from it.
This is a reduced approach, extending RDF 1.1 by:
- Adding triple terms, defined as opaque literal-like objects, to be used with the new well-formed functional predicate rdf:tokenOf to describe tokens. Tokens of triples have a many-to-one relationship to RDF triples.
- Defining a statement entailment under which tokens of these triples also reify statements, forming a bridge to classic RDF reification.
Relying on that, two informative properties are also defined to relate qualifications ("real reifiers") to the meaning of these tokens.
An analysis of use cases follows, then a means of "unstarring" this.
graph ::= triple*
triple ::= subject predicate object
subject ::= iri | BlankNode
predicate ::= iri
object ::= iri | BlankNode | literal | tripleTerm
tripleTerm ::= triple
Notes:
- This is RDF 1.1 syntax with the addition of tripleTerm.
- A term is denoted by r, a triple by t, and a graph by g.
- Given a triple t, we denote the subject, predicate, and object of t as t.s, t.p, and t.o, respectively.
graph ::= triple*
triple ::= ( subject rdf:tokenOf tripleTerm ) |
( subject predicate object )
subject ::= iri | BlankNode
predicate ::= iri
object ::= iri | BlankNode | literal
tripleTerm ::= triple
Syntactic condition: A graph MUST NOT have two triples with the same subject and different triple terms.
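To make the well-formed subset concrete, here is a minimal Python sketch of a well-formedness check. The term model is an assumption for illustration only: IRIs and blank nodes are plain strings such as ":s" or "_:x", and a triple term is a nested `Triple` named tuple. The function name `is_well_formed` is also hypothetical. The sketch checks only the top level of the graph and does not descend into nested triple terms.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """Hypothetical term model: IRIs and blank nodes are plain strings
    (e.g. ":s", "_:x"); a triple term is a nested Triple."""
    s: object
    p: object
    o: object


TOKEN_OF = "rdf:tokenOf"


def is_well_formed(graph: Iterable[Triple]) -> bool:
    """Check the well-formed subset: triple terms may only appear as the
    object of rdf:tokenOf, and no subject may have two different triple
    terms (the syntactic condition)."""
    term_of: dict = {}
    for t in graph:
        if isinstance(t.o, Triple):
            if t.p != TOKEN_OF:
                return False  # triple term used outside rdf:tokenOf
            if term_of.setdefault(t.s, t.o) != t.o:
                return False  # same subject, different triple term
    return True
```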
Based on RDF 1.1 simple semantics, with the addition of:
- An injective function SRE from tripleTerm into literal, called the syntactic denotation of triple terms.
- The function [I+A](.) is extended with:
  - [I+A](r) = IL(SRE(r)) if r is a tripleTerm.
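The following sketches one possible SRE. It assumes that the constituents of a triple term are already written in N-Triples form and that the resulting literal is typed rdf:Triple, as in the vocabulary further down. The names `sre` and `sre_inverse` are hypothetical. The point is that a decoding function exists, and the existence of that inverse is what makes the mapping injective.

```python
from typing import NamedTuple, Tuple

RDF_TRIPLE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Triple"


class Triple(NamedTuple):
    s: str  # N-Triples form, e.g. "<http://example.org/s>" or "_:b0"
    p: str
    o: str  # may be a literal such as '"a b"@en'


def sre(t: Triple) -> Tuple[str, str]:
    """Map a triple term to a literal, given as (lexical form, datatype IRI)."""
    return (f"{t.s} {t.p} {t.o}", RDF_TRIPLE)


def sre_inverse(lit: Tuple[str, str]) -> Triple:
    """Recover the triple term. Subjects and predicates never contain spaces
    in N-Triples, so splitting on the first two spaces is unambiguous; this
    inverse is what makes sre injective."""
    lexical, datatype = lit
    if datatype != RDF_TRIPLE:
        raise ValueError("not an rdf:Triple literal")
    s, p, o = lexical.split(" ", 2)
    return Triple(s, p, o)
```

Literal objects may contain spaces. They are still recovered intact because only the first two spaces are treated as separators.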
Defined upon the well-formed subset, with the addition that rdf:tokenOf is functional:
∀ x,y1,y2 .
(x,y1) ∈ IEXT(IS(rdf:tokenOf))
⋀ (x,y2) ∈ IEXT(IS(rdf:tokenOf))
→ y1=y2
A triple token denotes an rdf:Statement instance, defined by this entailment:
∀ x,y . (x,y) ∈ IEXT(IS(rdf:tokenOf))
→ (x,[I+A]([IL+B](y).s)) ∈ IEXT(IS(rdf:subject))
⋀ (x,[I+A](IL(y).p)) ∈ IEXT(IS(rdf:predicate))
⋀ (x,[I+A]([IL+B](y).o)) ∈ IEXT(IS(rdf:object))
where:
- B is a mapping from BlankNodes in IL to BlankNodes in g. (This is needed since the value space of triple terms consists of three-tuples of RDF terms, not of what those terms denote.)
- The function [IL+B](.) maps triple terms into three-tuples of terms in the same context as the graph, so that their constituents can be mapped to IR ⋃ IP using [I+A](.).
Warning
Since B is available to the function that decodes the concrete triple representation (at parse time), it is assumed here that B can also be accessed when a triple term is encountered, to ensure that the mapping in B is consistent for the entire graph.
(Otherwise, since concrete representations are bounded by RDF documents, would different B:s somehow have to be part of the triple terms in the abstract syntax? That would certainly complicate things.)
Under statement entailment, this:
_:x rdf:tokenOf <<' :s :p :o '>> .
entails:
_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .
meaning that triple tokens are also classic reifications.
Through entailment, one triple token can thus be the subject of multiple reification triples with the same predicate, e.g. rdf:subject (though in the interpretation these amount to only three relations to resources, unless others are asserted for other reasons). This does not, however, equate the different opaque triple terms that would encode each of these combinations, so the functional requirement is not broken.
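Statement entailment can be sketched as a single pass over the graph. This Python sketch works on syntax rather than on interpretations, and it assumes the blank node mapping B has already been applied. The function name is hypothetical. The triple term is kept as an opaque value, so later reasoning such as owl:sameAs substitution affects only the entailed rdf:subject, rdf:predicate and rdf:object triples. It never affects the rdf:tokenOf triple.

```python
from typing import NamedTuple, Set


class Triple(NamedTuple):
    s: object
    p: str
    o: object  # a nested Triple when this is a triple term


TOKEN_OF = "rdf:tokenOf"


def statement_entailment(graph: Set[Triple]) -> Set[Triple]:
    """Add rdf:subject, rdf:predicate and rdf:object triples for every
    rdf:tokenOf triple. The triple term itself is left untouched."""
    entailed = set(graph)
    for t in graph:
        if t.p == TOKEN_OF and isinstance(t.o, Triple):
            entailed.add(Triple(t.s, "rdf:subject", t.o.s))
            entailed.add(Triple(t.s, "rdf:predicate", t.o.p))
            entailed.add(Triple(t.s, "rdf:object", t.o.o))
    return entailed
```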
Example:
_:x rdf:tokenOf <<' :s :p :o '>> .
:s owl:sameAs :q .
Entails:
_:x rdf:subject :s, :q ; rdf:predicate :p ; rdf:object :o .
But not:
_:x rdf:tokenOf <<' :q :p :o '>> .
Note
It is important to realize that while RDF statements are abstract relationships, rdf:Statement instances are defined as tokens of these, such as in concrete serializations (or data streams, utterances, etc.). So classic reifications of statements are decidedly tokens.
This proposal formalizes that, but improves upon it by capturing both the token representation and its interpretation, so that the token corresponds to the triple itself, as an opaque literal. The token still means something in the graph, which is why its interpreted constituents are entailed.
rdf:tokenOf a owl:FunctionalProperty, owl:DatatypeProperty ;
rdfs:range rdf:Triple ;
rdfs:comment """
A token of a triple, such as an encoding in concrete RDF syntax,
stored in a triple store, or an assertion in a communication or
data stream.
"""@en .
rdf:Triple a owl:Datatype ;
rdfs:comment "A triple encodes a statement."@en .
A vocabulary to cover the case of "real, many-to-many reifiers".
rdfs:qualifies a owl:ObjectProperty ;
rdfs:domain rdfs:Resource ;
rdfs:range rdf:Statement ;
rdfs:comment """
A property relating anything more concrete or specific to one or
more abstract relationships. That includes events, situations or
other circumstances that make these relationships true.
"""@en .
rdfs:qualifiedBy a owl:ObjectProperty ;
owl:inverseOf rdfs:qualifies ;
rdfs:comment """
Relating the statement of a triple token to an observation or
evidence that qualifies it.
"""@en .
Note
This is conceptualized as qualification, to avoid confusion with the classic RDF notion of reification. The properties could otherwise be named rdfs:reifies and rdfs:reifiedBy.
Given:
:s :p :o {| rdfs:qualifiedBy <reifier> |} .
which is short for:
:s :p :o .
_:x rdf:tokenOf <<' :s :p :o '>> .
_:x rdfs:qualifiedBy <reifier> .
which, through statement entailment, entails:
_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .
<reifier> rdfs:qualifies _:x . # through the inverseOf
That is thus a variant of saying:
:s :p :o .
<reifier> rdfs:qualifies << :s :p :o >> .
That token is "throw-away" here, but can be used to further annotate provenance or other details pertinent to the triple token itself.
:s :p :o {| rdfs:qualifiedBy [ a :Event ; :date "1876" ] ;
:date "2024-06-04" ;
:source <stream23> |} .
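The expansion of the `:s :p :o {| rdfs:qualifiedBy <reifier> |}` shorthand shown earlier can be sketched as follows. The fresh `_:tN` token labels and the function names are assumptions for illustration. Only the rdfs:qualifiedBy/rdfs:qualifies inverse is materialized. The triples added by statement entailment are left out to keep the sketch short, and nested blank node descriptions such as `[ a :Event ; ... ]` are not handled.

```python
import itertools
from typing import Iterable, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    s: object
    p: str
    o: object


_fresh = itertools.count()  # source of fresh token labels


def expand_annotation(asserted: Triple,
                      annotations: Iterable[Tuple[str, object]]) -> Set[Triple]:
    """Expand `s p o {| pred obj ; ... |}` into the asserted triple, a fresh
    token linked to it by rdf:tokenOf, and the annotations on that token."""
    token = f"_:t{next(_fresh)}"
    out = {asserted, Triple(token, "rdf:tokenOf", asserted)}
    out.update(Triple(token, pred, obj) for pred, obj in annotations)
    return out


def apply_inverse(graph: Set[Triple], prop: str, inverse: str) -> Set[Triple]:
    """Materialize owl:inverseOf: for each (x prop y), add (y inverse x)."""
    return graph | {Triple(t.o, inverse, t.s) for t in graph if t.p == prop}
```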
Sub-property chain axioms can also be defined for "shorthands", omitting the explicit reifier:
:happenedWhen rdfs:domain :DataPoint ;
owl:propertyChainAxiom (rdfs:qualifiedBy :date) .
:s :p :o {| :happenedWhen "1876" ;
:date "2024-06-04" ;
:source <stream23> |} .
This token is thus also used as a semi-qualification (which might not be recommended, but is a somewhat common practice).
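Note that an owl:propertyChainAxiom entails in one direction only. The explicit path (rdfs:qualifiedBy, then :date) entails the shorthand, but the shorthand does not entail an explicit reifier. The following is a minimal sketch of that forward rule. The function names are hypothetical, and the toy graph mirrors the annotated example above, with `_:t` as the token and `_:e` as the event.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def chain_pairs(graph: Set[Triple], chain: List[str]) -> Set[Tuple[str, str]]:
    """All (start, end) pairs connected by following the properties in
    `chain` in order, as in an owl:propertyChainAxiom."""
    pairs = {(s, o) for s, p, o in graph if p == chain[0]}
    for prop in chain[1:]:
        edges: Dict[str, Set[str]] = {}
        for s, p, o in graph:
            if p == prop:
                edges.setdefault(s, set()).add(o)
        pairs = {(x, z) for x, y in pairs for z in edges.get(y, ())}
    return pairs


def apply_chain_axiom(graph: Set[Triple], prop: str, chain: List[str]) -> Set[Triple]:
    """Add (x prop z) for every pair linked by the chain."""
    return graph | {(x, prop, z) for x, z in chain_pairs(graph, chain)}


example = {
    ("_:t", "rdfs:qualifiedBy", "_:e"),  # token qualified by an event
    ("_:e", ":date", '"1876"'),          # date of the event
    ("_:t", ":date", '"2024-06-04"'),    # date of the token itself
}
result = apply_chain_axiom(example, ":happenedWhen", ["rdfs:qualifiedBy", ":date"])
```

Here `_:t :happenedWhen "1876"` is added. The token's own `:date "2024-06-04"` is not picked up, since it is not reached through rdfs:qualifiedBy.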
Broadly categorized as either:
- Token provenance (source, trust, choice of what claim to "trust").
- Statement qualification ("truth-makers" - detailed circumstances such as events or situations).
Informal and varied in use. No explicit distinction between provenance and qualification, but decidedly representing a token of the interpreted statement.
Example from UniProt:
PREFIX citation: <http://purl.uniprot.org/citations/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ECO: <http://purl.obolibrary.org/obo/ECO_>
BASE <http://purl.uniprot.org/uniprot/>
<A0A061BIM8> rdf:type up:Protein ;
up:reviewed false ;
up:created "2014-09-03"^^xsd:date ;
up:modified "2023-02-22"^^xsd:date ;
up:version 14 ;
up:mnemonic "A0A061BIM8_METSM" ;
up:citation citation:SIPE7CD2C7B4D8C83BF {|
rdf:type up:Citation_Statement ;
up:scope "NUCLEOTIDE SEQUENCE" ;
up:context <A0A061BIM8#context-MD5E40A3C77564C3539D926817D108BA7B1> {|
up:attribution <A0A061BIM8#attribution-34FA41A4C6F67DE85C64505553BF3980> |} ,
<A0A061BIM8#context-MD5EBAC47BD2B565E1ED63069403184D0D6> {|
up:attribution <A0A061BIM8#attribution-A5C4A5E9D76685B2E93425CE1CEE7601> |} ,
<A0A061BIM8#context-MD5105BBC48BC029A921160A789F6A9A7DC> {|
up:attribution <A0A061BIM8#attribution-6517DA94802D53CE69A9E803A1036BC0> |} ;
up:attribution <A0A061BIM8#attribution-6517DA94802D53CE69A9E803A1036BC0>
|} ,
citation:SIPCFDCFBC556A62197 {|
# ...
|} ,
citation:SIPCC14A32A2041B19F {|
# ...
|} ;
# ...
.
<A0A061BIM8#attribution-34FA41A4C6F67DE85C64505553BF3980> up:evidence ECO:0000313 ;
up:source <http://purl.uniprot.org/embl-cds/CDR50051.1> .
<A0A061BIM8#attribution-A5C4A5E9D76685B2E93425CE1CEE7601> up:evidence ECO:0000313 ;
up:source <http://purl.uniprot.org/embl-cds/CDR50048.1> .
<A0A061BIM8#attribution-6517DA94802D53CE69A9E803A1036BC0> up:evidence ECO:0000313 ;
up:source <http://purl.uniprot.org/embl-cds/CDR50049.1> .
<A0A061BIM8#context-MD5EBAC47BD2B565E1ED63069403184D0D6> rdf:type up:Strain ;
rdfs:label "N27" .
<A0A061BIM8#context-MD5105BBC48BC029A921160A789F6A9A7DC> rdf:type up:Strain ;
rdfs:label "N63" .
<A0A061BIM8#context-MD5E40A3C77564C3539D926817D108BA7B1> rdf:type up:Strain ;
rdfs:label "ACE6" .
citation:SIPE7CD2C7B4D8C83BF rdf:type up:Submission_Citation ;
up:author "Urmite Genomes U." ;
up:date "2014-05"^^xsd:gYearMonth ;
up:submittedTo "EMBL/GenBank/DDBJ" .
Conceptually rooted in Classic Reification, but explicitly designed as qualification (the meaning of the token).
PROV defines qualified forms for its direct relationships. Examples from the spec:
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
## Example 8:
<illustration_activity> a prov:Activity;
prov:qualifiedUsage <use_of_aggregated_data> ;
## Example 9:
prov:qualifiedAssociation [ a prov:Association;
prov:agent <derek>;
prov:hadRole <illustrationist>;
prov:hadPlan <tutorial_blog>
] .
## Example 10:
<bar_chart> a prov:Entity;
prov:qualifiedGeneration <generation_of_bar_chart> ;
# Example 11:
prov:qualifiedDerivation [ a prov:Derivation;
prov:entity <aggregated_by_regions>;
prov:hadActivity <aggregating_activity>;
prov:hadUsage <use_of_aggregated_data>;
prov:hadGeneration <generation_of_bar_chart>
] .
<use_of_aggregated_data> a prov:Usage;
prov:entity <aggregated_by_regions>;
prov:atTime "2011-07-14T03:03:03Z"^^xsd:dateTime .
<generation_of_bar_chart> a prov:Generation;
prov:activity <illustration_activity>;
prov:atTime "2011-07-14T15:52:14Z"^^xsd:dateTime .
With RDF-star, this can instead be annotated on the direct relationships:
## Example 8:
<illustration_activity> a prov:Activity;
prov:used <aggregated_by_regions> {|
prov:specializationOf <use_of_aggregated_data> |} ;
## Example 9:
prov:wasAssociatedWith <derek> {|
prov:hadRole <illustrationist>;
prov:hadPlan <tutorial_blog>
|} .
## Example 10:
<bar_chart> a prov:Entity;
prov:wasGeneratedBy <illustration_activity> {|
prov:specializationOf <generation_of_bar_chart> |} ;
# Example 11:
prov:wasDerivedFrom <aggregated_by_regions> {|
prov:hadActivity <aggregating_activity>;
prov:hadUsage <use_of_aggregated_data>;
prov:hadGeneration <generation_of_bar_chart>
|} .
(Here prov:specializationOf is used instead of rdfs:qualifiedBy, showing that this is a pattern whose details depend upon the chosen ontologies.)
Uses explicit provenance (references and ranks) and qualification (qualifiers), designed as an alternative form of classic reification that mixes in components of the PROV vocabulary.
A simplified version of what can be found would look like:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX sdo: <https://schema.org/>
PREFIX : <https://kungbib.github.io/wd2rdbl/ns/>
BASE <http://www.wikidata.org/entity/>
<Q59166653> a sdo:Person ;
rdfs:label "Malgorzata Domino"@en ;
sdo:description "Polish veterinary scientist (animal reproduction)"@en ;
sdo:alumniOf <Q681> {|
:reference [ :statedIn_P248 <Q110411020> ;
:referenceUrl_P854 <https://pub.orcid.org/v3.0/0000-0001-9436-1074/education/6610842> ;
:retrieved_P813 "+2021-11-05T00:00:00Z"^^xsd:dateTime ] ;
rdfs:qualifiedBy [ :academicDegree_P512 <Q1862897> ;
sdo:startDate "+2005-10-01T00:00:00Z"^^xsd:dateTime ;
sdo:endDate "+2011-02-28T00:00:00Z"^^xsd:dateTime ] |} ,
<Q681> {|
:reference [ :statedIn_P248 <Q110411020> ;
:referenceUrl_P854 <https://pub.orcid.org/v3.0/0000-0001-9436-1074/education/6610859> ;
:retrieved_P813 "+2021-11-05T00:00:00Z"^^xsd:dateTime ] ;
rdfs:qualifiedBy [ :academicDegree_P512 <Q950900> ;
sdo:startDate "+2011-10-01T00:00:00Z"^^xsd:dateTime ;
sdo:endDate "+2012-11-08T00:00:00Z"^^xsd:dateTime ] |} ,
<Q681> {|
:reference [ :statedIn_P248 <Q110411020> ;
:referenceUrl_P854 <https://pub.orcid.org/v3.0/0000-0001-9436-1074/education/6610892> ;
:retrieved_P813 "+2021-11-05T00:00:00Z"^^xsd:dateTime ] ;
rdfs:qualifiedBy [ :academicDegree_P512 <Q752297> ;
sdo:startDate "+2011-03-08T00:00:00Z"^^xsd:dateTime ;
sdo:endDate "+2015-04-22T00:00:00Z"^^xsd:dateTime ] |} .
<Q681> rdfs:label "Warsaw University of Life Sciences"@en .
Here, we have three tokens of a single triple (<Q59166653> sdo:alumniOf <Q681>), one per source of the simple fact that this person was educated at this institution, each further qualified with detailed information about the education period (taken from the more detailed source behind each token).
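Since rdf:tokenOf is functional, the token-to-triple relationship can be held in a plain dictionary, and the many tokens of one triple can be recovered by inverting it. The following is a small sketch with hypothetical function names. The three Wikidata tokens from the example are reduced to the labels `_:t1`, `_:t2` and `_:t3`, and a fourth, unrelated token is added for contrast.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

TripleTerm = Tuple[str, str, str]


def tokens_by_triple(token_of: Dict[str, TripleTerm]) -> Dict[TripleTerm, Set[str]]:
    """Invert token -> triple term (a dict, since rdf:tokenOf is functional)
    into triple term -> set of tokens (many-to-one read backwards)."""
    groups: Dict[TripleTerm, Set[str]] = defaultdict(set)
    for token, term in token_of.items():
        groups[term].add(token)
    return dict(groups)


def multi_token_triples(token_of: Dict[str, TripleTerm]) -> Dict[TripleTerm, Set[str]]:
    """Keep only triples that have more than one token."""
    return {term: toks for term, toks in tokens_by_triple(token_of).items()
            if len(toks) > 1}


alumni = ("<Q59166653>", "sdo:alumniOf", "<Q681>")
token_of = {
    "_:t1": alumni,
    "_:t2": alumni,
    "_:t3": alumni,
    "_:t4": ("<Q681>", "rdfs:label", '"Warsaw University of Life Sciences"@en'),
}
```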
Note
There are many such many-to-many particulars in Wikidata. Here is an example query for uses of the education property (P69) alone (for the Wikidata Query Service, where the p: and ps: prefixes are predefined):
SELECT DISTINCT ?s {
?s p:P69 ?stmt1, ?stmt2 .
FILTER(?stmt1 != ?stmt2)
?edu ^ps:P69 ?stmt1, ?stmt2 .
} LIMIT 20
See more detailed information and example queries at https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks.
Conceptually different from Classic Reification, closer to qualification ("hubs" of multiple relationships).
# Example based on <https://www.w3.org/TR/swbp-n-aryRelations/#useCase3>
<John> :hasBought <Lenny_The_Lion> {| rdfs:qualifiedBy <Purchase_1> |} .
<Lenny_The_Lion> :soldBy <ToyStore> {| rdfs:qualifiedBy <Purchase_1> |} .
<Purchase_1> a :Purchase ;
:hasBuyer <John> ;
:hasObject <Lenny_The_Lion> ;
:hasPurpose <Birthday_Gift> ;
:hasAmount "$15" ;
:hasSeller <ToyStore> .
Designed as "relationship instances", so each edge is unique. Can be seen as "qualified predicates". In practice informal and varied in use. Its "pick your own semantics" approach leaves no way to formally distinguish between provenance and qualification.
#Example from <https://github.com/RDFLib/rdflib/discussions/2795>
:model_z :hasPart :wheel {| :quantity 4 ;
:isPurchasedParts true |} .
:model_z :hasPart :wheel {| a :PartSpecification ;
:quantity 4 ;
:isPurchasedParts true |} .
- Multiple observations (token provenance).
- Qualifying "too abstract/simple/flat" relationships (met vs. Meeting, spouse vs. Marriage).
- publisher and datePublished vs two PublicationEvents (each event qualifying two statements).
Example:
PREFIX : <https://schema.org/>
PREFIX tk: <http://example.org/tokens#>
<book> a :Book ;
:datePublished 2023 {|
rdfs:qualifiedBy <#uspbl> ;
tk:source <harpercollins.com/datastream> ;
tk:timestampMills 1714153402
|} ;
:datePublished 2023 {|
rdfs:qualifiedBy <#ukpbl> ;
tk:source <bloomsbury.co.uk/datastream> ;
tk:timestampMills 1714153404
|} ;
:publisher <HarperCollins> {|
rdfs:qualifiedBy <#uspbl> ;
tk:source <harpercollins.com/datastream> ;
tk:timestampMills 1714153403
|} ,
<Bloomsbury> {|
rdfs:qualifiedBy <#ukpbl> ;
tk:source <bloomsbury.co.uk/datastream> ;
tk:timestampMills 1714153405
|} .
<#uspbl> a :PublicationEvent ;
:location <NewYork> .
<#ukpbl> a :PublicationEvent ;
:location <London> .
Note that with a model like:
:publication owl:propertyChainAxiom (
[ owl:inverseOf rdf:subject ]
[ rdfs:subPropertyOf rdfs:qualifiedBy ; rdfs:range :PublicationEvent ]
) .
:publisher owl:propertyChainAxiom (:publication :agent) .
:datePublished owl:propertyChainAxiom (:publication :startDate) .
The above can then be connected to a fully explicit PublicationEvent design, in which the information in the shorthands is stated explicitly on the events:
<#uspbl>
:agent <HarperCollins> ; # implies :publisher for <book>
:startDate 2023 . # implies :datePublished for <book>
<#ukpbl>
:agent <Bloomsbury> ; # implies :publisher for <book>
:startDate 2023 . # implies :datePublished for <book>
This implies that there are simpler :publisher and :datePublished shorthand statements, and tokens thereof, "in the middle" between the book and the publication event (the event being their "truth-maker").
Thus, by relying on the entailed interpretation of a token, it is possible to model vocabularies that facilitate the integration of, for instance, token designs and N-ary designs.
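The effect of the :publisher and :datePublished chain axioms on the publication example can be sketched by hand. Two assumptions apply. First, the rdf:subject triples are those that statement entailment would add for the tokens in the example. Second, the middle step of :publication is simplified to "follow rdfs:qualifiedBy to a resource typed :PublicationEvent". That simplification is only an approximation of the chain axiom, and the function names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def objects(graph: Set[Triple], s: str, p: str) -> Set[str]:
    return {o for s2, p2, o in graph if s2 == s and p2 == p}


def publication_events(graph: Set[Triple], book: str) -> Set[str]:
    """Hypothetical reading of :publication: go back from `book` over
    rdf:subject to its tokens, then forward over rdfs:qualifiedBy to
    resources typed :PublicationEvent."""
    tokens = {s for s, p, o in graph if p == "rdf:subject" and o == book}
    return {e for t in tokens
            for e in objects(graph, t, "rdfs:qualifiedBy")
            if (e, "rdf:type", ":PublicationEvent") in graph}


def shorthand(graph: Set[Triple], book: str, via: str) -> Set[str]:
    """Values of a shorthand defined by the chain (:publication via),
    e.g. :publisher with via=":agent"."""
    return {v for e in publication_events(graph, book)
            for v in objects(graph, e, via)}


graph = {
    ("_:t1", "rdf:subject", "<book>"), ("_:t1", "rdfs:qualifiedBy", "<#uspbl>"),
    ("_:t2", "rdf:subject", "<book>"), ("_:t2", "rdfs:qualifiedBy", "<#ukpbl>"),
    ("<#uspbl>", "rdf:type", ":PublicationEvent"),
    ("<#uspbl>", ":agent", "<HarperCollins>"), ("<#uspbl>", ":startDate", "2023"),
    ("<#ukpbl>", "rdf:type", ":PublicationEvent"),
    ("<#ukpbl>", ":agent", "<Bloomsbury>"), ("<#ukpbl>", ":startDate", "2023"),
}
```

With this graph, `shorthand(graph, "<book>", ":agent")` yields both publishers. The `:startDate` chain yields the single year 2023.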
Given the above LPG example, by declaring this:
_:partSpecRel rdfs:subPropertyOf rdf:reifies;
rdfs:domain :PartSpecification ;
rdfs:range [ a owl:Restriction ;
owl:onProperty rdf:predicate ;
owl:hasValue :hasPart ] .
:partSpecification owl:inverseOf [
owl:propertyChainAxiom (_:partSpecRel rdf:subject) ] .
:component owl:propertyChainAxiom (_:partSpecRel rdf:object) .
A path is formed from this:
:model_z :partSpecification [ :component :wheel ;
:quantity 4 ;
:isPurchasedParts true ] .
(For "unstarring".)
To represent this in RDF 1.1 systems, the triple-term object of an rdf:tokenOf triple is encoded as an rdf:Triple literal, whose lexical form is a single triple in N-Triples format without the trailing dot. Either that literal MUST be skolemized consistently with the rest of the triples of the RDF document (or the entire triple stream), or the three entailed reification triples MUST accompany the rdf:tokenOf triple. In the latter case, the connection between any blank nodes in the token literal and those in the graph is deducible from their positions (if the data is well-formed).
In RDF 1.2:
_:x rdf:tokenOf <<' :s :p :o '>> .
In RDF 1.1:
_:x rdf:tokenOf "<http://example.org/s> <http://example.org/p> <http://example.org/o>"^^rdf:Triple .
_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .
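The second option (literal plus accompanying reification triples) can be sketched as a small N-Triples writer. It assumes that all terms are already in N-Triples form, and that rdf:tokenOf and rdf:Triple live in the rdf: namespace as proposed here; neither is a standardized term. The function names are hypothetical. Because N-Triples has no prefixes, the datatype is written as a full IRI rather than rdf:Triple.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def nt_string(lexical: str) -> str:
    """Quote a lexical form as an N-Triples string literal."""
    escaped = (lexical.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def unstar(token: str, s: str, p: str, o: str) -> list:
    """RDF 1.1 N-Triples lines for `token rdf:tokenOf <<' s p o '>>`:
    the rdf:Triple literal plus the three accompanying reification triples."""
    literal = nt_string(f"{s} {p} {o}") + f"^^<{RDF}Triple>"
    return [
        f"{token} <{RDF}tokenOf> {literal} .",
        f"{token} <{RDF}subject> {s} .",
        f"{token} <{RDF}predicate> {p} .",
        f"{token} <{RDF}object> {o} .",
    ]


for line in unstar("_:x", "<http://example.org/s>",
                   "<http://example.org/p>", "<http://example.org/o>"):
    print(line)
```

Quotes inside a literal object are escaped, so a term such as `"hi"@en` survives inside the rdf:Triple lexical form.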