Proposal: Triple Tokens Entailing Reified Statements

Niklas Lindström edited this page Jun 6, 2024 · 10 revisions

Alternative to proposed baseline by Enrico; and stealing a lot from it.

This:

  • Adds triple terms, defined as opaque literal-like objects to be used with the functional predicate rdf:tokenOf to describe tokens. Tokens of triples have a many-to-one relationship to RDF triples.
  • Defines a statement entailment to entail that tokens of these triples also reify statements, forming a bridge to classic RDF reification.

Upon this, a basic vocabulary is defined to relate qualifications ("real reifiers") to the meaning of these tokens.

An analysis of use cases follows, then a means of "unstarring" this.

RDF Abstract Syntax

Basic Abstract Syntax

graph                 ::= triple*
triple                ::= subject predicate object
subject               ::= iri | BlankNode
predicate             ::= iri
object                ::= iri | BlankNode | literal | tripleTerm
tripleTerm            ::= triple

Notes:

  • This is RDF 1.1 syntax with the addition of tripleTerm.
  • A term is denoted by r, a triple by t, and a graph by g.
  • Given a triple t, we denote the subject, predicate, object of t as t.s, t.p, t.o, respectively.

Well-formed Abstract Syntax

graph                 ::= triple*
triple                ::= ( subject rdf:tokenOf tripleTerm ) |
                          ( subject predicate object )
subject               ::= iri | BlankNode
predicate             ::= iri
object                ::= iri | BlankNode | literal

Syntactic condition: A graph MUST NOT have two triples with the same subject and different triple terms.
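
The syntactic condition can be checked mechanically. What follows is a minimal Python sketch, assuming a graph is a list of plain tuples and a triple term is a nested 3-tuple. The function name `violations` and this tuple encoding are illustrative assumptions, not part of the proposal.

```python
from collections import defaultdict

TOKEN_OF = "rdf:tokenOf"

def violations(graph):
    """Return the subjects that are tokens of more than one distinct triple term."""
    terms_by_subject = defaultdict(set)
    for s, p, o in graph:
        if p == TOKEN_OF:
            terms_by_subject[s].add(o)
    return sorted(s for s, terms in terms_by_subject.items() if len(terms) > 1)

# Many tokens of one triple are allowed (many-to-one).
good = [
    ("_:x", TOKEN_OF, (":s", ":p", ":o")),
    ("_:y", TOKEN_OF, (":s", ":p", ":o")),
]
# One token of two different triple terms is not well-formed.
bad = good + [("_:x", TOKEN_OF, (":s", ":p", ":q"))]
```

Here `violations(good)` is empty, while `violations(bad)` reports `_:x`.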

Semantics

Based on RDF 1.1 simple semantics, with the addition of:

  • An injective function SRE from tripleTerm into literal, called the syntactic denotation of triple terms.
  • The function [I+A](.) is extended with:
    • [I+A](r) = IL(SRE(r)) if r is a tripleTerm.

Well-Formed Semantics

Defined upon the well-formed subset, with the addition of:

Many-to-One Tokens

rdf:tokenOf is functional:

∀ x,y1,y2 .
    (x,y1) ∈ IEXT(IS(rdf:tokenOf))
    ⋀ (x,y2) ∈ IEXT(IS(rdf:tokenOf))
→ y1=y2

Statement Entailment

A triple token denotes an rdf:Statement instance, defined as this entailment:

∀ x,y . (x,y) ∈ IEXT(IS(rdf:tokenOf))
→ (x,[I+A]([IL+B](y).s)) ∈ IEXT(IS(rdf:subject))
  ⋀ (x,[I+A](IL(y).p)) ∈ IEXT(IS(rdf:predicate))
  ⋀ (x,[I+A]([IL+B](y).o)) ∈ IEXT(IS(rdf:object))

where:

  • B is a mapping from BlankNode in IL to BlankNode in g. (Needed since the value space of triple terms consists of three-tuples of RDF terms, not of what they denote.)
  • The function [IL+B](.) is a mapping of triple terms into a three-tuple of terms in the same context as the graph (so its constituents can be mapped to IR ⋃ IP using [I+A](.)).

Warning

Since B is available to the triple representation decoding function (at parse time), it is assumed here that this can also be accessed when it encounters a triple term, to ensure that the mapping in B is consistent for the entire graph.

(Otherwise, since concrete representations are bounded by the RDF document, would different mappings B somehow have to be part of the triple terms in the abstract syntax? That would certainly complicate things.)

Triple Tokens Entail Classic Reifications

Statement entailment defines that this:

_:x rdf:tokenOf <<' :s :p :o '>> .

entails:

_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .

meaning that triple tokens are also classic reifications.
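
As an illustration only, the statement entailment can be sketched as a forward-chaining step in Python. The sketch assumes triples are tuples and a triple term is a nested 3-tuple; it works purely on syntax and ignores the blank node mapping B. The function name `statement_entailment` is hypothetical.

```python
TOKEN_OF = "rdf:tokenOf"

def statement_entailment(graph):
    """Return the graph plus the reification triples entailed by rdf:tokenOf."""
    entailed = set(graph)
    for token, p, term in graph:
        if p == TOKEN_OF:
            s, pred, o = term  # unpack the constituents of the triple term
            entailed.add((token, "rdf:subject", s))
            entailed.add((token, "rdf:predicate", pred))
            entailed.add((token, "rdf:object", o))
    return entailed

g = {("_:x", TOKEN_OF, (":s", ":p", ":o"))}
closure = statement_entailment(g)
```

The resulting `closure` contains the original rdf:tokenOf triple and the three classic reification triples for `_:x`.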

Each a Token of a Single Triple

Through entailment, one triple token can thus be the subject of multiple reification triples with the same predicate, e.g. rdf:subject (though it still has only three relations to resources, unless others are asserted for other reasons). This does not make it a token of the different opaque triple terms that would encode each of these combinations, so the functional requirement is not broken.

Example:

_:x rdf:tokenOf <<' :s :p :o '>> .
:s owl:sameAs :q .

Entails:

_:x rdf:subject :s, :q ; rdf:predicate :p ; rdf:object :o .

But not:

_:x rdf:tokenOf <<' :q :p :o '>> .
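
The opacity of triple terms can be illustrated with a small Python sketch. It assumes tuples for triples and a nested 3-tuple for a triple term, and it applies a deliberately naive, one-directional, single-pass owl:sameAs substitution in object position only. The function name `entail` is hypothetical, and this is not a full OWL reasoner.

```python
def entail(graph):
    """Apply statement entailment, then a naive owl:sameAs substitution."""
    out = set(graph)
    for token, p, term in graph:
        if p == "rdf:tokenOf":
            s, pred, o = term
            out.update({
                (token, "rdf:subject", s),
                (token, "rdf:predicate", pred),
                (token, "rdf:object", o),
            })
    aliases = [(s, o) for s, p, o in graph if p == "owl:sameAs"]
    for a, b in aliases:
        for s, p, o in list(out):
            # Only an object that is the term itself is substituted.
            # A triple term (a tuple) never equals a plain term, so it stays opaque.
            if o == a and p != "owl:sameAs":
                out.add((s, p, b))
    return out

g = {
    ("_:x", "rdf:tokenOf", (":s", ":p", ":o")),
    (":s", "owl:sameAs", ":q"),
}
result = entail(g)
```

The result contains `_:x rdf:subject :q`, but no rdf:tokenOf triple whose triple term has `:q` as subject.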

Note

It is important to realize that while RDF statements are abstract relationships, rdf:Statement instances are defined as tokens of these, such as in concrete serializations (or data streams, utterances, etc.). So classic reifications of statements are decidedly tokens.

This formalizes that, but improves upon it by capturing the token representation, and then the interpretation, so that the token corresponds to the triple itself, as an opaque literal. It still means something in the graph, which is why its interpreted constituents are entailed.

RDF Vocabulary

rdf:tokenOf a owl:FunctionalProperty, owl:DatatypeProperty ;
    rdfs:range rdf:Triple ;
    rdfs:comment """
        A token of a triple, such as an encoding in concrete RDF syntax,
        stored in a triple store, or an assertion in a communication or
        data stream.
        """@en .

rdf:Triple a rdfs:Datatype ;
    rdfs:comment "A triple encodes a statement."@en .

Qualifications of Statements

A vocabulary to cover the case of "real, many-to-many reifiers".

Vocabulary

rdfs:qualifies a owl:ObjectProperty ;
    rdfs:domain rdfs:Resource ;
    rdfs:range rdf:Statement ;
    rdfs:comment """
        A property relating anything more concrete or specific to one or
        more abstract relationships. That includes events, situations or
        other circumstances that make these relationships true.
        """@en .

rdfs:qualifiedBy a owl:ObjectProperty ;
    owl:inverseOf rdfs:qualifies ;
    rdfs:comment """
        Relating the statement of a triple token to an observation or
        evidence that qualifies it.
        """@en .

Note

This is conceptualized as qualification, to avoid confusion with the classic RDF notion of reification. The properties could otherwise be rdfs:reifies and rdfs:reifiedBy.

Examples

Reifying Statements by Qualifying Tokens

Given:

:s :p :o {| rdfs:qualifiedBy <reifier> |} .

which is short for:

:s :p :o .

_:x rdf:tokenOf <<' :s :p :o '>> .
_:x rdfs:qualifiedBy <reifier> .

And entailing, through statement entailment:

_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .

<reifier> rdfs:qualifies _:x .  # through the inverseOf

That is thus a variant of saying:

:s :p :o .
<reifier> rdfs:qualifies << :s :p :o >> .

That token is "throw-away" here, but can be used to further annotate provenance or other details pertinent to the triple token itself.

:s :p :o {| rdfs:qualifiedBy [ a :Event ; :date "1876" ] ;
            :date "2024-06-04" ;
            :source <stream23> |} .
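
The expansion of the annotation shorthand described above can be sketched in Python. The sketch assumes tuples for triples, a nested 3-tuple for the triple term, and a caller-supplied blank node label for the token. The function name `expand_annotation` is hypothetical.

```python
def expand_annotation(triple, annotations, token):
    """Expand ':s :p :o {| ... |}' into the asserted triple, a token, and its annotations."""
    out = [triple, (token, "rdf:tokenOf", triple)]
    for p, o in annotations:
        out.append((token, p, o))
        if p == "rdfs:qualifiedBy":
            # Entailed through the owl:inverseOf declaration.
            out.append((o, "rdfs:qualifies", token))
    return out

triples = expand_annotation(
    (":s", ":p", ":o"),
    [("rdfs:qualifiedBy", "<reifier>"), (":source", "<stream23>")],
    token="_:x",
)
```

The token carries both the qualification and the provenance detail, while the asserted triple stays in the graph.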

Combined Token-Qualifications

Sub-property chain axioms can also be defined for "shorthands", omitting the explicit reifier:

:happenedWhen rdfs:domain :DataPoint ;
    owl:propertyChainAxiom (rdfs:qualifiedBy :date) .

:s :p :o {| :happenedWhen "1876" ;
            :date "2024-06-04" ;
            :source <stream23> |} .

This token is thus also used as a semi-qualification (which might not be recommended, but is a somewhat common practice).
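
To make the direction of the chain axiom concrete, here is a minimal Python sketch of a two-step property chain: whenever a token is qualified by something that has a :date, the shorthand :happenedWhen is implied on the token. Triples are assumed to be tuples, and the function name `apply_chain` is hypothetical.

```python
def apply_chain(graph, chain, implied):
    """Add (x, implied, z) for every path x -chain[0]-> y -chain[1]-> z."""
    first, second = chain
    out = set(graph)
    for x, p1, y in graph:
        if p1 != first:
            continue
        for y2, p2, z in graph:
            if p2 == second and y2 == y:
                out.add((x, implied, z))
    return out

graph = {
    ("_:x", "rdfs:qualifiedBy", "_:e"),
    ("_:e", ":date", "1876"),
    ("_:x", ":date", "2024-06-04"),  # date of the token itself, not of the event
}
result = apply_chain(graph, ("rdfs:qualifiedBy", ":date"), ":happenedWhen")
```

Only the date of the qualifying event is lifted to :happenedWhen; the token's own :date is left untouched.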

Compatibility With Existing Use Cases

Broadly categorized as either:

  • Token provenance (source, trust, choice of what claim to "trust").
  • Statement qualification ("truth-makers" - detailed circumstances such as events or situations).

Classic Reification

Informal and varied in use. No explicit distinction between provenance and qualification, but decidedly representing a token of the interpreted statement.

Example from UniProt:

base <http://purl.uniprot.org/uniprot/>

<#_A0A497J299-citation-SIP4BBB8AACCB71F9A8> a rdf:Statement ;
  up:attribution <A0A497J299#attribution-4988E00BE14DA48B3901A433D86A713A> ;
  up:scope "NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]"^^xsd:string ;
  up:context <A0A497J299#context-MD532058D60F506E995629A186BBE1050DA> .

<#_A0A497J3A8-citation-SIP4BBB8AACCB71F9A8> a rdf:Statement ;
  up:attribution <A0A497J3A8#attribution-4988E00BE14DA48B3901A433D86A713A> ;
  up:scope "NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA]"^^xsd:string ;
  up:context <A0A497J3A8#context-MD532058D60F506E995629A186BBE1050DA> .

PROV Ontology

Conceptually rooted in Classic Reification, but explicitly designed as qualification (the meaning of the token).

PROV defines qualified forms for its direct relationships. Examples from the spec:

PREFIX prov: <http://www.w3.org/ns/prov#>

## Example 8:
<illustration_activity> a prov:Activity;
  prov:qualifiedUsage <use_of_aggregated_data> ;
  ## Example 9:
  prov:qualifiedAssociation [ a prov:Association;
      prov:agent   <derek>;
      prov:hadRole <illustrationist>;
      prov:hadPlan <tutorial_blog>
    ] .

## Example 10:
<bar_chart> a prov:Entity;
  prov:qualifiedGeneration <generation_of_bar_chart> ;
  # Example 11:
  prov:qualifiedDerivation [ a prov:Derivation;
      prov:entity        <aggregated_by_regions>;
      prov:hadActivity   <aggregating_activity>;
      prov:hadUsage      <use_of_aggregated_data>;
      prov:hadGeneration <generation_of_bar_chart>
    ] .

<use_of_aggregated_data> a prov:Usage;
  prov:entity <aggregated_by_regions>;
  prov:atTime "2011-07-14T03:03:03Z"^^xsd:dateTime .

<generation_of_bar_chart> a prov:Generation;
  prov:activity <illustration_activity>;
  prov:atTime "2011-07-14T15:52:14Z"^^xsd:dateTime .

With RDF-star, this can instead be annotated on the direct relationships:

## Example 8:
<illustration_activity> a prov:Activity;
  prov:used <aggregated_by_regions> {|
      prov:specializationOf <use_of_aggregated_data> |} ;
  ## Example 9:
  prov:wasAssociatedWith <derek> {|
      prov:hadRole <illustrationist>;
      prov:hadPlan <tutorial_blog>
    |} .

## Example 10:
<bar_chart> a prov:Entity;
  prov:wasGeneratedBy <illustration_activity> {|
      prov:specializationOf <generation_of_bar_chart> |} ;
  # Example 11:
  prov:wasDerivedFrom <aggregated_by_regions> {|
      prov:hadActivity <aggregating_activity>;
      prov:hadUsage <use_of_aggregated_data>;
      prov:hadGeneration <generation_of_bar_chart>
    |} .

(Here prov:specializationOf is used instead of rdfs:qualifiedBy, showing that this is a pattern whose details depend upon the chosen ontologies.)

Wikidata

Uses explicit provenance (references and ranks) and qualification (qualifiers), designed as an alternative form of classic reification mixing in components of the PROV vocabulary.

N-ary Relations

Conceptually different from Classic Reification, closer to qualification ("hubs" of multiple relationships).

# Example based on <https://www.w3.org/TR/swbp-n-aryRelations/#useCase3>

<John> :hasBought <Lenny_The_Lion> {| rdfs:qualifiedBy <Purchase_1> |} .
<Lenny_The_Lion> :soldBy <ToyStore> {| rdfs:qualifiedBy <Purchase_1> |} .

<Purchase_1> a :Purchase ;
    :hasBuyer <John> ;
    :hasObject <Lenny_The_Lion> ;
    :hasPurpose <Birthday_Gift> ;
    :hasAmount "$15" ;
    :hasSeller <ToyStore> .

LPGs

Designed as "relationship instances", so each edge is unique. Edges can be seen as "qualified predicates". In practice, use is informal and varied. The "pick your own semantics" approach means there is no way to formally distinguish between provenance and qualification.

# Example from <https://github.com/RDFLib/rdflib/discussions/2795>
:model_z :hasPart :wheel {| :quantity 4 ;
                          :isPurchasedParts true |} .

:model_z :hasPart :wheel {| a :PartSpecification ;
        :quantity 4 ;
        :isPurchasedParts true |} .

The Needs For Many

  • Multiple observations (token provenance).
  • Qualifying "too abstract/simple/flat" relationships (met vs. Meeting, spouse vs. Marriage).
  • publisher and datePublished vs two PublicationEvents (each event qualifying two statements).

Example:

PREFIX : <https://schema.org/>
PREFIX tk: <http://example.org/tokens#>

<book> a :Book ;
  :datePublished 2023 {|
      rdfs:qualifiedBy <#uspbl> ;
      tk:source <harpercollins.com/datastream> ;
      tk:timestampMills 1714153402
    |} ;
  :datePublished 2023 {|
      rdfs:qualifiedBy <#ukpbl> ;
      tk:source <bloomsbury.co.uk/datastream> ;
      tk:timestampMills 1714153404
    |} ;
  :publisher <HarperCollins> {|
      rdfs:qualifiedBy <#uspbl> ;
      tk:source <harpercollins.com/datastream> ;
      tk:timestampMills 1714153403
    |} ,
    <Bloomsbury> {|
      rdfs:qualifiedBy <#ukpbl> ;
      tk:source <bloomsbury.co.uk/datastream> ;
      tk:timestampMills 1714153405
    |} .

<#uspbl> a :PublicationEvent ;
  :location <NewYork> .

<#ukpbl> a :PublicationEvent ;
  :location <London> .

Bridging Different Designs

The Book and its Publication

Note that with a model like:

:publication owl:propertyChainAxiom (
        [ owl:inverseOf rdf:subject ]
        [ rdfs:subPropertyOf rdfs:qualifiedBy ; rdfs:range :PublicationEvent ]
    ) .
:publisher owl:propertyChainAxiom (:publication :agent) .
:datePublished owl:propertyChainAxiom (:publication :startDate) .

The above can be connected to a fully explicit PublicationEvent design, where the shorthands would have been explicitly stated:

<#uspbl>
  :agent <HarperCollins> ;  # implies :publisher for <book>
  :startDate 2023 . # implies :datePublished for <book>

<#ukpbl>
  :agent <Bloomsbury> ;  # implies :publisher for <book>
  :startDate 2023 . # implies :datePublished for <book>

This implies that there are simpler :publisher and :datePublished shorthand statements, and tokens thereof, "in the middle" of the book and the publication event (the event being their "truth-maker").

Thus, by relying on the entailed interpretation of a token, it is possible to model vocabularies to facilitate integration of, for instance, token-designs and N-ary designs.

The Model and its Part Specifications

Given the above LPG example, by declaring this:

_:partSpecRel rdfs:subPropertyOf rdf:reifies;
    rdfs:domain :PartSpecification ;
    rdfs:range [ a owl:Restriction ;
            owl:onProperty rdf:predicate ;
            owl:hasValue :hasPart ] .

:partSpecification owl:inverseOf [
    owl:propertyChainAxiom (_:partSpecRel rdf:subject) ] .

:component owl:propertyChainAxiom (_:partSpecRel rdf:object) .

A path is formed from this:

:model_z :partSpecification [ :component :wheel ;
        :quantity 4 ;
        :isPurchasedParts true ] .

Encoding in RDF 1.1

(For "unstarring".)

To represent this in RDF 1.1 systems, the triple term object of an rdf:tokenOf triple is encoded as an rdf:Triple literal (whose lexical form is one triple in the N-Triples format, without the trailing dot). One of two conditions MUST hold: either that literal is consistently skolemized alongside the rest of the triples of the RDF document (or entire triple stream), or the three entailed reification triples accompany the rdf:tokenOf triple. The connection between any blank nodes in the token literal and those in the graph is then deducible from position (if the data is well-formed).

In RDF 1.2:

_:x rdf:tokenOf <<' :s :p :o '>> .

In RDF 1.1:

_:x rdf:tokenOf "<http://example.org/s> <http://example.org/p> <http://example.org/o>"^^rdf:Triple .
_:x rdf:subject :s .
_:x rdf:predicate :p .
_:x rdf:object :o .
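
A minimal Python sketch of this "unstarring" step follows. It assumes each constituent of the triple term is already given as an N-Triples term string, and it emits the rdf:tokenOf triple together with the three reification triples. The helper names `escape` and `unstar` are hypothetical, and only quotes and backslashes are escaped here.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def escape(lexical):
    """Escape backslashes and double quotes for use inside an N-Triples literal."""
    return lexical.replace("\\", "\\\\").replace('"', '\\"')

def unstar(token, term):
    """Encode a triple term as an rdf:Triple literal plus its reification triples."""
    s, p, o = term
    # The lexical form is the triple in N-Triples, without the trailing dot.
    literal = '"%s"^^<%sTriple>' % (escape(" ".join(term)), RDF)
    return [
        f"{token} <{RDF}tokenOf> {literal} .",
        f"{token} <{RDF}subject> {s} .",
        f"{token} <{RDF}predicate> {p} .",
        f"{token} <{RDF}object> {o} .",
    ]

lines = unstar("_:x", (
    "<http://example.org/s>",
    "<http://example.org/p>",
    "<http://example.org/o>",
))
```

Each returned string is one N-Triples line, so the output can be appended directly to an RDF 1.1 document.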