Text Direction Proposal

Andy Seaborne edited this page Jul 6, 2023 · 8 revisions

Text direction

This is a proposal for an addition to the RDF data model. It provides for literals that have an indicator for the initial direction when text is displayed.

This addition only affects the initial direction of text.

For more detailed control of displayable content, there is rdf:HTML.

Background

W3C

Unicode Characters

Script

Language tags can include a script subtag, which also indicates the text direction. Explicit initial text direction, as proposed here, SHOULD override the script subtag. This lets RDF processors operate without access to the language tag registry to interpret the script subtag, if present.

See the section "Choosing a Language Tag".

Data Model

Add the following:

  • If a literal has a language tag, it MAY also have an initial text direction.

Such literals have datatype rdf:dirLangString (alternative: rdf:i18nString).

The value space of rdf:dirLangString is the set of 3-tuples (lexical form, language tag, direction).
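
As an illustration only, one value in this value space can be sketched as a small immutable record. The class name `DirLangString` and its field names are hypothetical and not part of the proposal. The check on the direction assumes the restriction to "ltr" and "rtl" given in the Syntax section.

```python
from dataclasses import dataclass

# Assumption: the only directions the proposal allows (see Syntax).
VALID_DIRECTIONS = {"ltr", "rtl"}


@dataclass(frozen=True)
class DirLangString:
    """Hypothetical model of one rdf:dirLangString value: a 3-tuple."""
    lexical_form: str
    language_tag: str
    direction: str

    def __post_init__(self):
        # A direction is only permitted on a literal that has a language tag.
        if not self.language_tag:
            raise ValueError("a direction requires a language tag")
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"invalid direction: {self.direction!r}")

    def as_tuple(self):
        """Return the (lexical form, language tag, direction) 3-tuple."""
        return (self.lexical_form, self.language_tag, self.direction)


value = DirLangString("abc", "ar", "rtl")
print(value.as_tuple())  # ('abc', 'ar', 'rtl')
```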

Alternative

It would also be possible to extend rdf:langString to have an optional direction component and not introduce a datatype. The value space of rdf:langString is extended to have both 2-tuples (lexical form, language tag) and 3-tuples (lexical form, language tag, direction).

Syntax

Turtle, etc., SPARQL syntax

LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)* ('--' [a-zA-Z0-9]+)?

No whitespace is allowed before or after the --.

Alt: to highlight direction in the grammar,

LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)* DIRECTION?
DIRECTION ::= ('--' [a-zA-Z]+)

(update: the original proposal was ('--' [a-zA-Z0-9]+) which has a conflict in SPARQL.)

However, this form has tokenizers recognizing an out-of-place --, which may affect parser error messages.

The only valid strings for the direction in RDF 1.2 are "ltr" and "rtl" (lowercase). This is not in the grammar; it is related to the value of a literal.
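
The two-step treatment described here (grammar first, then a value check) could look like the following sketch. It uses the letters-only direction from the alternative grammar. The function name `parse_langtag` is hypothetical, and raising `ValueError` is merely one way a parser might report the problem.

```python
import re

# '@' language ('-' subtag)* ('--' direction)?  No whitespace around '--'.
LANGTAG_RE = re.compile(
    r"@(?P<lang>[a-zA-Z]+(?:-[a-zA-Z0-9]+)*)(?:--(?P<dir>[a-zA-Z]+))?"
)


def parse_langtag(token):
    """Split a LANGTAG token into (language tag, direction or None)."""
    match = LANGTAG_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a LANGTAG token: {token!r}")
    lang, direction = match.group("lang"), match.group("dir")
    # The grammar accepts any letters; the restriction to "ltr"/"rtl"
    # (lowercase) is a separate check on the value of the literal.
    if direction is not None and direction not in ("ltr", "rtl"):
        raise ValueError(f"invalid direction: {direction!r}")
    return lang, direction


print(parse_langtag("@ar--rtl"))  # ('ar', 'rtl')
print(parse_langtag("@en-GB"))    # ('en-GB', None)
```

A token such as `@ar--RTL` passes the grammar but fails the value check, while `@ar -- rtl` is rejected by the grammar itself.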

Why "--"?

Other proposed syntaxes often involved "^". However, "^" is already used in SPARQL and might appear after a language tag.

"--" avoids such problems, and does not use up any of the currently unused characters, which might be helpful in the future.

Equality

Literal term equality: Two literals with a language tag are equal if they have the same lexical form, the same language tag and the same initial text direction.
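
A minimal sketch of this comparison, assuming the lexical form is also compared (as it is for existing RDF literals) and that a missing direction is represented as `None`. The type `LangLiteral` is hypothetical.

```python
from typing import NamedTuple, Optional


class LangLiteral(NamedTuple):
    """Hypothetical representation of a language-tagged literal."""
    lexical_form: str
    language_tag: str
    direction: Optional[str] = None  # None: no initial text direction


def term_equal(a: LangLiteral, b: LangLiteral) -> bool:
    # Component-wise comparison: a literal without a direction is never
    # equal to one that has a direction.
    return a == b


rtl = LangLiteral("abc", "ar", "rtl")
ltr = LangLiteral("abc", "ar", "ltr")
no_dir = LangLiteral("abc", "ar")

print(term_equal(rtl, LangLiteral("abc", "ar", "rtl")))  # True
print(term_equal(rtl, ltr))                              # False
print(term_equal(rtl, no_dir))                           # False
```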

RDF/XML

(Outline)

The attribute dir= applies to all contained elements (cf. the behaviour of xml:base). It affects any xml:lang= used within the contained content.

SPARQL

Section "17.4.3 Functions on Strings" - Argument Compatibility Rules. Add:

  • The arguments have datatype rdf:dirLangString with identical language and text direction.
  • The first argument has datatype rdf:dirLangString, and the second argument has datatype xsd:string.
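
The two added rules can be sketched as a predicate over a pair of arguments. This covers only the additions above, not the existing SPARQL compatibility rules. The `Lit` type and the function names are hypothetical, and a literal with neither language tag nor direction is assumed to stand for an xsd:string.

```python
from typing import NamedTuple, Optional


class Lit(NamedTuple):
    """Hypothetical string literal; lang and direction are None for xsd:string."""
    lexical_form: str
    lang: Optional[str] = None
    direction: Optional[str] = None


def has_direction(lit: Lit) -> bool:
    return lit.lang is not None and lit.direction is not None


def is_xsd_string(lit: Lit) -> bool:
    return lit.lang is None and lit.direction is None


def compatible_with_direction(arg1: Lit, arg2: Lit) -> bool:
    """Check only the two rules added for literals with a direction."""
    if has_direction(arg1) and has_direction(arg2):
        # Rule 1: identical language and identical text direction.
        return (arg1.lang, arg1.direction) == (arg2.lang, arg2.direction)
    # Rule 2: first argument has a direction, second is an xsd:string.
    return has_direction(arg1) and is_xsd_string(arg2)


print(compatible_with_direction(Lit("abc", "ar", "rtl"), Lit("b", "ar", "rtl")))  # True
print(compatible_with_direction(Lit("abc", "ar", "rtl"), Lit("b")))               # True
print(compatible_with_direction(Lit("abc", "ar", "rtl"), Lit("b", "ar", "ltr")))  # False
print(compatible_with_direction(Lit("abc"), Lit("b", "ar", "rtl")))               # False
```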

Functions

Function | "abc"@ar       | "abc"@ar--rtl     | Notes
LANG     | "ar"           | "ar"              | Behaves like old text
LANGDIR  | ""             | "rtl"             | New function
DATATYPE | rdf:langString | rdf:dirLangString |

STRLANG takes an optional third argument for the direction, e.g. STRLANG("text", "ar", "rtl").

LANG does not return the direction component. This helps existing code.

New function LANGDIR():

LANGDIR("abc"@ar--rtl) -> "rtl"
LANGDIR("abc"@ar) -> ""
LANGDIR("abc") -> ""
LANGDIR(1) -> ""
LANGDIR(`<uri>`) -> error
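
The cases above can be reproduced with a small sketch. The `IRI` and `Literal` classes are hypothetical stand-ins for RDF terms, and a SPARQL evaluation error is modelled as a Python `TypeError`, which is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_form: str
    lang: Optional[str] = None
    direction: Optional[str] = None
    datatype: Optional[str] = None  # e.g. "xsd:integer" for the literal 1


def lang(term):
    """LANG: returns the language tag only, never the direction."""
    if not isinstance(term, Literal):
        raise TypeError("LANG is only defined for literals")
    return term.lang or ""


def langdir(term):
    """LANGDIR: the direction, or "" for a literal without one."""
    if not isinstance(term, Literal):
        raise TypeError("LANGDIR is only defined for literals")
    return term.direction or ""


print(repr(langdir(Literal("abc", "ar", "rtl"))))           # 'rtl'
print(repr(langdir(Literal("abc", "ar"))))                  # ''
print(repr(langdir(Literal("1", datatype="xsd:integer"))))  # ''
print(repr(lang(Literal("abc", "ar", "rtl"))))              # 'ar'
```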

New function isLANG():

isLANG(?x) -> isLiteral(?x) && ( DATATYPE(?x) = rdf:dirLangString || DATATYPE(?x) = rdf:langString )

LANGMATCHES applies to rdf:dirLangString. It works on the language tag only and ignores the direction.
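
A sketch of why the direction plays no part: SPARQL's LANGMATCHES is applied to the result of LANG, which does not include the direction. The matching below follows basic language-range filtering in the style of RFC 4647. The function name and the tuple representation of literals are hypothetical.

```python
def lang_matches(language_tag: str, language_range: str) -> bool:
    """Basic filtering: case-insensitive match on the tag or a prefix of it."""
    if language_range == "*":
        return language_tag != ""
    tag, rng = language_tag.lower(), language_range.lower()
    return tag == rng or tag.startswith(rng + "-")


# Literals as (lexical form, language tag, direction or None).
with_dir = ("abc", "ar", "rtl")
without_dir = ("abc", "ar", None)
regional = ("abc", "ar-EG", "rtl")

# Only the language tag (index 1) is passed on, so the direction is ignored.
print(lang_matches(with_dir[1], "ar"))     # True
print(lang_matches(without_dir[1], "ar"))  # True
print(lang_matches(regional[1], "ar"))     # True
print(lang_matches(with_dir[1], "en"))     # False
```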

(comparison of language tagged strings)

Notes

This proposal is minimal. There are possible additions, and these should be considered once the basic form for RDF and SPARQL is settled.

  1. Does not cover "xsd:string with direction"
    "abc"@--ltr

  2. Does not define whether rdf:langString and rdf:dirLangString are related, e.g., subclass (simple entailment does not have "subclass") or derived datatype (by restriction or by extension -- the decimal hierarchy is by restriction).
