Text Direction considerations

Gregg Kellogg edited this page Feb 9, 2023 · 3 revisions

According to the W3C Process Document:

All reports, publications, or other deliverables produced by the group for public consumption should follow best practices for internationalization and for accessibility to people with disabilities. Network access to W3C-controlled domains may be assumed.

One thing that did not receive much attention in earlier Working Groups (including RDF 1.1) was support for describing the direction in which text should be presented. Although Unicode can encode some changes in text direction (see First-strong property detection), this is not sufficient on its own, so an external mechanism is needed to indicate the text direction. This is captured in https://github.com/w3c/rdf-concepts/issues/9 on RDF Concepts, where much of this information has been gathered. Any changes will potentially affect other specifications as well.

The Internationalization Best Practices for Spec Developers Note has a whole section devoted to Text direction, and this is a checklist item for the internationalization review that all specs need to go through. When this review was performed for JSON-LD 1.1, it came up that the underlying data model (RDF) had no way to indicate text direction, which was a shortcoming. A number of review items and proposals were put forth (see, for example, RDF Literals and Base Directions by Ivan Herman and Pierre-Antoine Champin, and further discussion).

JSON-LD added support by introducing the @direction keyword, which can be added to a value object. For serializing to and from an RDF Dataset, non-normative solutions were proposed using The i18n Namespace and The rdf:CompoundLiteral, along with associated properties. The idea was that these would serve as interim solutions until normative behavior could be described in a future version of RDF.

Potential solutions

Generally, the i18n namespace solution seems to have had the most uptake.

The i18n namespace

Summary: The i18n namespace <https://www.w3.org/ns/i18n#> is used as the basis for a datatype that includes either or both of the language tag and text direction, separated by an underscore (_). In JSON-LD, both values are normalized to lower-case.

Example for Arabic text (ar-EG) with right-to-left text direction (rtl):

[
  ex:title "HTML و CSS: تصميم و إنشاء مواقع الويب"^^i18n:ar-eg_rtl;
  ex:publisher "مكتبة"^^i18n:ar-eg_rtl
] .
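The datatype IRI construction described in the summary can be sketched in Python. This is a minimal illustration, not a normative algorithm: the function names `i18n_datatype_iri` and `parse_i18n_datatype_iri` are hypothetical, and the treatment of a language tag with no direction (no underscore emitted) is an assumption.

```python
I18N_NS = "https://www.w3.org/ns/i18n#"


def i18n_datatype_iri(language=None, direction=None):
    """Build an i18n datatype IRI from a language tag and/or a text direction.

    Both components are lower-cased, as JSON-LD does. (Hypothetical helper.)
    """
    if language is None and direction is None:
        raise ValueError("need a language tag, a direction, or both")
    lang = (language or "").lower()
    if direction is None:
        # Assumption: a language-only IRI carries no underscore.
        return I18N_NS + lang
    direction = direction.lower()
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unexpected direction: {direction!r}")
    return f"{I18N_NS}{lang}_{direction}"


def parse_i18n_datatype_iri(iri):
    """Split an i18n datatype IRI back into (language, direction).

    Returns None when the IRI is not in the i18n namespace. Language tags
    use hyphens, not underscores, so the first underscore is the separator.
    """
    if not iri.startswith(I18N_NS):
        return None
    lang, _, direction = iri[len(I18N_NS):].partition("_")
    return (lang or None, direction or None)


print(i18n_datatype_iri("ar-EG", "rtl"))
print(parse_i18n_datatype_iri(I18N_NS + "ar-eg_rtl"))
```

A consumer that does not know the i18n namespace would see only an unrecognized datatype, so a helper along these lines would be needed wherever the language or direction has to be recovered.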

Advantages:

  • Works in every existing RDF serialization format.

Disadvantages:

  • As it uses a datatype IRI, without normalization, otherwise equivalent literals may not be considered to be the same.
  • It does not actually create a language-tagged string, and SPARQL builtins such as LANG can't access this.
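The first disadvantage can be illustrated with a small Python sketch. It models a literal as a (lexical form, datatype IRI) pair and compares terms character by character, which is how RDF literal term equality treats datatype IRIs. The `Literal` tuple and the `normalized` helper are hypothetical names introduced only for this illustration.

```python
from typing import NamedTuple

I18N_NS = "https://www.w3.org/ns/i18n#"


class Literal(NamedTuple):
    """A simplified RDF literal: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def normalized(literal):
    """Lower-case the i18n suffix so equivalent literals compare equal."""
    if literal.datatype.startswith(I18N_NS):
        suffix = literal.datatype[len(I18N_NS):]
        return Literal(literal.lexical, I18N_NS + suffix.lower())
    return literal


a = Literal("مكتبة", I18N_NS + "ar-EG_rtl")
b = Literal("مكتبة", I18N_NS + "ar-eg_rtl")

# Datatype IRIs differ only in case, yet the terms are not equal.
print(a == b)
# After normalizing the suffix, the two terms compare equal.
print(normalized(a) == normalized(b))
```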

Also see the Pros and Cons discussion in RDF Literals and Base Directions.

The rdf:CompoundLiteral class

Summary: The rdf:CompoundLiteral class uses a blank node to hold the separate components of the "literal".

Example for Arabic text (ar-EG) with right-to-left text direction (rtl):

[
  ex:title [
    rdf:value "HTML و CSS: تصميم و إنشاء مواقع الويب";
    rdf:language "ar-eg";
    rdf:direction "rtl"
  ];
  ex:publisher [
    rdf:value "مكتبة";
    rdf:language "ar-eg";
    rdf:direction "rtl"
  ]
] .
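The expansion of one string into a blank node with separate component triples can be sketched in Python as follows. Triples are modeled as plain tuples, and the helper name `compound_literal` and the blank node labelling scheme are assumptions made for this illustration, not part of any specification.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical source of fresh blank node labels.
_bnode_ids = itertools.count()


def compound_literal(value, language=None, direction=None):
    """Return (blank_node, triples) describing a compound literal.

    The blank node stands in for the literal; each component becomes
    its own triple with a plain string object.
    """
    node = f"_:b{next(_bnode_ids)}"
    triples = [(node, RDF + "value", value)]
    if language is not None:
        triples.append((node, RDF + "language", language.lower()))
    if direction is not None:
        triples.append((node, RDF + "direction", direction))
    return node, triples


node, triples = compound_literal("مكتبة", "ar-EG", "rtl")
for triple in triples:
    print(triple)
```

The caller would then use the returned blank node as the object of `ex:publisher` or `ex:title`, which is where the extra blank node listed under the disadvantages comes from.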

Ednote: The group considered using a language-tagged string as the value of rdf:value, but I can't immediately see the rationale for rejecting this.

Advantages:

  • Works in every existing RDF serialization format.

Disadvantages:

  • Introduces a new blank node.
  • Shares the disadvantages of the i18n datatype solution.

Also see the Pros and Cons discussion in RDF Literals and Base Directions.

Extend langString

As described in the Extend langString section of RDF Literals and Base Directions, the definition of rdf:langString could be extended to include both the language tag and the text direction by introducing a separator such as a caret (^).

Example for Arabic text (ar-EG) with right-to-left text direction (rtl):

[
  ex:title "HTML و CSS: تصميم و إنشاء مواقع الويب"@ar-EG^rtl;
  ex:publisher "مكتبة"@ar-EG^rtl
] .
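Assuming the caret separator shown in the example above (the separator is only a proposal, and the function name is hypothetical), splitting the extended tag back into its two facets might look like this in Python:

```python
def split_language_and_direction(tag):
    """Split an extended tag such as 'ar-EG^rtl' into (language, direction).

    A tag without a caret is treated as a plain language tag with no
    direction. The caret syntax is the proposal's, not a settled grammar.
    """
    language, separator, direction = tag.partition("^")
    if not separator:
        return language, None
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unexpected direction: {direction!r}")
    return language, direction


print(split_language_and_direction("ar-EG^rtl"))
print(split_language_and_direction("ar-EG"))
```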

Advantages:

  • Literal is treated as a language-tagged string with an extra "text direction" facet.
  • Minimal impact on RDF Semantics.

Disadvantages:

  • Requires changes to every RDF serialization format (which we're doing anyway).
  • Is not available to specifications which are not updated.
  • Depending on your view, it raises compatibility issues with RDF 1.0/1.1: systems that encounter this syntax in RDF 1.2 data will either not see the text direction or fail to process the data. However, a quick check of what these terms mean may suggest otherwise:

    Backward compatibility is a design that is compatible with previous versions of itself. Forward compatibility is a design that is compatible with future versions of itself.
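The compatibility concern in the last bullet can be made concrete with a short Python sketch. It checks tokens against a regular expression mirroring the LANGTAG production of the RDF 1.1 Turtle grammar; the helper name is hypothetical, and a real parser would tokenize the whole document rather than test one token in isolation.

```python
import re

# Mirrors the RDF 1.1 Turtle LANGTAG production:
#   '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG_1_1 = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def accepted_by_rdf11_langtag(token):
    """True if the whole token is a valid RDF 1.1 Turtle language tag."""
    return LANGTAG_1_1.fullmatch(token) is not None


# A plain language tag is accepted by the 1.1 grammar.
print(accepted_by_rdf11_langtag("@ar-EG"))
# The extended form is not, because '^' is outside the production.
print(accepted_by_rdf11_langtag("@ar-EG^rtl"))
```

Under this reading, an RDF 1.1 parser given the extended syntax would report a syntax error rather than silently drop the direction, which is the "fail to process the data" case.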

A related solution would update the grammars to allow both a language tag and a datatype, restricted to specific datatype IRIs derived from rdf:langString:

[
  ex:title "HTML و CSS: تصميم و إنشاء مواقع الويب"@ar-EG^^rdf:langStringRtl;
  ex:publisher "مكتبة"@ar-EG^^rdf:langStringRtl
] .