Proposal: Markup for non-vernacular words #49

davidg-sil · 2023-11-01T07:27:55Z

[Moved here from old site]

There is already \tl, which is for transliterated words intended to be pronounceable in the vernacular orthography.
I would like to propose that there also be a \ol for "other language", not written in the vernacular orthography. I briefly considered calling it \wf (word foreign), but my use-case assumption is that at least some readers know the language, and may not consider it as foreign, but it's not the vernacular language of the publication.
It might be in the majority language of the region, a trade language, an international language, or that of a neighbouring area or group.

Summary

Description

Other language (non-vernacular) text, written in unaltered form, often one known and understood by at least a fraction of the target audience.

Notes

Other language text may be marked for a given language via an attribute, lang, which specifies the source language according to ISO 639-1 (2-letter codes) or ISO 639-3 (3-letter codes, historically known as Ethnologue codes).
Other language text is not transliterated into the vernacular orthography (cf. \tl); it is instead given in the form that readers of that language find easiest to understand.
If no language attribute is given, the other language may be assumed to be the national language. However, specifying the language is nevertheless recommended.
If the scripture editor checks character inventories or sequences, other language text should either not be included in those, or should be considered separately. Thus other language text may contain letters not permitted in the main text, and should not trigger warnings about unacceptable characters, sequences or spelling (unless the preparation system has appropriate spelling dictionaries available).
Other language text may require an alternative font or presentation. The language attribute and paragraph style should give sufficient information to select the font.
If the typesetting system uses pattern-based hyphenation, other language text should not be hyphenated using patterns developed for another language (avoiding unfortunate breaks).
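To illustrate the character-inventory note above, here is a minimal Python sketch of how a checking tool might leave \ol content out of the inventory. It assumes the proposed \ol marker (and its nested form \+ol); the regular expressions and the function names `split_ol` and `char_inventory` are hypothetical, and a real tool would use a proper USFM parser rather than regexes.

```python
import re
from collections import Counter

# Matches \ol ... \ol* and the nested form \+ol ... \+ol* (proposed marker, not yet in USFM).
OL_SPAN = re.compile(r"\\(\+?)ol\s(.*?)\\\1ol\*", re.DOTALL)
# Rough match for any remaining USFM marker such as \f, \fr, \ft or \f*.
MARKER = re.compile(r"\\\+?[a-z]+\d*\*?")

def split_ol(usfm):
    """Return (main_text, ol_texts): the text without \\ol spans, and the removed contents."""
    ol_texts = []

    def drop(match):
        # Discard any attributes that follow the '|' separator.
        content = match.group(2).split("|", 1)[0]
        ol_texts.append(content.strip())
        return " "

    return OL_SPAN.sub(drop, usfm), ol_texts

def char_inventory(usfm):
    """Count letters in the vernacular text only, ignoring \\ol content and markers."""
    main, _ = split_ol(usfm)
    main = MARKER.sub(" ", main)
    return Counter(ch for ch in main if ch.isalpha())
```

With the footnote from the example further down, the Romanian letters would be collected separately in `ol_texts` and would not appear in the main inventory.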

Syntax

USFM \ol content \ol*
-or- \ol content |lang="code" \ol*
USX <char style="ol" lang="code"> content</char>
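As a rough illustration of how the two syntaxes above correspond, the following Python sketch converts a single \ol span into the proposed USX form. It is only a sketch under the assumptions of this proposal: the regex and the function name `ol_to_usx` are hypothetical, and it handles one isolated span rather than a whole document.

```python
import re
import xml.etree.ElementTree as ET

# \ol content \ol*   -or-   \ol content |lang="code" \ol*
OL = re.compile(
    r'\\ol\s+(?P<text>[^|\\]*?)\s*'
    r'(?:\|\s*lang="(?P<lang>[^"]*)"\s*)?'
    r'\\ol\*'
)

def ol_to_usx(usfm_span):
    """Convert one \\ol ... \\ol* span to a <char style="ol"> element (as a string)."""
    m = OL.fullmatch(usfm_span.strip())
    if m is None:
        raise ValueError("not an \\ol span")
    char = ET.Element("char", {"style": "ol"})
    if m.group("lang"):
        # The lang attribute is optional in the proposal.
        char.set("lang", m.group("lang"))
    char.text = m.group("text")
    return ET.tostring(char, encoding="unicode")
```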

Style type

Character

Valid in

[Section] [Para] [Table] [List] [Footnotes]

Example

\f + \fr 1:1 \fk Circumcised \ft A sign of the Abrahamic covenant.
 Romanian:\+ol tăiat împrejur|lang="ro"\+ol*  \f*


davidg-sil · 2023-11-01T07:34:35Z

Commenting on my own suggestion, I realise that changing the font or hyphenation based on something that comes after the text is very hard in at least PTXprint. I don't know about other typesetting engines.
Rather than being a character style, a ranged milestone would almost certainly be better.

Example:

\f + \fr 1:1 \fk Circumcised \ft A sign of the Abrahamic covenant.
 Romanian:\ol-s |lang="ro"\* tăiat împrejur \ol-e\* \f*

Also, a ranged milestone would allow the entirety of a majority language introduction to be marked up.

\ol-s|lang="en"\* 
\is Introduction to this translation
\ip ....
\ol-e\*
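For anyone wanting to experiment with the milestone form, a tool could locate the ranges roughly as follows. This is a Python sketch only: the \ol-s / \ol-e milestone names come from the suggestion above, while the regex and the function name `ol_ranges` are hypothetical, and nested ranges are not handled.

```python
import re

MILESTONE = re.compile(
    r'\\ol-s\s*(?:\|\s*lang="(?P<lang>[^"]*)"\s*)?\\\*'  # opening milestone, optional lang
    r'(?P<body>.*?)'                                      # everything inside the range
    r'\\ol-e\s*\\\*',                                     # closing milestone
    re.DOTALL,
)

def ol_ranges(usfm):
    """Return a list of (lang, body) pairs, one per \\ol-s ... \\ol-e range."""
    return [(m.group("lang"), m.group("body").strip())
            for m in MILESTONE.finditer(usfm)]
```

Because the language is known at the opening milestone, a typesetter could switch font and hyphenation before it reads the text, which is the point of preferring this form.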

KentSpiel · 2024-02-19T19:16:26Z

Assuming we allow adding category markup to paragraph and character markers, this could be implemented simply by putting a category \cat ro\cat* on a Paragraph or a Character span. It would not be pretty in Paratext but could be useful in typesetting and other publishing processes.

\f + \fr 1:1 \fk Circumcised \ft A sign of the Abrahamic covenant.
 Romanian: \tl \cat ro\cat*tăiat împrejur\tl*\f*

Could we add category information to the Paratext Style sheets? For example in custom.sty:

\marker tl
\cat ro
\TextProperties publishable nonvernacular
\font Romanian Special

mhosken · 2024-02-19T20:21:21Z

The problem with this approach in a stylesheet is that you have, in effect, multiple records with the same key. That is a significant change for the tooling. It makes specifying the structure of stylesheets way more complicated. PTXprint gets around this using a structured Marker that is not valid USFM. See the technical manual for details. I agree that a category value should be constrained to the normal id characters of lowercase, digits, hyphen or underscore. And yes I can buy into the value being a space separated list of category values.
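The constraint on category values described above could be checked along these lines. This is a Python sketch that assumes "lowercase" means ASCII a-z and that an empty list is acceptable; the function name `parse_categories` is hypothetical and not part of any existing tool.

```python
import re

# Lowercase letters, digits, hyphen or underscore (ASCII assumed).
CATEGORY_ID = re.compile(r"[a-z0-9_-]+")

def parse_categories(value):
    """Split a space-separated category list and reject malformed values."""
    cats = value.split()
    bad = [c for c in cats if not CATEGORY_ID.fullmatch(c)]
    if bad:
        raise ValueError(f"invalid category value(s): {bad}")
    return cats
```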


mhosken · 2024-10-18T07:42:38Z

How about PTXprint (or whatever) processes a \tl fred|en\tl* to \ztl-s|en\*fred\ztl-e\*? Then we get the best of both worlds.
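A preprocessing step of that kind might look like the Python sketch below. It assumes the closing milestone is spelled \ztl-e\* (matching the \ztl-s opener) and that only \tl spans carrying a language after "|" are rewritten; the regex and the function name `tl_to_milestones` are hypothetical and do not describe how PTXprint actually works.

```python
import re

# \tl text|lang\tl*  (spans without a '|lang' part are left alone)
TL_WITH_LANG = re.compile(r"\\tl\s+(?P<text>[^|\\]*)\|(?P<lang>[^\\|]+?)\s*\\tl\*")

def _rewrite(m):
    # Move the language to an opening milestone so it is seen before the text.
    return "\\ztl-s|" + m.group("lang") + "\\*" + m.group("text") + "\\ztl-e\\*"

def tl_to_milestones(usfm):
    """Rewrite \\tl text|lang\\tl* as \\ztl-s|lang\\*text\\ztl-e\\*."""
    return TL_WITH_LANG.sub(_rewrite, usfm)
```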

davidg-sil · 2024-10-18T08:24:31Z

I agree that using \cat <language>\cat* is a nice method for specifying interesting things about a section of data, including things like hyphenation, font, etc, and it could certainly be applied rather than a ranged milestone, if that were desired.

My feeling, however, is that it would be a mistake to conflate \tl (transliteration, using the vernacular writing system) and my proposed \ol (another language, recognisable by the language community but not in the same writing/spelling system). Doing so puts an additional strain on checking tools, as the language attribute would need to be parsed and decoded before decisions are made about spell-checking, character inventories, etc. Excluding all \ol content from a character inventory is a much easier task. If a tool wishes to check the spelling of \ol content against a language-specific dictionary, that of course remains possible, but I don't think every checking tool should be forced by the standard to understand the content to that extent.

mhosken · 2024-10-19T03:47:41Z

Good point. Perhaps we should use \wl, in keeping with \wh and \wg. So we define \wl _text_|_lang_\wl*.

The problem is that \wh and \wg are defined in terms of individual words in a wordlist rather than simply text in another language. Could we reappropriate \wh and \wg to simply be text in another language, marked as such, to stop the word analyser trying to allocate the text into the wrong place? Given the word analyser can break strings into words, there is no reason that \wh and \wg (and so \wl) need mark individual words separately. Either that or I am misinterpreting the standard.

It would help to have some examples in the docs.

mhosken · 2024-10-19T05:26:34Z

A quick look through some projects (what's in the DBL) suggests that \wg and \wh are rarely if ever used. Might we then deprecate them in favour of \wl, or at least make them synonyms: \wg = \wl text|grc\wl* and \wh = \wl text|hbo\wl*?
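If the synonym route were taken, a migration could be sketched in Python roughly as below. The \wl marker and the grc/hbo mapping come from this thread; the regex and the function name `to_wl` are hypothetical, and spans that already carry attributes after "|" are deliberately left untouched.

```python
import re

# Language codes suggested in the comment above.
LEGACY_LANG = {"wg": "grc", "wh": "hbo"}

LEGACY = re.compile(
    r"\\(?P<nest>\+?)(?P<marker>wg|wh)\s+"   # opening \wg or \wh (optionally nested as \+wg)
    r"(?P<text>[^\\|]*?)\s*"                 # content without attributes
    r"\\(?P=nest)(?P=marker)\*"              # matching closing marker
)

def to_wl(usfm):
    """Rewrite \\wg text\\wg* and \\wh text\\wh* as \\wl text|lang\\wl*."""
    def repl(m):
        lang = LEGACY_LANG[m.group("marker")]
        nest = m.group("nest")
        return "\\" + nest + "wl " + m.group("text") + "|" + lang + "\\" + nest + "wl*"
    return LEGACY.sub(repl, usfm)
```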

mhosken added this to the 3.2 milestone Apr 4, 2024
