Fix USJ schema and the way altnumber and pub number are handled #66
…g 'marker' as required in schema
@mhosken While working on this, I couldn't find the code portion corresponding to this function |
In response to the question of missing code in usjproc.py: guilty as charged. Sorry that got dropped accidentally. Do you want me to add it back in or do you want to do it? I would also refactor usx2usj to use usjproc rather than repeating code. In fact I would suggest refactoring the use case for usx2usj to use usfconv (which does any serialization to any other serialization) and do away with usx2usj completely. Looking at this PR, I would suggest that this is not a good way to go regarding \vp and \va. Yes \vp is ambiguous in that it can occur as a way of tagging the published form of a verse and also it may occur as a simple character style. I would suggest that USX is stronger here and keep the information as attributes of the verse. This doesn't preclude also having character runs of type vp. Another reason for not wanting to do this is that the simpler you can keep the mapping between USJ and USX, the better. Every special case is more expensive than a few lines of code, you have to document it and every implementation has to track that special case. It's why I work so hard to keep special cases out of the USFM parser/generator and keep it all in the grammar file. If you still feel strongly that you do want to follow USFM here, you also need to write the corresponding code to parse the sequence in USJ back into attributes in the USX data model. |
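The reverse conversion mentioned above (parsing a `vp` sequence in USJ back into a verse attribute) could look roughly like the sketch below. It is only an illustration, not code from usjproc.py: the function name `fold_vp_into_verse` is hypothetical, and it assumes USJ content is a list of strings and objects carrying `type`, `marker`, `number`, `content` and `pubnumber` keys.

```python
def fold_vp_into_verse(content):
    """Fold a `vp` char object that directly follows a verse into that verse's pubnumber.

    Assumption: USJ objects are dicts with "type" and "marker" keys; plain text is str.
    """
    out = []
    for item in content:
        prev = out[-1] if out else None
        is_vp = (
            isinstance(item, dict)
            and item.get("type") == "char"
            and item.get("marker") == "vp"
        )
        follows_verse = (
            isinstance(prev, dict)
            and prev.get("type") == "verse"
            and "pubnumber" not in prev
        )
        if is_vp and follows_verse:
            # Special case: this vp is the published form of the preceding verse number.
            text = "".join(s for s in item.get("content", []) if isinstance(s, str))
            out[-1] = {**prev, "pubnumber": text}
        else:
            # Normal case: any other vp stays an ordinary character run.
            out.append(item)
    return out


para = [
    {"type": "verse", "marker": "v", "number": "1"},
    {"type": "char", "marker": "vp", "content": ["1b"]},
    "Some text ",
    {"type": "char", "marker": "vp", "content": ["(2)"]},
    " more text",
]
result = fold_vp_into_verse(para)
```

Only the `vp` immediately after the verse is folded into an attribute; the later one, which follows text, is left as a character style. That positional rule is exactly the special case every implementation would have to track.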
Our motivation for treating |
I think you have a special case whichever way you approach it. The advantage of keeping the attributes is that you are closer to the content model and the 'other' case is also a normal case (just another character style). I.e. the model and conversion is simpler. If you go with the USFM model for these, you have the same pain that the USFM processing has of explicitly handling these during conversion. I don't see a value in users of USJ having a single way to handle vp whether it is being a published verse or merely a character style. The two contexts are dissimilar enough to warrant separate handling. (Why do we allow vp as a character style anyway?) |
Because it is impossible to typeset many ecumenical Bibles without it. |
Preface to Sirach, NFC:
There are 35 "verses" before chapter 1. How would you like to do this without a vp character style? Or, alternatively, are you going to hold the 1.0 spec pending a discussion with the Vatican? Below, printed examples of French Bible Society NFC and TOL (the official French catholic translation). |
Thanks for the examples. Don't worry there are no plans to do away with the
char style. I was merely sharing my ignorance. BTW even if we did decide to
do away with the vp char style WHICH WE ARE NOT, we would keep it
supported, if deprecated, until it really isn't around any more. IOW, Don't
Panic.
…On Thu, 28 Mar 2024, 14:52 Mark Howe, ***@***.***> wrote:
Preface to Sirach, NFC:
\p \vp (1)\vp* Les livres de la \w Loi\w* et des \w Prophètes|Prophète,
prophétesse, prophétie, prophétiser\w* nous transmettent de nombreuses
grandes leçons, \vp (2)\vp* de même que les autres Écrits qui les suivent
There are 35 "verses" before chapter 1. How would you like to do this
without a vp character style? Or, alternatively, are you going to hold the
1.0 spec pending a discussion with the Vatican?
|
Sorry, cross-posted comment. I'm not panicking because, one way or another, everyone will keep doing the right thing with Bibles. The only risk is that they ignore any standard that makes that harder, or that they prefer any de facto standard that makes that easier. If, today, you proposed to a room full of technicians some new standard with two completely different ways to represent exactly the same thing, the response would be somewhere between laughter and derision. That's precisely what USX does in this case. It's just one example of decisions with USX 3.0 in particular that, starting from scratch, would look like an obfuscated code joke. I get that it's hard to roll back those decisions for USX. But insisting on backwards compatibility with stupid, for eternity is... not guaranteed to drive adoption. |
USX and USJ are both serializations of a single data model. If we need to change this in USJ, we should change it in USX at the same time. I do not yet have a strong feeling whether we should make this change, but I feel strongly that we should either make the change in both USX and USJ or not make the change. |
How does the TOB currently do this? Was the TOB written in Paratext? Was it published via USX? What does the markup look like in USFM / USX? |
I don't think it makes any sense to have USX and USJ use different content models. USX is the underlying data model. It may make sense for USFM syntax to be reflected differently in both USX and USJ. That would probably require a different set of changes.
If I am understanding correctly, the concern is about the two different uses of \vp in USFM. I will call the first the parameter usage; it is modeled in USX by a parameter. The other use I will call the character style use; it is modeled in USX by an element with @style="vp". What is immediately noticeable is that in USX, the two uses are clearly distinguished, with one using an attribute and one a styled element. Since this is the core data model for the standard, we start from that position and consider how each of these is serialised in the various formats. In USX the two uses are simple and clear: @pubnumber (even though it doesn't have to be a number) and . In USJ we are recommending that the same model be used. USFM, on the other hand, has difficulties because we don't want to add attributes to \v and \c. Instead we use magic character styles in a fixed position relative to the \v and \c to serialise the attribute. If we were starting afresh on USFM, we would do something different, but that magic character style is labelled \vp. If we decided that we really didn't want to follow the USX model, then we would need to change the USX model to use in both cases, to more directly represent the USFM representation. We don't think this is the best solution and recommend sticking with the existing USX model. Hence the request for USJ to directly represent the USX model rather than the USFM serialisation model. |
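To make the two uses concrete, here is a minimal sketch of how each might be serialised to USFM from an attribute-based model. The helper names `verse_to_usfm` and `char_to_usfm` are hypothetical, and the dict shapes (`number`, `pubnumber`, `marker`, `content`) are assumptions about the USJ objects rather than anything confirmed in this PR.

```python
def verse_to_usfm(verse):
    """Serialise a verse object; a pubnumber attribute becomes a \\vp run right after \\v."""
    parts = [f"\\v {verse['number']} "]
    if "pubnumber" in verse:
        # Parameter usage: the attribute is written as a character style in a fixed position.
        parts.append(f"\\vp {verse['pubnumber']}\\vp* ")
    return "".join(parts)


def char_to_usfm(char):
    """Serialise a character-style object (including marker "vp") as an ordinary char run."""
    text = "".join(s for s in char["content"] if isinstance(s, str))
    return f"\\{char['marker']} {text}\\{char['marker']}*"


# Parameter usage: published verse number stored as an attribute on the verse.
attr_form = verse_to_usfm(
    {"type": "verse", "marker": "v", "number": "1", "pubnumber": "1b"}
)

# Character style usage: a standalone vp run, as in the Sirach preface example.
style_form = char_to_usfm({"type": "char", "marker": "vp", "content": ["(1)"]})
```

Both paths emit the same `\vp ...\vp*` marker in USFM. The only thing that distinguishes them on the way back in is whether the run sits directly after a `\v`.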
I think it's ambitious (putting things politely) to call this issue "the USX model". It's an accident of XML syntax and of the Paratext internal processing model, before USX became a standard, a history which the committee claims to have put behind it. I'm not seeing the two uses. In one case you are overriding the underlying versification and in the other case you are too. If I need to I can find you examples where both these forms happen in the same paragraph and for the same reason. Last time we went around this, the conversation ended with use of vp to reorder partial verses in Zechariah, and with the committee's answer that Bible scholars needed to change their translation to fit the committee's markup. I still think that's not how things are supposed to work. vp is used in all sorts of ways in huge numbers of documents. You can't retrofit constrained semantics to the world's existing documents and expect those documents to still work as they did before. This is epistemology meets Tenet the film.
That was my own copy of TOB which I believe to be the most recent tradition. It certainly exists in Paratext, I'm not sure if it was translated or originated that way but given UBS involvement I would think that it was at least translated that way. I don't think it's in DBL so it probably hasn't been published in USX. I don't have access to the markup, isn't there someone from UBS on the committee? |
From French NFC (UBS):
@mhosken @jonathanrobie What different use cases are you seeing between
and
? What deep semantics am I missing here? In the first case we're printing a number in brackets and in the second case we are too. In the first case we also make 30 or so verses a whole partial verse, which is a horrible kludge of which the Bible tech world should repent but, regardless, on what logical basis does that kludge need to be syntactically connected to one of 30 or so places where we want to add a number in brackets? |
Actually, we are creating a formal model of the language, something which did not exist previously. For the first time, we have:
That's something we care about, one of the main reasons we are doing this work in the first place. We can change the USX representation if that's the right thing to do. I don't think it makes sense for USJ and USX to be gratuitously different. I think we would do well to focus on what the internal model should be for this USFM markup and reflect our answer in both the internal model and serialization to USX and USJ. |
This is what I care about most: USFM can express both of these things, so we have to give them each an interpretation in our model. USX and USJ should each follow that interpretation. But I think there's a significant difference between:
The print formatting does not define the semantics of the underlying markup. I am not (yet) sure that I know whether anything needs changing in our model, but I would resist any change that was based on print formatting rather than well-defined semantics for each marker. I think you are proposing a change to our semantics. Can you be more clear about what that change is? |
Somewhat confused by @mvahowe's objections since I do not work in USX much or USJ at all. But the fact that \vp ...\vp* can be either a character style or an attribute on a verse does seem strange to me. Why can't vp always be an attribute on a object? This would obviate the need for a verse 1a to hang the first vp on.
and likewise for all the rest
That said (and I am just speaking from what seems logical to me) I would not put the Prologue in Chapter 1. I feel it should be either explicitly or implicitly in Chapter 0. Implicit
The USFM would be:
Explicit
The USFM would be:
|
Which two things? In terms of output and in terms of any user-comprehensible semantics I can think of, the two things are
USX has two ways to describe exactly the same thing. There's no extra expressivity that I can see. If you marked up v1 with a character style it would mean exactly the same thing. Also, does the schema stop me from doing exactly that? @KentSpiel Is (There's an equivalent potential issue with v0, but that "just works" since English speakers care about this. So, in Psalms, you can have canonical text, typically canonical titles, before v1. Several deuterocanonical books need that functionality, but for chapters.) |
No I don't think \c 0 is valid USFM. At least it would not work in Paratext, but that does not mean it couldn't be Valid. One would need to allow a chapter 0 in the project's versification. In other words it's a question of data integrity not structural integrity. |
We're way off the PR now, and I don't think there's an easy fix for the wider non-protestant versification issues. Chapter 0 probably should be "implied" since, like verse 0, no-one wants to print a zero in their Bible. The difference is that chapters contain verses, and many things break if you start typing verses before any chapter number. Off the top of my head you'd end up with all your ch0 content as part of mt1 or something. Really, my only point here is that the USX way of representing the same vp information in different ways looks like an error, probably is an error, and therefore shouldn't be propagated into new standards such as USJ. |
I agree that a PR is the wrong form for discussing this. Perhaps a shared doc would be better?
If there is an error that we need to fix, I think we need to fix it in both USX and USJ. A pull request that changes just USJ does not do that. But I think that starts with a clear shared understanding of the problem that needs to be fixed. I don't think we are there yet. I think a shared document would help:
If we agree there is a problem, we should find a solution to it. It may or may not be this one, but I think it should be the same for both USX and USJ. |
I have sent a new PR with the changes other than the vp-related ones. This PR could be kept as WIP until we make the required decision regarding it in USX (or the underlying data model). |
I have created a shared document to help us understand the use cases and requirements that Mark and Kavitha have mentioned: https://docs.google.com/document/d/1tBsihIxD8WBR6nFTmR9xPd98CepOPuK1U2b6leZgXiY/edit?usp=sharing Can we discuss it there? I'm not convinced I understand the issues yet. |
This PR includes
ca, cp, va, vp objects as separate elements in USJ like it is in USFM (866440e)
python/lib/usjproc.py as well
char and para type markers in USFM, though not so in USX
required in USJ schema