Proposal for JSON representing USFM/USX Contents #42
Replies: 7 comments 12 replies
-
Yeah, I think this might be one of the most sensible JSON representations of USFM/USX that I've seen. The simplicity (or lightness as you call it) is the key. Congrats. (I came to a similar conclusion with my internal Python format -- USFM seems well represented as list/array type objects which can then be sequentially processed. As soon as you start to use lots of dictionaries/maps, you get messed up by the overlapping hierarchies.) Didn't like the proposed name at first glance (felt it should have bible or scripture in the name), but on second thoughts, it clearly signifies its design philosophy. So a big thumbs up at first glance. P.S. I did two other things that you might possibly consider: 1/ I disambiguated cl fields, which have two quite distinct uses in USFM depending on where they occur (I gave one use case a modified marker), and 2/ I copied (not moved) chapter numbers into the place where you normally want to print them (giving them a different c# marker/name). If these two things are done when the JSON is formed, it would make sequential processing of your lists a lot more straightforward later on.
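For anyone wondering what those two tweaks could look like, here is a minimal Python sketch. It is not my actual internal format: the flat list of (marker, text) tuples, the `cl-book` marker name, and the choice to place the `c#` copy just before the first verse of each chapter are all assumptions made for illustration.

```python
def preprocess(entries):
    """Return a new list of (marker, text) tuples with two tweaks applied.

    1. A cl that occurs before the first c is renamed to the hypothetical
       marker "cl-book"; a cl after a c keeps its name.
    2. Each chapter number is copied (not moved) into a "c#" entry placed
       just before the first v that follows the c (an assumed print position).
    """
    out = []
    seen_chapter = False
    pending_chapter = None  # chapter number still waiting to be copied
    for marker, text in entries:
        if marker == "c":
            seen_chapter = True
            pending_chapter = text
        elif marker == "cl" and not seen_chapter:
            marker = "cl-book"  # hypothetical name, not part of USFM
        elif marker == "v" and pending_chapter is not None:
            out.append(("c#", pending_chapter))
            pending_chapter = None
        out.append((marker, text))
    return out


entries = [
    ("id", "GEN"),
    ("cl", "Chapter"),
    ("c", "1"),
    ("cl", "Chapter One"),
    ("p", ""),
    ("v", "1"),
    ("v", "2"),
    ("c", "2"),
    ("p", ""),
    ("v", "1"),
]
processed = preprocess(entries)
```

The original c entries stay where they were, so a later pass can still use them for structure while a renderer only looks at c#.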
-
I like the use of "tag:@Style". Very succinct. This will involve someone in categorising markers as to whether they can take marker content or not. It also means that any readers (or writers) will have to have such an awareness or 'hope for the best'. It may turn out to be easier to pay the byte cost and use "content": ["single value"] over "text": "single value". But that's for you guys to decide. The current USFM/USX grammar distinguishes the different use cases of things like \ca and \xt and can do the right thing with them. I suggest that there is no need to resolve this issue in a different way in JSON. If JSON handles these in a special way, then special code will need to be written whenever converting between formats. As to a name, I quite like USJ (Unified Scripture JSON).
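To illustrate the "text" versus "content" point, here is a small Python sketch of the kind of helper a reader might end up writing if both fields exist. The field names come from the proposal; the helper name and the two sample elements are made up for this example.

```python
def as_content(element):
    """Return an element's children as a list, whichever field it uses."""
    if "content" in element:
        return element["content"]
    if "text" in element:
        # Wrap the single value so callers always get a list.
        return [element["text"]]
    return []


# Hypothetical sample elements, one per style of field.
with_text = {"type": "para:ide", "text": "UTF-8"}
with_content = {"type": "para:p", "content": ["In the beginning"]}

children_a = as_content(with_text)
children_b = as_content(with_content)
```

If every element used "content", this helper would not be needed, which is the byte cost versus simplicity trade-off I mean.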
-
I wasn't sure about attributes. With USX, the attributes are XML attributes of the element, so there is an easy way to get all of them. The JSON format currently requires you to filter out some fields, and what you have left are the attributes. Maybe an "attributes": {"attribute1": "value1", "attribute2": "value2"} approach would be clearer and easier to use.
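A rough Python sketch of the difference, assuming the structural fields are "type", "text" and "content" (that set is my guess at the draft, and the sample verse element with "number" and "sid" keys is hypothetical):

```python
# Assumed set of structural (non-attribute) field names.
STRUCTURAL_FIELDS = {"type", "text", "content"}


def attributes_by_filtering(element):
    """Flat layout: whatever is left after removing structural fields."""
    return {k: v for k, v in element.items() if k not in STRUCTURAL_FIELDS}


def attributes_nested(element):
    """Nested layout: attributes live under their own key."""
    return element.get("attributes", {})


flat = {"type": "verse:v", "number": "1", "sid": "GEN 1:1"}
nested = {"type": "verse:v", "attributes": {"number": "1", "sid": "GEN 1:1"}}

flat_attrs = attributes_by_filtering(flat)
nested_attrs = attributes_nested(nested)
```

With the flat layout every consumer has to keep its own list of structural fields in sync with the spec; with the nested layout it is a single lookup.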
-
Quick remark: idempotent format conversion should be considered, i.e. the ability to do a full round trip (USFM -> USJson -> USFM) without loss of information, ideally with a semantically identical result.
-
Absolutely. USFM, USX, and the JSON format should have the same data model. In the current draft format for USFM and USX, they share a common data model, which is based on the USX representation, and they are just different serializations of the same data model. The JSON format should do the same.
-
I think it's clear that there is a group of programmers who want to work with Scripture but do not know XML or USFM. As long as our formats are different serializations of the same data format, I think we can support them. I don't think that "the programming world in general" is storing structured documents as JSON. I don't think JSON will replace HTML, and I don't think of the JSON format as a replacement for USX. So I agree that this sentence implies things that might not be true. Regardless, if all three formats exist, with the same data model, we can see how this pans out.
-
We are using USJ in Platform.Bible (despite it not being released yet!) internally to represent Scripture. I'm working on the editor which takes in USJ and have found something I think is missing from the [...] I've found that [...]
-
A JSON representation for USFM/USX Contents
By Kavitha Raju, on behalf of the working group consisting of
Joel Mathew, Mark Howe and Kavitha Raju
Motivation
While the USFM format, which is a text markup, and USX, which is XML, are being extensively used, the programming world in general is moving towards newer formats, with JSON (JavaScript Object Notation) being a prominent one among them. It becomes necessary to facilitate the representation of this content in such a format to enable [...]
The Design Principles
The key principles we have tried to adhere to in designing the JSON structure are the following:
[...] style attribute in USX) are together used to define the element type in the JSON: {"type": "para:p"}, {"type": "verse:v"}, etc.
[...] text field for those elements where nested elements (including character markers) are not possible.
[...] content field is provided as an array.
Special Treatments
Though in general we are proposing a direct JSON representation of the USX contents, the following exceptions have been made in that transfer.
ca, cp, va and vp are always represented as separate elements and not as altnumber and pubnumber, because these may occur not just at the chapter/verse start but also in between their contents, and then their representation is inconsistent in USX.
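To make the type, text and content fields and the separate va element more concrete, the following Python sketch builds one paragraph and walks it sequentially. It is only an illustration under assumptions: the "number" key on the verse, the "char:va" type value, and the helper names are not taken from the proposal and may differ from the published samples.

```python
def split_type(type_value):
    """Split a combined type such as "para:p" into (element, style)."""
    element, _, style = type_value.partition(":")
    return element, style


def element_types(node):
    """List the type of every element in document order, skipping plain text."""
    if isinstance(node, str):
        return []
    types = [node["type"]]
    for child in node.get("content", []):
        types.extend(element_types(child))
    return types


# Hypothetical paragraph: an alternate verse number (va) appears in the
# middle of the verse text as its own element, not as an attribute.
paragraph = {
    "type": "para:p",
    "content": [
        {"type": "verse:v", "number": "1"},
        "Text of the first verse. ",
        {"type": "char:va", "text": "2"},
        "Text that another versification counts as verse 2.",
    ],
}

types_in_order = element_types(paragraph)
```

Because va is just another entry in the content array, a consumer handles it the same way whether it occurs at the verse start or part-way through the text.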
Samples
Samples of this Proposed JSON are available here
Open Questions