[WR/ARIB] Character Sets #544
Comments
My understanding of this comment is that ARIB is requesting that the characters listed in IMSC appendix B. Common Character Sets, Table 2, be extended by adding a row for language code "ja" as per the table in the first comment in the issue, and that the two references be added as source data. Since we currently do not have a table for the "ja" language code, and ARIB-TT appears to be a good source of authority for captioning characters in Japan, this seems reasonable to me.
I wonder if @himorin, @xfq, @r12a or @aphillips might have views about this issue.
@nigelmegitt The point of IMSC appendix B, if I understand correctly, is to help implementations identify minimum base character sets for font/rendering support in specific languages/countries, particularly for limited-capacity devices and rendering platforms. In most cases, I am not sure that there is an underlying standard or character set behind the language-specific lists in the appendix. Instead, the appendix provides information on minimal implementations. The situation with Japanese may be different, in that ARIB is a standard that defines such a character set. There has been cooperation in the past between ARIB and Unicode/ISO10646, and occasionally new characters have been encoded to support evolving ARIB character sets. It seems reasonable to document, or provide a link to, what this character set is for the purposes of IMSC developers. Unlike the existing elements in Table 2, however, this list is extensive and would be more difficult to incorporate using the same methodology. Offhand, I don't recall whether Unicode maintains any documentation, either in the UCD or in CLDR, of which characters are in ARIB. Referencing ARIB directly might be a better solution, as I'm not sure it makes sense to try to have IMSC track ARIB instead of just pulling it in by reference. The other languages that appear in Appendix B do not have a ready reference. @r12a may have better recollection of the status of ARIB's character set vs. Unicode.
@aphillips we actually asked Unicode for a specific subset of characters per locale within CLDR for subtitle and caption purposes, a long time ago. That request is currently tracked at https://unicode-org.atlassian.net/browse/CLDR-8915 (it used to be on a different Unicode tracker, which no longer seems to be operational). [Update to this comment:]
@nigelmegitt "UNSCH" is "unscheduled". It's in limbo and I'll action myself to follow up with them. That said, you're kind of asking Unicode to define a "standard" (which will more likely be a "recommendation" or "best practices"), whereas ARIB already is a standard. I think Table 2 is more like guidance for implementers. For this issue, I'd probably add text just above Table 2 along the lines of:
Ah, thanks for that @aphillips. ARIB only defines the characters for Japanese if I understand correctly, whereas the request to CLDR was to define them for every language. We also did offer to contribute to the work for doing so, I believe. Your proposal for Table 2 works okay; a tweak might be to add a "ja" row and put the text in the second column of that table.
The Timed Text Working Group just discussed
The full IRC log of that discussion
<nigel> Topic: [WR/ARIB] Character Sets #544
<nigel> github: https://github.com//issues/544
<nigel> Nigel: I think the first question to ask is if this is normative/substantive.
<nigel> Pierre: We should try to avoid making substantial changes this far into the process, but
<nigel> .. we could formally because it is only informative.
<nigel> Pierre: That section, regardless of the normative language around it, is meant to inform
<cyril> " this section defines common character sets that authors are encouraged to use."
<nigel> .. implementations. You could conclude that it affects implementations.
<nigel> Cyril: "encouraged to use"
<nigel> Pierre: And the W3C Process definition.
<nigel> -> https://www.w3.org/2019/Process-20190301/#correction-classes Process 6.2.5 Classes of Changes
<nigel> Pierre: Section 8.2 says a document "SHOULD be authored using characters from" the common character sets.
<nigel> Cyril: There's a relationship between the reference fonts and the common character sets, right?
<nigel> Pierre: Also
<nigel> .. I think we should deal with these ARIB comments in the next version of IMSC otherwise
<nigel> .. we may make mistakes.
<nigel> Nigel: Back to Cyril's point, there does seem to be a substantive relationship between
<nigel> .. reference fonts and the common character sets and the §9.3 text on rendering rules.
<nigel> .. So it looks as though changing the common characters changes the code points in the
<nigel> .. reference fonts and therefore the rendering rules.
<nigel> .. (sorry that 9.3 is assuming the introduction is added, otherwise it's 8.3)
<nigel> Nigel: My conclusion is we cannot make this change now but should add it to vNext
<nigel> .. with appropriate care about reference fonts, and checking that the code points are
<nigel> .. indeed all available.
<nigel> .. Any other points to add before I summarise?
<nigel> Pierre: I think that the idea of converging ARIB-TTML and IMSC is really a great goal,
<nigel> .. and we should take the time to do it in collaboration with ARIB. I see that as a pretty
<nigel> .. extensive but worthwhile effort.
<nigel> SUMMARY: TTWG would like to adopt this change in a future version of IMSC.
Per: w3c/ttwg#116
Comment 2
Collection 286*: Japanese Non Ideographic Extension
Collection 371*: JIS2004 Ideographics Extension
(Fullwidth ASCII variants) U+FF01 – U+FF5E
(Fullwidth Symbol variants) U+FFE3, U+FFE5
(Halfwidth Katakana variants) U+FF65 – U+FF9F
(Halfwidth CJK punctuation) U+FF61 – U+FF64
(Additional ideographs and symbols defined in Table 5-2 in Vol.1, Part 2 of ARIB STD-B62)
*: These collections are defined in Annex A to ISO/IEC 10646:2017
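For implementers who want to check text against the explicitly enumerated ranges above, here is a minimal Python sketch. It covers only the four parenthesised entries that give code points; ISO/IEC 10646 collections 286 and 371 and the additional characters from ARIB STD-B62 are not enumerated in this issue, so they are deliberately left out. The names `EXPLICIT_RANGES`, `EXPLICIT_SINGLES`, `classify` and `uncovered` are hypothetical and not taken from IMSC or ARIB.

```python
# Code point ranges copied from the explicit entries in the list above.
# Collections 286 and 371 and ARIB STD-B62 Table 5-2 are NOT included,
# because their contents are not enumerated in this issue.
EXPLICIT_RANGES = {
    "fullwidth_ascii": (0xFF01, 0xFF5E),
    "halfwidth_cjk_punctuation": (0xFF61, 0xFF64),
    "halfwidth_katakana": (0xFF65, 0xFF9F),
}

# The fullwidth symbol entry lists two individual code points, not a range.
EXPLICIT_SINGLES = {
    "fullwidth_symbol": (0xFFE3, 0xFFE5),
}


def classify(ch):
    """Return the name of the explicit entry containing ch, or None."""
    cp = ord(ch)
    for name, (low, high) in EXPLICIT_RANGES.items():
        if low <= cp <= high:
            return name
    for name, code_points in EXPLICIT_SINGLES.items():
        if cp in code_points:
            return name
    return None


def uncovered(text):
    """List the characters in text that fall outside the explicit entries."""
    return [ch for ch in text if classify(ch) is None]
```

A character reported by `uncovered` is not necessarily outside the proposed "ja" set; it may belong to one of the collections or to the ARIB table that this sketch omits.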