[WR/ARIB] Character Sets #544

himorin · 2020-05-08T10:01:04Z

Primary language subtag: ja

Characters:
- Collection 285: Basic Japanese
- Collection 286: Japanese Non Ideographic Extension
- Collection 371: JIS2004 Ideographics Extension
- (Fullwidth ASCII variants) U+FF01 – U+FF5E
- (Fullwidth Symbol variants) U+FFE3, U+FFE5
- (Halfwidth Katakana variants) U+FF65 – U+FF9F
- (Halfwidth CJK punctuation) U+FF61 – U+FF64
- (Additional ideographs and symbols defined in Table 5-2 in Vol.1, Part 2 of ARIB STD-B62)

Note: The numbered collections are defined in Annex A to ISO/IEC 10646:2017.

Editorial note: ISO/IEC 10646 and ARIB STD-B62 should be added as references listed in Section K.
ISO/IEC 10646:2017: https://standards.iso.org/ittf/PubliclyAvailableStandards/c069119_ISO_IEC_10646_2017.zip
ARIB STD-B62: https://arib.or.jp/english/std_tr/broadcasting/std-b62.html
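
As an illustration only, the explicit code point ranges in the proposed "ja" row could be expressed as a small lookup. This is a minimal sketch, not part of the ARIB request: the names `JA_EXPLICIT_RANGES` and `in_ja_explicit_ranges` are hypothetical, and the numbered collections (285, 286, 371) and the ARIB STD-B62 Table 5-2 additions are deliberately left out because their membership has to be taken from the standards themselves.

```python
# Explicit code point ranges copied from the proposed "ja" row.
# NOT included (assumption: must be sourced from the standards):
#   - ISO/IEC 10646 collections 285, 286 and 371
#   - additional ideographs and symbols from ARIB STD-B62 Table 5-2
JA_EXPLICIT_RANGES = [
    (0xFF01, 0xFF5E),  # fullwidth ASCII variants
    (0xFFE3, 0xFFE3),  # fullwidth symbol variant (fullwidth macron)
    (0xFFE5, 0xFFE5),  # fullwidth symbol variant (fullwidth yen sign)
    (0xFF61, 0xFF64),  # halfwidth CJK punctuation
    (0xFF65, 0xFF9F),  # halfwidth katakana variants
]


def in_ja_explicit_ranges(ch: str) -> bool:
    """Return True if the single character ch lies in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in JA_EXPLICIT_RANGES)
```

A caller would still need to combine this with the collection data before treating a character as inside or outside the full proposed set.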

nigelmegitt · 2020-05-19T16:33:25Z

My understanding of this comment is that ARIB is requesting that the characters listed in IMSC appendix B. Common Character Sets, Table 2, are extended by adding a row for language code "ja" as per the table in the first comment in the issue, and that the two references are added as source data.

Since we currently do not have a table for the "ja" language code, and ARIB-TT appears to be a good source of authority for captioning characters in Japan, this seems reasonable to me.

nigelmegitt · 2020-05-19T16:37:00Z

I wonder if @himorin , @xfq , @r12a or @aphillips might have views about this issue.

aphillips · 2020-05-19T17:38:03Z

@nigelmegitt The point of IMSC appendix B, if I understand correctly, is to help implementations to identify minimum base character sets for font/rendering support in specific languages/countries, particularly for limited capacity devices/rendering platforms. In most cases, I am not sure that there is an underlying standard or character set behind the language-specific lists in the appendix. Instead the appendix provides information on minimal implementations. The situation with Japanese may be different, in that ARIB is a standard that defines such a character set.

There has been cooperation in the past between ARIB and Unicode/ISO10646 and occasionally new characters have been encoded to support evolving ARIB character sets. It seems reasonable to document or provide a link to what this character set is for the purposes of IMSC developers. Unlike the existing elements in Table 2, however, this list is extensive and would be more difficult to incorporate using the same methodology. Offhand, I don't recall whether Unicode maintains any documentation, either in the UCD or in CLDR, of which characters are in ARIB. Referencing ARIB directly might be a better solution, as I'm not sure it makes sense to try to have IMSC track ARIB instead of just pulling it in by reference. The other languages that appear in Appendix B do not have a ready reference.

@r12a may have better recollection of the status of ARIB's character set vs. Unicode.

nigelmegitt · 2020-05-20T09:39:06Z

@aphillips we actually asked Unicode for a specific subset of characters per locale within CLDR for subtitle and caption purposes, a long time ago. That request is currently tracked at https://unicode-org.atlassian.net/browse/CLDR-8915 (it used to be on a different Unicode tracker, which no longer seems to be operational).

[Update to this comment:]
I just realised that you helped us with this, adding a comment to that CLDR tracking issue. I don't understand the status of it now though. There's a further comment that it has been moved to "UNSCH" which I cannot yet decode.

aphillips · 2020-05-20T16:00:59Z

@nigelmegitt "UNSCH" is "unscheduled". It's in limbo and I'll action myself to follow up with them.

That said, you're kind of asking Unicode to define a "standard" (which will more likely be a "recommendation" or "best practices"), whereas ARIB already is a standard. I think Table 2 is more like guidance for implementers.

For this issue, I'd probably add text just above Table 2 along the lines of:

Table 2 specifies supplementary character sets that have proven useful in captioning and subtitling applications for a number of selected languages. Table 2 is non-exhaustive, and will be extended as needs arise. For Japanese, the standard ARIB STD-B62 defines a character set that is recommended as a reference.

nigelmegitt · 2020-05-20T16:08:59Z

Ah, thanks for that @aphillips . ARIB only defines the characters for Japanese if I understand correctly, whereas the request to CLDR was to define them for every language. We also did offer to contribute to the work for doing so, I believe.

Your proposal for Table 2 works okay; a tweak might be to add a "ja" row and put the text in the second column of that table.

css-meeting-bot · 2020-05-21T15:52:49Z

The Timed Text Working Group just discussed [WR/ARIB] Character Sets w3c/imsc#544, and agreed to the following:

SUMMARY: TTWG would like to adopt this change in a future version of IMSC.

The full IRC log of that discussion

<nigel> Topic: [WR/ARIB] Character Sets #544
<nigel> github: https://github.com//issues/544
<nigel> Nigel: I think the first question to ask is if this is normative/substantive.
<nigel> Pierre: We should try to avoid making substantial changes this far into the process, but
<nigel> .. we could formally because it is only informative.
<nigel> Pierre: That section, regardless of the normative language around it, is meant to inform
<cyril> " this section defines common character sets that authors are encouraged to use."
<nigel> .. implementations. You could conclude that it affects implementations.
<nigel> Cyril: "encouraged to use"
<nigel> Pierre: And the W3C Process definition.
<nigel> -> https://www.w3.org/2019/Process-20190301/#correction-classes Process 6.2.5 Classes of Changes
<nigel> Pierre: Section 8.2 says a document "SHOULD be authored using characters from" the common character sets.
<nigel> Cyril: There's a relationship between the reference fonts and the common character sets, right?
<nigel> Pierre: Also
<nigel> .. I think we should deal with these ARIB comments in the next version of IMSC otherwise
<nigel> .. we may make mistakes.
<nigel> Nigel: Back to Cyril's point, there does seem to be a substantive relationship between
<nigel> .. reference fonts and the common character sets and the §9.3 text on rendering rules.
<nigel> .. So it looks as though changing the common characters changes the code points in the
<nigel> .. reference fonts and therefore the rendering rules.
<nigel> .. (sorry that 9.3 is assuming the introduction is added, otherwise it's 8.3)
<nigel> Nigel: My conclusion is we cannot make this change now but should add it to vNext
<nigel> .. with appropriate care about reference fonts, and checking that the code points are
<nigel> .. indeed all available.
<nigel> .. Any other points to add before I summarise?
<nigel> Pierre: I think that the idea of converging ARIB-TTML and IMSC is really a great goal,
<nigel> .. and we should take the time to do it in collaboration with ARIB. I see that as a pretty
<nigel> .. extensive but worthwhile effort.
<nigel> SUMMARY: TTWG would like to adopt this change in a future version of IMSC.
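
The check mentioned in the discussion, that the required code points are all available in a reference font, might look roughly like the following. This is a hedged sketch under stated assumptions: `missing_code_points` and `expand_ranges` are hypothetical helpers, and the font's coverage is passed in as a plain set of integers. How that set is obtained (for example from the font's character map) is outside the scope of this example.

```python
def expand_ranges(ranges):
    """Expand inclusive (low, high) pairs into a set of code points."""
    return {cp for lo, hi in ranges for cp in range(lo, hi + 1)}


def missing_code_points(required_ranges, font_coverage):
    """Return the sorted required code points that the font does not cover.

    font_coverage is assumed to be an iterable of integer code points.
    """
    return sorted(expand_ranges(required_ranges) - set(font_coverage))


# Hypothetical font that covers only the fullwidth ASCII variants.
example_coverage = expand_ranges([(0xFF01, 0xFF5E)])

# Halfwidth CJK punctuation from the proposed "ja" row.
gaps = missing_code_points([(0xFF61, 0xFF64)], example_coverage)
print([f"U+{cp:04X}" for cp in gaps])
```

An empty result would indicate that the font covers every code point in the ranges that were checked, and says nothing about ranges that were not supplied.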

himorin added the Wide Review Comment, imsc1.2 and i18n-tracker labels May 8, 2020

w3cbot mentioned this issue May 10, 2020

[WR/ARIB] Character Sets w3c/i18n-activity#904

Open

This was referenced May 12, 2020

TTWG Meeting 2020-05-14 w3c/ttwg#114

Closed

TTWG Meeting 2020-05-21 w3c/ttwg#115

Closed

himorin mentioned this issue May 20, 2020

[WR/ARIB] Compatibility with ARIB-TTML / 2. Font handling #547

Open

nigelmegitt mentioned this issue May 26, 2020

TTWG Meeting 2020-05-28 w3c/ttwg#117

Closed

nigelmegitt added imscvNEXT and removed imsc1.2 labels May 29, 2020

nigelmegitt mentioned this issue Jun 2, 2020

TTWG Meeting 2020-06-04 w3c/ttwg#118

Closed

This was referenced Jun 9, 2020

TTWG Meeting 2020-06-11 w3c/ttwg#121

Closed

TTWG Meeting 2020-06-18 w3c/ttwg#122

Closed

nigelmegitt mentioned this issue Jun 23, 2020

TTWG Meeting 2020-06-25 w3c/ttwg#123

Closed

palemieux added this to the imsc1.3 milestone Sep 26, 2024

palemieux added the imsc1.3 label Sep 26, 2024

palemieux mentioned this issue Dec 27, 2024

Add JA character set #588

Open

palemieux added the pr open label Dec 27, 2024
