Example of Bengali grapheme clusters out of date #150

andjc · 2025-01-29T10:14:43Z

The current editor's draft has the following text:

For example, the Bangla user-perceived character kshī ক্ষী is composed of four characters: U+0995 BENGALI LETTER KA + U+09CD BENGALI SIGN VIRAMA + U+09B7 BENGALI LETTER SSA + U+09C0 BENGALI VOWEL SIGN II.
Unicode splits these into two grapheme clusters, unless language-specific tailoring is applied. For more information, see our article Character encodings: Essential concepts.

This describes the behaviour prior to Unicode 15.1. UAX29 was updated in the Unicode 15.1 release, adding a new rule, GB9c:

Do not break within certain combinations with Indic_Conjunct_Break (InCB)=Linker

For the example 'ক্ষী', UAX29 revision 41 and earlier would result in two extended grapheme clusters ('ক্', 'ষী'), while UAX29 revision 43 onwards results in a single extended grapheme cluster ('ক্ষী'). So the behaviour depends on the version of UAX29 (i.e. the version of Unicode) that an implementation supports.
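
To make the difference concrete, here is a rough Python sketch of the two behaviours. It is not a UAX29 implementation: the property tables are hand-written for just the four code points in the example, the function names `segment` and `is_break` are made up for illustration, and only rules GB9, GB9a, GB9c and GB999 are modelled. A real segmenter would take the property values from the Unicode Character Database.

```python
# Toy property tables covering ONLY the four code points in the example.
# These are hard-coded assumptions for illustration, not a UCD lookup.
EXTEND = {"\u09CD"}                    # Grapheme_Cluster_Break=Extend (virama)
SPACING_MARK = {"\u09C0"}              # Grapheme_Cluster_Break=SpacingMark (vowel sign II)
INCB_CONSONANT = {"\u0995", "\u09B7"}  # InCB=Consonant (KA, SSA)
INCB_LINKER = {"\u09CD"}               # InCB=Linker (virama)
INCB_EXTEND = set()                    # InCB=Extend (none among these four)


def is_break(text, i, apply_gb9c):
    """Return True if there is a cluster boundary before text[i] (i > 0)."""
    ch = text[i]
    # GB9 and GB9a: do not break before Extend or SpacingMark.
    if ch in EXTEND or ch in SPACING_MARK:
        return False
    # GB9c: Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant
    if apply_gb9c and ch in INCB_CONSONANT:
        j = i - 1
        seen_linker = False
        while j >= 0 and (text[j] in INCB_LINKER or text[j] in INCB_EXTEND):
            seen_linker = seen_linker or text[j] in INCB_LINKER
            j -= 1
        if seen_linker and j >= 0 and text[j] in INCB_CONSONANT:
            return False
    # GB999: otherwise, break everywhere.
    return True


def segment(text, apply_gb9c):
    """Split text into clusters using the simplified rules above."""
    clusters = []
    for i, ch in enumerate(text):
        if i > 0 and not is_break(text, i, apply_gb9c):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


KSHII = "\u0995\u09CD\u09B7\u09C0"  # KA + VIRAMA + SSA + VOWEL SIGN II

before = segment(KSHII, apply_gb9c=False)  # pre-15.1 style rules
after = segment(KSHII, apply_gb9c=True)    # with GB9c
print(len(before), len(after))
```

Without GB9c the virama attaches to KA via GB9 and the vowel sign attaches to SSA via GB9a, but nothing joins the two halves, so the sketch yields two clusters. With GB9c the consonant-linker-consonant sequence is kept together and the result is one cluster.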
