Support National Language Shift Tables #16

juliangut · 2019-05-21T12:12:50Z

Currently only Turkish, Spanish and Portuguese are supported.

Tests were added only for charset validation. Message splitting should not be affected, as National Language Shift Tables only affect the charset and not the number of bytes per char.

Codesleuth · 2019-05-22T07:40:44Z

Awesome! Thanks @juliangut, please bear with me as I'm traveling and will have to review this when I get time.

juliangut · 2019-05-27T11:32:43Z

Hello @Codesleuth, have you had time to review this PR?

Codesleuth · 2019-05-28T08:24:14Z

I'm currently looking at it - will post soon.

juliangut · 2019-05-29T08:28:01Z

Hi @Codesleuth, how did you find the PR? Does it feel right?

Codesleuth

I'm going to hold off accepting this because of the reasons outlined in the comment. I'm thinking of an API in the future that will do something like this:

splitter.split('∞')

{
  "characterSet": "GSM",
  "shiftTable": "Portuguese language (Latin script)",
  "parts": [
    {
      "content": "∞",
      "length": 1,
      "bytes": 1
    }
  ],
  "bytes": 1,
  "length": 1,
  "remainingInPart": 159
}

And then in the case of mixed it would be:

splitter.split('∞Ø')

{
  "characterSet": "Unicode",
  "parts": [
    {
      "content": "∞Ø",
      "length": 2,
      "bytes": 4
    }
  ],
  "bytes": 4,
  "length": 2,
  "remainingInPart": 68
}

I think this will be a better API and won't give false positives for the mixed case being detected as GSM.

Codesleuth · 2019-05-27T16:33:32Z

lib/gsmvalidator.js

+}
+function validateCharacterWithShiftTable(character) {
+  var charCodes = GSM_charCodes.concat(GSM_TR_charCodes, GSM_ES_charCodes, GSM_PT_charCodes);


I can see how this appears to make sense if all you need to do is count all possible valid characters that exist in any shift table, but this creates an invalid situation where the whole text doesn't fit any common shift table, and should be detected as Unicode. Take the following two messages:

∞ is valid only in Portuguese language (Latin script)

Ø is valid in either Spanish language (Latin script) or the Basic Character Set

If each example was a full message, they would each be valid. However, if we take a message containing both characters:

∞Ø

As far as we know, this is an invalid message because there is no common shift table that supports both at the same time, so it should be detected as Unicode.

This library's character set auto-detection mechanism is pretty important, and we should think about how that can be preserved.
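To make the false positive concrete, here is a minimal sketch of the difference between checking against the union of all tables and checking against each table on its own. The two tables below are tiny hypothetical slices made up for illustration, not the real `GSM_charCodes` / `GSM_PT_charCodes` contents, and the helper names are assumptions rather than anything in this library.

```javascript
// Illustrative subsets only; these are NOT the library's real tables.
var BASIC = ['A', 'Ø'];       // hypothetical slice of the Basic Character Set
var PORTUGUESE = ['A', '∞'];  // hypothetical slice of the Portuguese table

// True when every character of the message appears in the given table.
function everyCharIn(message, table) {
  return Array.from(message).every(function (ch) {
    return table.indexOf(ch) !== -1;
  });
}

// Union check: accepts a character if it exists in any table.
function validInUnion(message) {
  return everyCharIn(message, BASIC.concat(PORTUGUESE));
}

// Per-table check: the whole message must fit one single table.
function validInSomeTable(message) {
  return [BASIC, PORTUGUESE].some(function (table) {
    return everyCharIn(message, table);
  });
}

console.log(validInUnion('∞Ø'));     // true  (the false positive)
console.log(validInSomeTable('∞Ø')); // false (should fall back to Unicode)
console.log(validInSomeTable('∞'));  // true
console.log(validInSomeTable('Ø'));  // true
```

With the union check, `∞Ø` passes even though no single table contains both characters, which is the situation described above.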

I've updated the PR: this method is no longer used when validating a whole message. Please see the review message below.

juliangut · 2019-05-30T10:39:27Z

lib/gsmvalidator.js

+  var charCodes = [GSM_charCodes, GSM_TR_charCodes, GSM_ES_charCodes, GSM_PT_charCodes];
+  for (var i = 0; i < charCodes.length; i++) {
+    if (validateMessageInCharCodesList(message, charCodes[i]))


Validating against each shift table's character list independently solves the problem of characters from different shift tables being combined within the same message.
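As a rough sketch of where this per-table loop could lead, the function below returns the name of the first table that fits the whole message, or `null` when none does, so a caller could fall back to Unicode. The body of `validateMessageInCharCodesList` is not shown in this diff, so the version here is an assumption, and `TABLES` and `detectTable` are hypothetical names with heavily truncated code-point lists.

```javascript
// Hypothetical, heavily truncated code-point lists for illustration only.
var TABLES = [
  { name: 'Basic Character Set', charCodes: [65, 216] },                 // 'A', 'Ø'
  { name: 'Portuguese language (Latin script)', charCodes: [65, 8734] }  // 'A', '∞'
];

// Assumed implementation: true when every UTF-16 code unit is in the list.
function validateMessageInCharCodesList(message, charCodes) {
  for (var i = 0; i < message.length; i++) {
    if (charCodes.indexOf(message.charCodeAt(i)) === -1) {
      return false;
    }
  }
  return true;
}

// Returns the first table name that covers the whole message, else null.
function detectTable(message) {
  for (var i = 0; i < TABLES.length; i++) {
    if (validateMessageInCharCodesList(message, TABLES[i].charCodes)) {
      return TABLES[i].name;
    }
  }
  return null; // no single table fits; treat the message as Unicode
}

console.log(detectTable('Ø'));  // 'Basic Character Set'
console.log(detectTable('∞'));  // 'Portuguese language (Latin script)'
console.log(detectTable('∞Ø')); // null
```

Listing the basic set first means plain GSM messages never report a shift table they do not need. Returning the table name rather than a boolean would also make it straightforward to fill a `shiftTable` field like the one in the proposed output above.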

juliangut · 2019-05-30T10:42:34Z

Hi @Codesleuth, I can't argue against a new unified API; that would be great. You are right about messages with chars from mixed shift tables. I've just updated the code and tests accordingly, please review my comments.

Codesleuth · 2019-06-14T07:26:12Z

I'm so sorry for the delay! I will be working on this repo this weekend, which will give me time to review and try out this PR properly. This PR is missing functional tests that cover the options and shift table output, but I can add them as I work on it.

support National Language Shift Tables

4d05076

juliangut mentioned this pull request May 21, 2019

Add support for GSM Spanish language (Latin script) shift table #12

Open

Codesleuth reviewed May 29, 2019

correctly validate mixed shift tables chars in message

0a5f4a2
