It would be nice to have a spell-check model linter that looks for spelling issues in shape names, member names, documentation, and maybe any string. Checking every string might be hard, but it would be interesting to see whether it's possible and worthwhile, since it could lead to severe performance issues and too many false positives. This linter should have a default dictionary that can be extended with a custom newline-separated string of words. Custom words are a hard requirement since most models use domain-specific terminology that isn't feasible to capture in the default list of words. The spell checker doesn't necessarily need to offer spelling suggestions, which likely makes it easier to implement. Sentences would need to be broken down into individual words by tokenizing strings on separators like " ", "-", ",", ".", ";", ":", "_", etc.
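To make the tokenizing and custom-word idea concrete, here is a rough sketch in Python (not the linter's actual implementation). The names `DEFAULT_WORDS`, `tokenize`, `parse_custom_words`, and `find_misspellings` are all hypothetical, and the tiny default word set stands in for a real dictionary. Splitting camelCase names such as `GetUserName` is an extra assumption beyond the separators listed above, since shape and member names rarely contain spaces.

```python
import re

# Hypothetical stand-in for the default dictionary; a real linter would
# load a full word list instead of this handful of words.
DEFAULT_WORDS = {"get", "list", "user", "name", "the", "of", "a"}

# Anything that is not an ASCII letter acts as a separator
# (" ", "-", ",", ".", ";", ":", "_", digits, and so on).
_SEPARATORS = re.compile(r"[^A-Za-z]+")

# Assumption: also split camelCase / PascalCase, keeping acronym runs
# such as "HTTP" in "HTTPServer" together.
_CAMEL = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenize(text):
    """Break a string into lowercase words."""
    words = []
    for chunk in _SEPARATORS.split(text):
        words.extend(m.group(0).lower() for m in _CAMEL.finditer(chunk))
    return words


def parse_custom_words(custom):
    """Parse a newline-separated string of custom words into a set."""
    return {line.strip().lower() for line in custom.splitlines() if line.strip()}


def find_misspellings(text, custom=""):
    """Return unknown words in order of first appearance, without duplicates."""
    known = DEFAULT_WORDS | parse_custom_words(custom)
    unknown = []
    for word in tokenize(text):
        if word not in known and word not in unknown:
            unknown.append(word)
    return unknown
```

With this sketch, `find_misspellings("list-kms_key")` reports `kms` and `key`, while passing `custom="kms\nkey"` reports nothing, which is the behavior the custom-word requirement is after.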
The best dictionary I know of is https://github.com/dwyl/english-words, though the license is unclear, and we'll need to filter out bad words. The dictionary is around 4 MB, so we'll need to make sure we don't have to load the file repeatedly or store it in memory multiple times.
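One way to avoid reloading the word list or holding several copies of it is to cache the parsed set per file path. The following is a minimal sketch under that assumption; `load_words` and `default_dictionary` are hypothetical names, and the optional blocklist file is only a placeholder for whatever bad-word filtering ends up being used.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def load_words(path):
    """Read a newline-separated word list once and return an immutable set.

    Repeated calls with the same path return the same cached object,
    so the file is read and held in memory only once.
    """
    text = Path(path).read_text(encoding="utf-8")
    return frozenset(
        line.strip().lower() for line in text.splitlines() if line.strip()
    )


@lru_cache(maxsize=None)
def default_dictionary(words_path, blocklist_path=None):
    """Return the word set, minus any words in the optional blocklist file."""
    words = load_words(words_path)
    if blocklist_path is None:
        return words
    return words - load_words(blocklist_path)
```

Returning a `frozenset` keeps lookups constant-time and prevents one validator from accidentally mutating a set that others share.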
This is remarkably harder than you'd think. I've yet to find a bad-word list that catches everything in the listed repo alone, and even applying stemming techniques only gets you so far.