Add spell check linter #668

Open
mtdowling opened this issue Dec 10, 2020 · 2 comments
Labels
feature-request A feature should be added or improved.

Comments

@mtdowling
Member

It would be nice to have a spell check model linter that looks for spelling issues in shape names, member names, documentation, and maybe any string. Checking every string might be hard, but it would be interesting to see whether it's possible and worthwhile (it could lead to severe performance issues and too many false positives). This linter should have a default dictionary that can be extended with a custom, newline-separated string of words. Custom words are a hard requirement, since most models use domain-specific terminology that isn't feasible to capture in the default list. The spell checker doesn't necessarily need to offer spelling suggestions, which likely makes it easier to implement. Sentences would need to be broken down into individual words by tokenizing strings on characters like " ", "-", ",", ".", ";", ":", "_", etc.
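A minimal sketch of the core check described above, in Java (the language Smithy is written in). The class and method names (`SpellCheckSketch`, `parseCustomWords`, `findMisspellings`) are hypothetical. The default dictionary is assumed to be a plain lowercase `Set<String>`, and splitting camelCase shape names into words is left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: tokenize a string and report tokens that appear in
 * neither the default dictionary nor the custom word list.
 */
class SpellCheckSketch {
    // Split on whitespace and the punctuation listed in the issue.
    private static final String DELIMITERS = "[\\s\\-,.;:_]+";

    /** Parses a newline-separated string of custom words into a lowercase set. */
    static Set<String> parseCustomWords(String newlineSeparated) {
        Set<String> words = new HashSet<>();
        for (String line : newlineSeparated.split("\\R")) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the original-case tokens of text that are in neither dictionary. */
    static List<String> findMisspellings(String text, Set<String> defaultWords, Set<String> customWords) {
        List<String> misspelled = new ArrayList<>();
        for (String token : text.split(DELIMITERS)) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            String word = token.toLowerCase(Locale.ROOT);
            if (!defaultWords.contains(word) && !customWords.contains(word)) {
                misspelled.add(token);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> defaults = Set.of("the", "bucket", "name", "of", "to", "create");
        Set<String> custom = parseCustomWords("s3\nArn\n");
        System.out.println(findMisspellings("The S3 bucket-name to creaet; of ARN", defaults, custom));
    }
}
```

Running `main` prints `[creaet]`: "S3" and "ARN" pass only because they are in the custom list, which is why custom words matter for domain-specific models.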

The best dictionary I know of is https://github.com/dwyl/english-words, though its license is unclear and we'll need to filter out bad words. The dictionary is around 4 MB, so we'll need to make sure we don't load the file repeatedly or keep multiple copies of it in memory.
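One way to read the word list only once and hold a single shared copy is the initialization-on-demand holder idiom, sketched below. The class name `DefaultDictionary` and the resource path `/spellcheck-words.txt` are hypothetical; the sketch assumes one word per line and falls back to an empty set when the resource is missing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: load the default word list at most once per JVM. */
final class DefaultDictionary {
    // Hypothetical classpath resource containing one word per line.
    private static final String RESOURCE = "/spellcheck-words.txt";

    private DefaultDictionary() {}

    // The JVM initializes Holder lazily and thread-safely the first time
    // get() touches it, so the file is read a single time.
    private static final class Holder {
        static final Set<String> WORDS = load();
    }

    /** Returns the same unmodifiable set to every caller. */
    static Set<String> get() {
        return Holder.WORDS;
    }

    private static Set<String> load() {
        InputStream in = DefaultDictionary.class.getResourceAsStream(RESOURCE);
        if (in == null) {
            return Collections.emptySet(); // resource absent in this sketch
        }
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Collections.unmodifiableSet(words);
    }
}
```

Because the shared set is unmodifiable, per-model custom words would live in a separate set checked alongside it rather than being added to the default list, so no linter instance ever needs its own copy of the large file.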

@JordonPhillips
Contributor

we'll need to filter out bad words

This is remarkably harder than you'd think. I've yet to find a list that can filter out everything in just the listed repo, and even applying stemming techniques only gets you so far.

@JordonPhillips JordonPhillips added the feature-request A feature should be added or improved. label Dec 28, 2020
@PatMyron

PatMyron commented May 8, 2021

Ignoring strings that match regular expression patterns, in addition to dictionaries of specific words, is critical.
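A rough sketch of how ignore patterns could fit in: spans matching any pattern are blanked out before tokenizing, because tokenizing first would split a URL on "." and "-" into fragments that no longer match the pattern. The class name `IgnorePatternSketch` and the example patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: remove spans matching user-supplied ignore patterns
 * before tokenizing, so URLs, ARNs, etc. are never spell checked.
 */
class IgnorePatternSketch {
    private static final String DELIMITERS = "[\\s\\-,.;:_]+";

    static List<String> findMisspellings(String text, Set<String> dictionary, List<Pattern> ignorePatterns) {
        // Replace each ignored span with a space so neighboring words stay separate.
        String remaining = text;
        for (Pattern pattern : ignorePatterns) {
            remaining = pattern.matcher(remaining).replaceAll(" ");
        }
        List<String> misspelled = new ArrayList<>();
        for (String token : remaining.split(DELIMITERS)) {
            if (!token.isEmpty() && !dictionary.contains(token.toLowerCase(Locale.ROOT))) {
                misspelled.add(token);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        Set<String> dictionary = Set.of("see", "for", "details", "the");
        List<Pattern> ignore = List.of(
                Pattern.compile("https?://\\S+"),          // URLs (example pattern)
                Pattern.compile("arn:[A-Za-z0-9:/_-]+"));  // ARN-like strings (example pattern)
        System.out.println(findMisspellings(
                "See https://example.com/my-docs.html for the detials", dictionary, ignore));
    }
}
```

Running `main` prints `[detials]`; without the URL pattern, fragments such as `https` and `docs` would also be reported as misspellings.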

@github-actions github-actions bot added the closing-soon This issue will automatically close in 7 days unless further comments are made. label Aug 19, 2023
@jvschneid jvschneid removed the closing-soon This issue will automatically close in 7 days unless further comments are made. label Aug 21, 2023
@smithy-lang smithy-lang deleted a comment from github-actions bot Aug 21, 2023

4 participants