
N-grams in dictionary #37

Closed
CarlaFernandez opened this issue Mar 19, 2019 · 5 comments

Comments

@CarlaFernandez

Hi @mammothb, great job with SymSpellPy.

I recently saw Issue 15 on SymSpell's GitHub (wolfgarbe/SymSpell#15), and the last comment caught my attention. Apparently SymSpell supports N-grams in the dictionary file, but when I ran a small test in SymSpellPy I was not able to achieve the desired behavior. My approach was the following:

  1. I added on top of a custom frequency dictionary the following sequence:
    abc def ghi 116422658 (highest frequency in the dictionary)

  2. I obtained suggestions for the sentence abc dff ghi, using both lookup and lookup_compound

  3. The returned corrections (e.g. abc off ghi) were based on single words (1-grams) I had previously defined in my dictionary, not on the newly inserted 3-gram
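One plausible reason step 1 had no effect, offered as an assumption rather than a confirmed diagnosis: if the dictionary loader splits each line on spaces and reads the term from the first field and the count from the second (the two-column `term count` layout of frequency_dictionary_en_82_765.txt), then the line `abc def ghi 116422658` yields the term `abc` with the non-numeric count `def` and would be skipped. The sketch below uses hypothetical helpers (`parse_entry_first_two`, `parse_entry_last_field`); it is not SymSpellPy's actual loader.

```python
from typing import Optional, Tuple


def parse_entry_first_two(line: str) -> Optional[Tuple[str, int]]:
    """Hypothetical loader: term in field 0, count in field 1."""
    parts = line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        return None  # line skipped: the count field is not an integer
    return parts[0], int(parts[1])


def parse_entry_last_field(line: str) -> Optional[Tuple[str, int]]:
    """Hypothetical alternative: count is the last field, term is everything before it."""
    parts = line.rsplit(maxsplit=1)
    if len(parts) < 2 or not parts[1].isdigit():
        return None
    return parts[0], int(parts[1])


line = "abc def ghi 116422658"
print(parse_entry_first_two(line))   # None
print(parse_entry_last_field(line))  # ('abc def ghi', 116422658)
```

Treating the last field as the count keeps multi-word terms intact, which is why adding the phrase directly in code sidesteps the file-format question entirely.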

I would like to know if there is any way to reproduce the desired behavior in SymspellPy, that is, obtaining a prediction based on the N-gram counts, or if there are any plans to add it as a feature in the near future.

Thanks for your time!

@mammothb
Owner

@CarlaFernandez according to this comment (wolfgarbe/SymSpell#54 (comment)), the only way to add multi-word phrases is through create_dictionary_entry()

lookup() results for abc dff ghi:
Using only frequency_dictionary_en_82_765.txt: No result
Using frequency_dictionary_en_82_765.txt and create_dictionary_entry("abc def ghi", 116422658): abc def ghi, 1, 116422658
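Conceptually, the lookup() result above comes from comparing the entire input string against the multi-word key as a single string: `abc dff ghi` is one substitution away from `abc def ghi`. The brute-force sketch below illustrates only that idea. It is not SymSpell's symmetric-delete algorithm, it uses plain Levenshtein distance rather than the Damerau-Levenshtein variant SymSpell computes, and `phrase_lookup` and the in-memory `dictionary` are hypothetical stand-ins.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def phrase_lookup(phrase: str, dictionary: Dict[str, int],
                  max_edit_distance: int = 2) -> List[Tuple[str, int, int]]:
    """Return (term, distance, count) for every key within max_edit_distance,
    closest first, then most frequent. The whole phrase is compared as one string."""
    hits = []
    for term, count in dictionary.items():
        d = levenshtein(phrase, term)
        if d <= max_edit_distance:
            hits.append((term, d, count))
    return sorted(hits, key=lambda h: (h[1], -h[2]))


dictionary = {"abc def ghi": 116422658, "def": 1000}
print(phrase_lookup("abc dff ghi", dictionary))
# [('abc def ghi', 1, 116422658)]
```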

@zoltan-fedor
Contributor

@mammothb
I don't think it works even with create_dictionary_entry()

>>> from symspellpy.symspellpy import SymSpell
>>> sym_spell = SymSpell()
>>> sym_spell.create_dictionary_entry(key="abcde ghijkl mnopqr", count=116422658)
True
>>> suggestions = sym_spell.lookup_compound("aabcde ghijkl mnopqr", max_edit_distance=2)
>>> for s in suggestions:
...     print(s.term)
... 
aabcde ghijkl mnopqr

@mammothb
Owner

You need to use lookup to spell check using n-grams since lookup_compound automatically breaks up the input string into individual words.
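A simplified sketch of why word splitting defeats a multi-word entry, assuming a hypothetical corrector (`correct_per_token`) that matches each whitespace-separated token independently and keeps a token unchanged when no key is within the edit distance. This is not SymSpellPy's lookup_compound, which also tries splitting and merging words. Every single token is far from the 19-character key `abcde ghijkl mnopqr`, so the input comes back unchanged, consistent with the REPL output earlier in the thread, while the whole string is only one edit away from the key.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def correct_per_token(sentence: str, dictionary: Dict[str, int],
                      max_edit_distance: int = 2) -> str:
    """Hypothetical word-by-word corrector: each token is matched on its own,
    so a multi-word key can never be the target of a single token."""
    corrected = []
    for token in sentence.split():
        best = None  # (distance, -count, term); lower tuple is better
        for term, count in dictionary.items():
            d = levenshtein(token, term)
            if d <= max_edit_distance and (best is None or (d, -count) < best[:2]):
                best = (d, -count, term)
        corrected.append(best[2] if best else token)  # keep unknown tokens as-is
    return " ".join(corrected)


dictionary = {"abcde ghijkl mnopqr": 116422658}
print(correct_per_token("aabcde ghijkl mnopqr", dictionary))  # unchanged
print(levenshtein("aabcde ghijkl mnopqr", "abcde ghijkl mnopqr"))  # 1
```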

@zoltan-fedor
Contributor

Thanks.
Unfortunately I need lookup_compound as I have whole sentences and I need to fix the spaces between words too. Hmm, I will have to think about whether I could extend `lookup_compound` to handle n-grams too.
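One possible direction for such an extension, sketched purely as a hypothetical and not as anything SymSpellPy provides: walk over the tokens and, at each position, try the longest window of up to `max_n` tokens first, accepting the joined window as soon as it is within the edit distance of a dictionary key, then falling back to shorter windows and finally to the unchanged token. The names `best_match` and `correct_with_ngrams` are illustrative, the matching is greedy and brute force, and it does not attempt the space correction that lookup_compound performs.

```python
from typing import Dict, List, Optional


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def best_match(text: str, dictionary: Dict[str, int],
               max_edit_distance: int) -> Optional[str]:
    """Closest key within max_edit_distance (ties broken by higher count), else None."""
    best = None  # (distance, -count, term)
    for term, count in dictionary.items():
        d = levenshtein(text, term)
        if d <= max_edit_distance and (best is None or (d, -count) < best[:2]):
            best = (d, -count, term)
    return best[2] if best else None


def correct_with_ngrams(sentence: str, dictionary: Dict[str, int],
                        max_n: int = 3, max_edit_distance: int = 2) -> str:
    """Greedy longest-window matching so multi-word keys can be matched."""
    tokens = sentence.split()
    out: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            window = " ".join(tokens[i:i + n])
            match = best_match(window, dictionary, max_edit_distance)
            if match is not None:
                out.append(match)
                i += n
                break
        else:
            out.append(tokens[i])  # no window matched: keep the token
            i += 1
    return " ".join(out)


dictionary = {"abcde ghijkl mnopqr": 116422658, "hello": 1000}
print(correct_with_ngrams("helo aabcde ghijkl mnopqr", dictionary))
# hello abcde ghijkl mnopqr
```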

@shahanesanket

Any update on being able to use the lookup_compound with n-grams?


4 participants