How to turn whole sentence into singular? #134

Open
binarykitchen opened this issue Mar 10, 2014 · 7 comments

@binarykitchen

Hello again

I'd like to turn all words of a sentence into singular.

For example, "my dog has lots of flees" should become [ 'my', 'dog', 'has', 'lots', 'of', 'flee' ]

Here is the code:

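    var natural = require('natural');

    var sentence = 'my dog has lots of flees';
    var tokenizer = new natural.WordTokenizer();
    var words = tokenizer.tokenize(sentence);
    var nounInflector = new natural.NounInflector();

    // Singularize every token with the noun inflector, regardless of its part of speech
    for (var index = 0; index < words.length; index++) {
        words[index] = nounInflector.singularize(words[index]);
    }

    console.log(words);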

which outputs:

[ 'my', 'dog', 'ha', 'lots', 'of', 'flee' ]

Almost correct. But why ha and not has?

@binarykitchen
Author

PS: I encounter similar issues when I try to turn a whole sentence from past tense into present tense with the PresentVerbInflector.

@kkoch986
Member

I think the problem is that has is a verb and you are using the noun inflector. You could probably use wordnet to get the POS (we don't have our own tagger yet; see #117).

The inflectors work off of a set of rules in most cases, so if you give the noun inflector a verb, it will likely treat it as a noun and apply the noun rules (giving ha instead of has).
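
To illustrate the point, here is a minimal sketch of a rule that only looks at spelling. The function name `naiveSingularize` and the single "strip a trailing s" rule are assumptions made up for this example; natural's real rule set is larger and is not reproduced here.

```javascript
// Toy rule, NOT natural's actual rule set: strip one trailing "s".
function naiveSingularize(word) {
    // The rule only inspects spelling; it cannot tell a noun from a verb.
    return /s$/.test(word) ? word.slice(0, -1) : word;
}

console.log(naiveSingularize('flees')); // flee
console.log(naiveSingularize('has'));   // ha
```

Any purely spelling-based rule behaves this way, which is why the part of speech has to be known before choosing an inflector.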

@binarykitchen
Author

Thanks @kkoch986

Hmmm.... what are wordnet and POS?

Is there a function which tells me if the word is a noun or a verb?

@kkoch986
Member

Sorry, I could have been clearer. POS stands for part of speech, meaning whether a word is a verb, noun, adjective, etc.

WordNet is a database of English words that contains a lot of useful information about them; see http://wordnet.princeton.edu/ and here for more on that.

Once you've configured natural to use wordnet (as per the second link above) you can get the parts of speech by doing something like this:

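    var natural = require('natural');
    var wordnet = new natural.WordNet();

    // Print the part-of-speech code of every WordNet entry found for the word
    wordnet.lookup('has', function(results) {
        results.forEach(function(result) {
            console.log(result.pos);
        });
    });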

Let me know if that helps. I think a good POS tagger (something that takes a word and returns its part of speech) and a good sentence parser (something that takes a sentence and gives some information about which words make up which parts of the sentence structure) would be important additions to natural. Hopefully both will be coming soon.
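
Since a lookup can return several entries for one word, it may help to collapse them into a list of distinct part-of-speech codes. The following is only a sketch: the helper name `partsOfSpeech` is hypothetical, it assumes each result is an object with a `pos` string as in the snippet above, and the sample data is invented rather than real WordNet output.

```javascript
// Collect the distinct part-of-speech codes from WordNet-style lookup results.
// Assumption: each result is an object with a `pos` string.
function partsOfSpeech(results) {
    var seen = {};
    results.forEach(function (result) {
        seen[result.pos] = true;
    });
    return Object.keys(seen).sort();
}

// Hypothetical sample data for illustration only, not real WordNet output.
var sampleResults = [{ pos: 'v' }, { pos: 'n' }, { pos: 'v' }];
console.log(partsOfSpeech(sampleResults)); // [ 'n', 'v' ]
```

A word that comes back with more than one code is ambiguous, so the caller still has to decide which inflector, if any, to apply.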

-Ken

@binarykitchen
Author

Thanks @kkoch986 - I will give this a try, but first, a couple of suggestions:

  • It would be cool if the above lookup method also accepted an array of words, i.e. a whole sentence!
  • Also, it would be awesome if the results returned in the callback also came with the correct inflectors (without the need to bloat the code with if-verb-then-create-verb-inflector etc.)!
  • Basically, my goal is to be able to convert a whole sentence into present tense, singular, etc.
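
The first suggestion could be sketched as a small wrapper around any callback-style lookup. Everything here is an assumption for illustration: `lookupAll` is a hypothetical helper that is not part of natural, and `fakeLookup` with its made-up dictionary stands in for WordNet so that the sketch runs on its own.

```javascript
// Hypothetical helper, not part of natural: look up every word with a
// callback-style lookup(word, callback) function and collect results in order.
function lookupAll(words, lookup, done) {
    var results = new Array(words.length);
    var pending = words.length;
    if (pending === 0) { return done(results); }
    words.forEach(function (word, i) {
        lookup(word, function (wordResults) {
            // Store by index so the output order matches the input order
            results[i] = { word: word, results: wordResults };
            pending -= 1;
            if (pending === 0) { done(results); }
        });
    });
}

// Stand-in lookup with made-up data so the sketch runs without WordNet.
var fakeDictionary = { dog: [{ pos: 'n' }], has: [{ pos: 'v' }] };
function fakeLookup(word, callback) {
    callback(fakeDictionary[word] || []);
}

lookupAll(['dog', 'has'], fakeLookup, function (all) {
    all.forEach(function (entry) {
        var codes = entry.results.map(function (r) { return r.pos; });
        console.log(entry.word, codes);
    });
});
```

If the real lookup keeps the (word, callback) shape used earlier in this thread, something like `wordnet.lookup.bind(wordnet)` could presumably be passed in place of `fakeLookup`, though that has not been verified here.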

I need all that for a mad experiment: interpreting English sentences into sign language ;)

@kkoch986
Member

@binarykitchen I love both the lookup feature enhancement and the "mad experiment". Keep me posted on your progress and let me know if I can help in some way. I'll try to get to working on that lookup feature, but it's not quite at the top of my list right now. I would be more than happy to merge it in if you arrive at the solution before I do.

-Ken

@binarykitchen
Author

@kkoch986 Thanks Ken! Go ahead, there is no rush. Whenever you have enhanced the code, I will continue with my mad experiment.

2 participants