Minor issues - but nothing's broken #24
Comments
@ziadloo Thanks for these fixes. Also, this implementation uses a GRU; if I were to use an LSTM, would I need to make any changes? I mean in
@ziadloo, thanks for pointing these out. I'll go through and make the necessary changes. @John-8704, this is something I will be working on soon. Yes, since an LSTM returns two states, it might not work properly.
@John-8704 I am currently looking at using AttentionLayer with LSTMs. Can you post the error you're getting? Technically, it should work out of the box: you should be using the output of the LSTM layer (which is a single tensor) and not the states of the LSTM.
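To illustrate the point above, here is a minimal sketch (assuming TensorFlow/Keras; the shapes and variable names are illustrative, not from the repo): with `return_sequences=True` the LSTM's per-timestep output is a single tensor, separate from the two final states it also returns.

```python
from tensorflow.keras.layers import Input, LSTM

# Illustrative shapes: 20 timesteps, 64 input features, 32 LSTM units.
enc_in = Input(shape=(20, 64))

# return_sequences=True -> the per-timestep outputs (one tensor);
# return_state=True     -> additionally the final [h, c] states.
# Attention should consume encoder_out, not state_h / state_c.
encoder_out, state_h, state_c = LSTM(32, return_sequences=True,
                                     return_state=True)(enc_in)

print(encoder_out.shape)  # (None, 20, 32)
print(state_h.shape)      # (None, 32)
```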
This is the error I've been getting when I try to use bidirectional LSTMs in place of GRUs. Were you able to get LSTMs working with the attention layer?
@John-8704 Can you also post the full model you have in place?
@thushv89 This is the link to the notebook I'm working on; the dataset is English-French sentences, which I've also uploaded there: https://github.com/John-8704/code A small note: in my notebook I initialized the fake states the same way as your code; it's exactly the same as attention.py from your repo. I think the error might be related to this, but I'm not sure.
If I instead initialize the fake states as suggested above by ziadloo, then I get the error I already shared in my previous comment.
@thushv89 Any update on why the error might be occurring? I couldn't figure out why I'm getting it. Is there any temporary fix?
@John-8704, I had a look at your code. The problem is that you are using
PS: If you request this as a feature, I can start working on it at some point. :)
@thushv89 Thanks for the help. Also, which output shape is the correct one to use? Original:
Suggested:
Regarding your question: I think you're right.
Thanks for this amazing code. I was working on your implementation when I came across a number of issues. Since my changes might not be to your liking, I'm not going to open a PR (my implementation is a little different), but I thought I should report the issues I've faced.
Before I list the issues: none of them break the implementation, meaning your code works perfectly just the way it is. It's just that anyone who wants to modify it (like me) will have a headache. Also, some of the things are simply wrong.
1. The way the fake states are initialized is unnecessarily complicated:
Just initialize the tensors with zeros:
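Since the original snippet was lost in this thread, here is a hedged sketch of the suggestion (assuming TensorFlow; the function and variable names are my own, not the repo's): inside a custom layer's `call()`, the fake initial states that `K.rnn` needs can simply be zero tensors of shape `(batch, units)`.

```python
import tensorflow as tf

# Illustrative helper: build a zero initial state whose batch dimension
# is taken dynamically from the incoming sequence tensor.
def create_initial_state(encoder_out_seq, units):
    batch = tf.shape(encoder_out_seq)[0]
    # A plain zero tensor is all the RNN loop needs to start iterating.
    return tf.zeros([batch, units])

seq = tf.zeros((4, 10, 8))            # (batch, time, features)
state = create_initial_state(seq, 8)
print(state.shape)  # (4, 8)
```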
2. In both of your step functions you return the state like this:
While this does not throw any errors (because the states are discarded), it is actually wrong. You just have to return the states, like this:
3. You already have a PR on this; I'm just mentioning it for the sake of completeness. Your output shape is this:
But it should be this:
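Because the snippets for item 3 were lost above, here is a hedged sketch of what a corrected `compute_output_shape` might look like for an attention layer that returns both the context vectors and the attention energies. The dimension names (`B`, `Te`, `Td`, `H`) are my own assumptions for illustration, not the repo's actual code.

```python
# Illustrative compute_output_shape for a layer taking
# [encoder_outputs (B, Te, H), decoder_outputs (B, Td, H)]
# and returning [context (B, Td, H), energies (B, Td, Te)].
def compute_output_shape(input_shape):
    enc_shape, dec_shape = input_shape
    return [
        (dec_shape[0], dec_shape[1], enc_shape[2]),  # context vectors
        (dec_shape[0], dec_shape[1], enc_shape[1]),  # attention energies
    ]

shapes = compute_output_shape([(None, 20, 64), (None, 10, 64)])
print(shapes)  # [(None, 10, 64), (None, 10, 20)]
```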
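Going back to item 2: the contract for a Keras-backend (`K.rnn`) step function is to return `(output, new_states)` with the states as a *list*, even when there is only one. A minimal pure-NumPy mimic of that loop (all names and the toy computation are illustrative, not the repo's code):

```python
import numpy as np

def step(inputs, states):
    prev, = states          # unpack the single state
    out = inputs + prev     # toy per-step computation
    return out, [out]       # new states go back as a list

# A tiny loop mimicking what K.rnn does with the step function:
x = np.ones((3, 5, 2))      # (batch, time, features)
state = [np.zeros((3, 2))]
outputs = []
for t in range(x.shape[1]):
    out, state = step(x[:, t], state)
    outputs.append(out)
print(outputs[-1][0])  # [5. 5.]
```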
Thanks again. I learned a lot from your code.