Transposed data for supervised learning #130
Comments
I'm torn on this one.
What would be involved in changing to a scikit-learn-style dataset that would still allow support of the Vlad4-style grammars?
Would we have to use loc etc. instead of simply x[0]? If it's a small change we can document it in the README.
I suppose the bigger question is what this proposed change would gain us over what we currently have.
Just some thoughts to get the discussion rolling.
Dave
On Wed, 2 Jun 2021 at 17:34, James McDermott wrote:
In #129 we are discussing the X dataset being transposed by PonyGE (relative to the Scikit-Learn convention).
I see that we do indeed transpose the data here:
https://github.com/PonyGE/PonyGE2/blob/2e0806f5ad42540c34b83eaf65d8301eec31cf29/src/utilities/fitness/get_data.py#L60
I think the motivation here is that we can write a grammar which will work correctly whether processing a single row or a whole dataset. E.g. in Vladislavleva4 we have x[0]|x[1]|x[2]|x[3]|x[4]:
https://github.com/PonyGE/PonyGE2/blob/2e0806f5ad42540c34b83eaf65d8301eec31cf29/grammars/supervised_learning/Vladislavleva4.bnf#L10
With transposed data, this works.
But it is different from the convention used by Scikit-Learn, Tensorflow, etc. Should we consider a change here?
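(For illustration only: a toy NumPy sketch, not project code, of why the transposed layout lets the same grammar expression handle both a full dataset and a single row.)

    import numpy as np

    # Toy data in the usual Scikit-Learn layout: rows are samples, columns are features.
    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # shape (2, 3): 2 samples, 3 features

    # PonyGE transposes the data, so each row of x is one input variable.
    x = X.T                           # shape (3, 2)

    # A grammar expression such as x[0] + x[1] evaluates over the whole dataset...
    print(x[0] + x[1])                # element-wise over all samples -> [3. 9.]

    # ...and the identical expression also works when x is a single 1-D row.
    x = X[0]                          # one sample's features, shape (3,)
    print(x[0] + x[1])                # 1.0 + 2.0 -> 3.0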
"What would be involved in changing to a scikit-learn style dataset, that would allow for support of the Vlad4 style grammars?"
We are just using Numpy, not Pandas, so no loc. I think we would be removing the transpose and changing the grammars to say x[:, 0] etc. And if someone wanted to run the function on a single row of data x, they'd have to reshape it with x.reshape((1, len(x))) or similar.
"Would this proposed change gain us over what we currently have?"
Nothing! Well, just that it would stick to the convention, so possibly easier for users writing custom code as in #129.
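(A minimal sketch on the same toy data of what dropping the transpose would mean for grammars and for single rows; the reshape shown is just one way to do it.)

    import numpy as np

    # Toy data kept in the Scikit-Learn layout (rows = samples, columns = features).
    X = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

    # With the transpose removed, grammar expressions index columns instead:
    print(X[:, 0] + X[:, 1])             # -> [3. 9.]

    # A single row is 1-D, so it has to be made 2-D before the same expression applies.
    row = X[0]                           # shape (3,)
    row = row.reshape((1, len(row)))     # shape (1, 3)
    print(row[:, 0] + row[:, 1])         # -> [3.]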
Seems a small enough change to be fair, and a simple note in the documentation saying we moved from x[0] to x[:, 0]-style indexing should cover it.
I'm easy either way.
Dave
I think we should go ahead with this. I think the sklearn standard would be good to align with, more generally (also eventually inheriting from …).
…k as per discussion in PonyGE#130