
Transposed data for supervised learning #130

Closed
jmmcd opened this issue Jun 2, 2021 · 4 comments

Comments

@jmmcd (Collaborator) commented Jun 2, 2021

In #129 we are discussing the X dataset being transposed by PonyGE (relative to the Scikit-Learn convention).

I see that we do indeed transpose the data here:

    train_X = train_Xy[:, :-1].transpose()  # all columns but last

I think the motivation here is that we can write a grammar which will work correctly whether processing a single row or a whole dataset. E.g. in Vladislavleva4 we have x[0]|x[1]|x[2]|x[3]|x[4]. With transposed data, this works in both cases.
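To make the motivation concrete, here is a hedged sketch (toy data and the expression are illustrative, not from PonyGE itself) of why the transpose lets the same generated expression work on a whole dataset and on a single raw row:

```python
import numpy as np

# Toy data in the sklearn convention: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# PonyGE transposes, so x[0] becomes the whole first *feature* column.
Xt = X.transpose()

def f(x):
    # The kind of expression a grammar might generate from x[0]|x[1]
    return x[0] + x[1]

print(f(Xt))    # vectorised over all samples: [11. 22. 33.]
print(f(X[0]))  # the same code on one raw row: 11.0
```

The same `f` handles both cases only because, after the transpose, `x[0]` means "feature 0 across all samples" for the dataset and "feature 0 of this sample" for a single row.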

But it is different from the convention used by Scikit-Learn, Tensorflow, etc. Should we consider a change here?

@dvpfagan (Collaborator) commented Jun 4, 2021 via email

@jmmcd (Collaborator, Author) commented Jun 4, 2021

> What would be involved in changing to a scikit-learn style dataset, that would allow for support of the Vlad4 style grammars?

We are just using NumPy, not Pandas, so there is no .loc. I think we would be removing the transpose and changing the grammars to say x[:, 0] etc. And if someone wanted to run the function on a single row of data x, they'd have to reshape it with x.reshape((1, len(x))) or similar.
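As a hedged sketch of the proposed convention (toy data, and the grammar output shown is an assumption about what the changed grammars would emit), the expression would index columns and a single row would need reshaping first:

```python
import numpy as np

# Sklearn convention: rows are samples, columns are features.
X = np.array([[1.0, 10.0],
              [2.0, 20.0]])

def f(x):
    # What a changed grammar might emit: x[:, 0] etc.
    return x[:, 0] * x[:, 1]

print(f(X))  # per-sample results: [10. 40.]

# A single row must be reshaped into a 1-sample 2-D array first.
row = X[0]                            # shape (2,)
print(f(row.reshape((1, len(row)))))  # [10.]
```

The reshape step is the cost of the change: without it, `row[:, 0]` on a 1-D row raises an IndexError.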

> What would this proposed change gain us over what we currently have?

Nothing! Well, it would stick to the convention, so it would possibly be easier for users writing custom code as in #129.

@dvpfagan (Collaborator) commented Jun 4, 2021 via email

@jmmcd (Collaborator, Author) commented Oct 17, 2021

I think we should go ahead with this. More generally, the sklearn standard would be good to align with (also eventually inheriting from RegressorMixin etc.). I'm planning to use PonyGE for some symbolic regression problems in the next few weeks, so I have some time to make the changes and mop up any problems.
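As a hedged sketch of what aligning with the sklearn estimator convention could eventually look like (all names here are hypothetical and evolution is stubbed out; a real version would subclass sklearn.base.BaseEstimator and RegressorMixin):

```python
import numpy as np

class PonyGERegressorSketch:
    """Hypothetical interface sketch, not the real PonyGE API.
    Follows the sklearn fit/predict convention where X has
    shape (n_samples, n_features)."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        assert X.shape[0] == y.shape[0]  # rows are samples
        # Evolution of a phenotype would happen here; as a stub,
        # just remember the mean of y.
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        X = np.asarray(X)
        return np.full(X.shape[0], self.mean_)

est = PonyGERegressorSketch().fit([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(est.predict([[5.0, 6.0]]))  # one prediction per row
```

Dropping the transpose is a precondition for this shape of API, since sklearn's fit/predict contract assumes samples along axis 0.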
