sklearn-style interface for regression/classification #47
Comments
The SKLearn-style interface is just one example of a bigger picture. We can only really run via command-line. A better approach might be to provide a [...]

    for p in [0.001, 0.01]:
        reg = ponyge.GERegressor(pmut=p)
        reg.fit(X, y)
        print(reg.score(X, y))

I think we can't loop over hyperparameters like this at the moment, because the hyperparameter handling is mixed with the command-line parsing. (Right?) (If we could create a class as above, then the command-line interface would just be a script that handles CLI arguments and makes an appropriate call to this class.)

The biggest obstacle to this is our parameter handling. Parameters are held in the parameters module (not a class). It was designed this way, I think, so that we don't have to pass zillions of parameters into every function. We just have access to the parameters module via import. (That code also feels a bit spaghetti-like, as we import the module and then overwrite values in it.)

This branch has a different approach: https://github.com/aadeshnpn/PonyGE2 It is unifying parameters, stats and tracker into a single class (if I understand right). Every function that needs it now takes an extra argument.

@aadeshnpn, are you still working with PonyGE2? If so, could you please show us how to start a run in your approach?
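For illustration, a class that supports looping over hyperparameters could look roughly like the sketch below. Everything in it is an assumption made for the example: the name `GERegressorSketch`, the `pmut`, `generations` and `random_seed` arguments, and above all the search itself, which is a toy (1+1) evolutionary search over the two coefficients of a straight line standing in for a real GE run. It is not PonyGE2 code. The only point is that hyperparameters live on the instance rather than in a shared module.

```python
import random


class GERegressorSketch:
    """Hypothetical sklearn-style wrapper. The search is a toy stand-in, not GE."""

    def __init__(self, pmut=0.01, generations=200, random_seed=0):
        # Hyperparameters are stored on the instance, not in a shared module.
        self.pmut = pmut
        self.generations = generations
        self.random_seed = random_seed

    def get_params(self):
        return {"pmut": self.pmut, "generations": self.generations,
                "random_seed": self.random_seed}

    @staticmethod
    def _mse(coefs, X, y):
        a, b = coefs
        return sum((a * x + b - t) ** 2 for x, t in zip(X, y)) / len(y)

    def fit(self, X, y):
        # X is a flat list of floats here (one feature), to keep the sketch small.
        rng = random.Random(self.random_seed)
        best = [0.0, 0.0]
        best_err = self._mse(best, X, y)
        for _ in range(self.generations):
            # Perturb each coefficient with probability pmut.
            child = [c + rng.gauss(0, 1) if rng.random() < self.pmut else c
                     for c in best]
            err = self._mse(child, X, y)
            if err <= best_err:  # keep the child only if it is no worse
                best, best_err = child, err
        self.coef_ = best
        return self

    def predict(self, X):
        a, b = self.coef_
        return [a * x + b for x in X]

    def score(self, X, y):
        # Coefficient of determination (R^2), as sklearn regressors report.
        mean_y = sum(y) / len(y)
        ss_tot = sum((t - mean_y) ** 2 for t in y)
        ss_res = sum((t - p) ** 2 for t, p in zip(y, self.predict(X)))
        return 1.0 - ss_res / ss_tot


X = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
scores = {}
for p in [0.1, 0.5]:
    reg = GERegressorSketch(pmut=p)
    reg.fit(X, y)
    scores[p] = reg.score(X, y)
    print(p, scores[p])
```

With a class like this, a command-line entry point would only need to parse arguments and pass them to the constructor.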
I totally agree, @jmmcd. We need wrappers that allow sklearn-style usage of our whole library, so that it can be used with ease by many other researchers. I am still using the modified version of PonyGE2 in my research. If the PonyGE2 maintainers want, I can send a merge request with the changes I have made to unify the scripts. Then we can go through the changes and decide which ones should be kept.

    from ponyge.operators.initialisation import initialisation
    from ponyge.fitness.evaluation import evaluate_fitness
    from ponyge.operators.crossover import crossover
    from ponyge.operators.mutation import mutation
    from ponyge.operators.replacement import replacement
    from ponyge.operators.selection import selection
    from ponyge.algorithm.parameters import Parameters

    # All run configuration lives on one Parameters instance.
    parameter = Parameters()
    parameter_list = ['--parameters', '..,regression.txt']  # path,filename (comma-separated)
    parameter.params['RANDOM_SEED'] = 123
    parameter.params['POPULATION_SIZE'] = 100
    parameter.set_params(parameter_list)

    # Create and evaluate the initial population.
    individuals = initialisation(parameter, 1)
    individuals = evaluate_fitness(individuals, parameter)

    # Assumption: the snippet did not define `generations`; read it from the params.
    generations = parameter.params['GENERATIONS']
    for i in range(generations):
        parents = selection(parameter, individuals)
        cross_pop = crossover(parameter, parents)
        new_pop = mutation(parameter, cross_pop)
        new_pop = evaluate_fitness(new_pop, parameter)
        individuals = replacement(parameter, new_pop, individuals)
        individuals.sort(reverse=True)

(Edited by jmmcd to fix the code a little.)
Excellent, thanks. This overall approach makes sense to me. The code doesn't quite work as-is. I edited above to get started. Then I see some old bugs (e.g. sklearn.classification.metrics) and places where the new [...]

        336 # Set GENOME_OPERATIONS automatically for faster linear operations.
    --> 337 if self.params['CROSSOVER'].representation == "linear" and \
        338         self.params['MUTATION'].representation == "linear":
        339     self.params['GENOME_OPERATIONS'] = True
    AttributeError: 'function' object has no attribute 'representation'

But maybe the most efficient use of time wouldn't be to track all these down and try to make a clean PR from a fork which has diverged. Instead, let's discuss the design we want and then, if we decide to go ahead, implement it in a branch of the main repo (PonyGE/PonyGE2).

One thing to discuss is: does it make sense to have Stats and Trackers as members of the Parameters class? The naming is confusing. Instead, maybe we should have a

    class State:
        def __init__(self):
            self.params = {}  # etc
            self.trackers = Trackers()
            self.stats = Stats()

Then every function would be like:

    def crossover(state, parents):
        cross_pop = []
        while len(cross_pop) < state.params['GENERATION_SIZE']:
            ...

(and every function outside [...] The point of it all is that every run has a [...]

(By the way, there is a fork at https://github.com/p-pereira/evoltree which is worth a look. It's not a fork via GitHub, just via copy-paste. It has a nice [...]

I can see quite a few things that could go wrong, so I'll continue this brain-dump...

In a previous issue, #83, there was discussion of how difficult it would be to have a GE constructor with all the arguments. Counterpoint: we've put some work into creating these nice parameters files for different example problems. We don't have to throw that away. We can easily have a method [...]

There are several parts of the code I know nothing about, especially [...]
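To make the per-run state idea concrete, here is a minimal sketch under stated assumptions. The `Trackers` and `Stats` classes are empty stand-ins, not the PonyGE2 ones, and `from_param_lines`, `_parse_value` and `step` are hypothetical names invented for this example. The sketch also assumes a simple `KEY: value` line format for parameter files. It shows two things: existing parameter files can still seed a run, with keyword overrides on top, and two runs no longer share any module-level state.

```python
def _parse_value(text):
    """Convert a parameter string to int or float where possible."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


class Trackers:
    """Stand-in for per-run bookkeeping (not the PonyGE2 class)."""
    def __init__(self):
        self.best_fitness_list = []


class Stats:
    """Stand-in for per-run statistics (not the PonyGE2 class)."""
    def __init__(self):
        self.values = {"gen": 0}


class State:
    """Everything one run needs; passed explicitly to each function."""
    def __init__(self, params=None):
        self.params = dict(params or {})
        self.trackers = Trackers()
        self.stats = Stats()

    @classmethod
    def from_param_lines(cls, lines, **overrides):
        """Build a State from 'KEY: value' lines, then apply overrides."""
        params = {}
        for line in lines:
            line = line.split("#", 1)[0].strip()  # drop comments and blanks
            if not line:
                continue
            key, _, value = line.partition(":")
            params[key.strip()] = _parse_value(value.strip())
        params.update(overrides)
        return cls(params)


def step(state):
    """Hypothetical per-generation function: it touches only its own state."""
    state.stats.values["gen"] += 1


# Illustrative parameter lines; in practice these would be read from a file.
lines = [
    "POPULATION_SIZE: 100",
    "MUTATION_PROBABILITY: 0.01  # per-codon rate",
    "CROSSOVER: variable_onepoint",
]
run_a = State.from_param_lines(lines)
run_b = State.from_param_lines(lines, MUTATION_PROBABILITY=0.001)
step(run_a)
print(run_a.stats.values["gen"], run_b.stats.values["gen"])
print(run_a.params["MUTATION_PROBABILITY"], run_b.params["MUTATION_PROBABILITY"])
```

Because each `State` owns its own `params`, `trackers` and `stats`, overriding a value for one run leaves the other untouched, which is what a hyperparameter loop needs.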
Excellent points, @jmmcd. I like the idea of using a divergent branch that has some of the issues fixed and then creating a PR from there. It would be really useful to discuss the overall framework design and the design patterns that we think might be useful for the new PR. I think I have a good understanding of the scripts and the parsing code in the library, and if we combine our efforts we can definitely have a nice object-oriented interface that stays backward compatible with the command-line interface. I think it would be efficient for us to schedule a Zoom meeting with a proper agenda for the changes we want. That should give us a good starting point.
Thanks. Yes, a meeting would be helpful, especially if there are any other interested parties? I'm not quite ready to schedule it as things are busy here, but hopefully within a week or two. I am still re-learning some parts of the system. Just now I found the [...]
Thanks. A week or two should work for me, as I am working towards a paper deadline.
We should be able to provide a wrapper to allow this sklearn-style usage of our regression/classification: