-
Notifications
You must be signed in to change notification settings - Fork 0
mfjoneill/PonyGE2_moving_point
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
------------ Introduction ------------ Grammatical evolution (GE) is a population-based evolutionary algorithm, where a formal BNF-style grammar is used in the genotype to phenotype mapping process. PonyGE2 is an implementation of GE in Python. It's intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and as a Python workout. The original version of PonyGE (https://github.com/jmmcd/ponyge) was originally designed to be a small single-file implementation of GE. However, over time this has grown to a stage where a more formal structured approach was needed. This has led to the development of PonyGE2 (https://github.com/jmmcd/ponyge2), presented here. The PonyGE2 development team can be contacted at: James McDermott <[email protected]>, Erik Hemberg <[email protected]>, Michael Fenton <[email protected]>, David Fagan <[email protected]>. PonyGE2 is copyright (C) 2009-2016 ------------ Requirements ------------ PonyGE runs under Python 3.x. Using matplotlib, numpy, scipy, scikit-learn (sklearn), pandas All requirements can be satisfied with Anaconda. -------------- Running PonyGE -------------- We don't provide any setup script. You can run an example problem (the default is regression, see below) just by typing: $ cd src $ python ponyge.py This will run an example problem and generate a results folder. The folder contains several files showing the run's stats, producing graphs and documenting the parameters used, as well as a file containing the best individuals. For a more verbose command line experience run the following: $ cd src $ python ponyge.py --verbose Each line of the output corresponds to a generation in the evolution, and prints out all statistics on the current run (only if --verbose is specified). Upon completion of a run, the best individual is printed to the command line, along with summary statistics. There are a number of flags that can be used for passing values via the command-line. To see a full list of these just run the following: $ python ponyge.py --help ------------- About PonyGE2 ------------- #TODO: Fill out this section heftily. A full breakdown of the currently implemented elements in PonyGE2 is provided below. This includes a brief description of each individual component and how to activate them. PonyGE2 automatically parses the correct path for all operators, meaning you don't have to specify the full direct path but only the name of the desired operator, e.g. "--crossover subtree" instead of "--crossover operators.crossover.subtree". However, it is still possible to specify the full correct path if you so desire. Specifying the full direct path allows you to create new operators and place them wherever you like. - Initialisation -------------- There are two main ways to initialise a GE individual: by generating a genome, or by generating a subtree. Generation of a genome can only be done by creating a random genome string and as such individuals cannot be controlled. Subtree generation on the other hand can be forced to conform to specified limits, e.g. depth limits. This is implemented in ramped half-half initialisation (also called Sensible initialisation). It is also possible to initialise a population using randomly generated subtrees, which is similar in theory to random genome initialisation except there is far less bias than random genome initialisation as a result of the number of production choices in the grammar. - Genome - Random Activate with "--genome_init" - Subtree - Random Activate with "--initialisation random_init" - Ramped Half-Half Activate with "--initialisation rhh" - Selection --------- - Tournament Activate with "--selection tournament" - Truncation *** NOT FULLY IMPLEMENTED *** Activate with "--selection truncation" - Variation --------- - Crossover - Onepoint Activate with "--crossover onepoint" - Subtree Activate with "--crossover subtree" - Mutation - Int Flip Activate with "--mutation int_flip" - Subtree Activate with "--mutation subtree" - Evaluation ---------- - Replacement ----------- - Generational Activate with "--replacement generational" - Steady State *** NOT FULLY IMPLEMENTED *** Activate with "--replacement steady_state" Writing grammars ---------------- Grammars are written in Backus-Naur form, aka BNF. See the examples in src/grammars. Each rule is composed of a left-hand side (a single non-terminal), followed by the "goes-to" symbol ::=, followed by a list of productions separated by the "or" symbol |. Non-terminals are enclosed by angle brackets <>. For example: <a> ::= <b>c | d You can use an "or" symbol or angle bracket in a production. Escape it using a backslash: \|, \<, \>. You can use the "goes-to" symbol in a production without escaping it. Along with the fitness function, grammars are one of the most problem-specific components of the PonyGE2 algorithm. The performance of PonyGE2 can be vastly affected by the quality of the grammar used. A note on unit productions -------------------------- Traditionally GE would not consume a codon for unit productions. This was a design decision taken by O'Neill et al. In PonyGE2 unit productions consume codons. The logic being that it helps to do linear tree-style operations. Furthermore, the checks needed for unit productions during the running of the algorithm can add up to millions of checks that aren't needed if we just consume codons for unit productions. The original design decision on unit productions was also taken before the introduction of evolvable grammars whereby the arity of a unit production could change over time. In this case consuming codons will help to limit the ripple effect from that change in arity. This also replicates non coding regions of genome as seen in nature. In summary, the merits for not consuming a codon for unit productions are not clearly defined in the literature. The benefits in consuming codons are a reduction in computation and improved speed with linear tree style operations. Other benefits are an increase in non-coding regions in the chromosome (more in line with nature) that through evolution of the grammar may then express useful information. #FIXME Need to finalise a suite of problems for PonyGE2 ---------------- Example Problems ---------------- Three example problems are currently provided: - String-match - Regression - Classification A brief description is given below of each problem, along with the command-line arguments necessary to call each problem. It is not necessary to specify the desired grammar for each individual problem as PonyGE does this automatically based on the given inputs. String-match ------------ The grammar specifies words as lists of vowels and consonants. The aim is to match a target word. To use it, specify the following command-line arguments: "--problem string_match" "--target_string TYPE_TARGET_STRING" (e.g. golden, ponyge_rocks) Regression ---------- The grammar generates a symbolic function composed of standard mathematical operations and a set of variables. This function is then evaluated using a pre-defined set of inputs, given in the datasets folder. Each problem suite has a unique set of inputs. The aim is to minimise some error between the expected output of the function and the desired output specified in the datasets. This is the default problem for PonyGE. To use it, specify the following command-line arguments: "--problem regression" "--problem_suite PROBLEM_SUITE" (e.g. Keijzer6, Vladislavleva4) Classification -------------- #TODO Explain classificaiton problem here ----------------- Post-run Analysis ----------------- We don't provide any experiment managers other than the ability to save runs to specific folders using the --experiment_name handle. However, there are a number of functions available in utilities.save_plots which allow for plotting of run statistics. Post-run Analysis - Single Runs ------------------------------- By default, runs save a plot of best fitness (unless --debug is specified). Additionally, users can save plots from pre-existing stats.tsv files (i.e. stats files generated upon completion of a run) using: utilities.save_plot.save_plot_from_file(file_name, stat_name) where: file_name = "./results/folder_name/stats.tsv" (the full file path to the file) stat_name = a valid key from the stats.stats.stats dictionary. This will generate a plot of the given stat and save it in the same location as the original stats file. Alternatively, users can directly pass data in to: utilities.save_plot.save_plot_from_data(stat_data, stat_name) where: stat_data = some list of data, i.e. [data[stat_name] for data in utilities.trackers.stats_list] stat_name = a valid key from the stats.stats.stats dictionary. Post-run Analysis - Multiple Runs --------------------------------- If multiple runs are saved in a folder using the --experiment_name handle, a file is given for generating plots of the average stat values across these multiple runs, as given by: stats.parse_stats.py There are a number of flags that must be used for passing values via the command-line. To see a full list of these just run the following $ python stats/parse_stats.py --help ---------- References ---------- Michael O'Neill and Conor Ryan, "Grammatical Evolution: Evolutionary Automatic Programming in an Arbitrary Language", Kluwer Academic Publishers, 2003. Michael O'Neill, Erik Hemberg, Conor Gilligan, Elliott Bartley, and James McDermott, "GEVA: Grammatical Evolution in Java", ACM SIGEVOlution, 2008. http://portal.acm.org/citation.cfm?id=1527066. Get GEVA: http://ncra.ucd.ie/Site/GEVA.html
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published