Introduction
Genetic programming (Banzhaf et al., 1998) uses an evolutionary algorithm (Eiben, 2003) to search for programs that solve a well-defined problem that can be evaluated using one number, the fitness of the program. Evolutionary algorithms are inspired by the evolution of species by natural selection. The main difference of this search paradigm, compared to methods that are more traditional, is that is works with a group of candidate solution (a population), instead of just one solution. This makes the method more robust, i.e. less likely to get trapped in a local minimum. These algorithms code their solution as a gene. New generations of solutions are formed using sexual (and a-sexual) reproduction. Sexual reproduction (crossover of genes) is emphasized as it allows for the combination of multiple partial solutions into one, thus using the parallel computations made by entire population, making the use of a population less inefficient as it would be otherwise.Bloat and evolution
One of the fundamental problems of genetic programming (GP) is bloat. After some generations (typically below one hundred), the search for better programs halts as the programs become too large. The main cause of bloat is generally thought to be the proliferations of introns. Introns are parts of the program that do not contribute to the calculation that is made, e.g. a large calculation that is multiplied by zero in the end, or a section that start with: if false then. Additionally, bloat can be caused by inefficient code, e.g. two times x=x+1, instead of x=x+2.In genetic programming, the linear genes are typically converted into trees or lists with lines of code. Some other coding methods are also used, see Banzhaf et al. (1998). Performing a simple crossover on the genes would almost certainly result in an illegal program. Therefore, crossover is performed on a higher level, e.g. two sub-trees are exchanged between the sets of genes, or sections of program lines are exchanged. This way the child-program is legal code, but still, it is often bad code, and its results are not very similar to its parents. As a consequence, evolution favors bloat at the moment that it becomes hard to find better solutions. In this case, it is better for the parents to get offspring that is at least just as fit as they are, as getting less fit children. The probability to get equally fit offspring is better in large programs (with a small coding part) than in small ones (most of which is useful code), as in large programs it is more likely that the crossover operator does not destroy anything. The problem of bloat is thus intimately linked to the destructive power of crossover.