; XEmacs: This file contains -*-Scheme-*- source code. ;;; John David Stone ;;; Department of Mathematics and Computer Science ;;; Grinnell College ;;; stone@cs.grinnell.edu ;;; created December 5, 1999 ;;; last revised December 6, 2000 ;;; The genetic algorithm ;;; When no efficient algorithm for solving an optimization problem is ;;; available, the so-called genetic algorithm can often be used to find a ;;; near-optimal solution. (However, it is not guaranteed to work; the ;;; justification for using it is entirely pragmatic.) ;;; The metaphor underlying the genetic algorithm is biological adaptation ;;; through selective breeding. Think of the problem to be solved as a ;;; constraint imposed by a natural environment and of potential solutions ;;; to that problem as organisms. As the genetic constitution of an ;;; organism partially determines whether it thrives or languishes in a ;;; particular environment, so the internal structure and contents of each ;;; potential solution determine how good or bad it is. The genetic ;;; algorithm simulates the natural selection of organisms that are well ;;; adapted to their environment. ;;; Initially, a ``population'' of possible solutions is constructed, ;;; either at random (as in this implementation) or by some method that ;;; tends to generate good but not necessarily optimal solutions. This ;;; initial population is used to breed a second generation of slightly ;;; better solutions, which is used to breed a third, and so on, until some ;;; specified number of generations have been simulated. At the end of the ;;; process, the best solution in the final generation is returned. ;;; The process by which each new generation replaces its predecessor ;;; comprises three main steps. First, a large number of new potential ;;; solutions are ``bred'' by combining elements of randomly selected ;;; individuals of the old generation. Secondly, ``mutations'' are applied ;;; to these new solutions, replacing some of their elements with randomly ;;; selected values. Finally, the brood of new solutions is culled: Most ;;; of the individuals are discarded as ``less fit,'' that is, farther from ;;; the desired optimal solution. The rest make up the new population. ;;; The algorithm takes seven arguments: ;;; RANDOM-ORGANISM is a procedure of zero arguments that, when invoked, ;;; constructs and returns a randomly constructed possible solution to ;;; whatever problem is to be solved. ;;; POPULATION-SIZE is the number of organisms in the breeding pool in each ;;; generation. Its value is a trade-off between the need for genetic ;;; variation in the population and the need for fast running time in the ;;; program. ;;; SPLICE is a procedure that takes two organisms as arguments and returns ;;; a new organism comprising some genetic material from each parent, ;;; randomly selected. ;;; MUTATE is a procedure that takes an organism and returns a similar ;;; organism, except that a random change may have been made in some ;;; element of its genetic material. ;;; BROOD-SIZE is the number of organisms generated from the breeding pool, ;;; before culling. It must be grezter than or equal to POPULATION-SIZE ;;; and is ordinarily several times greater. ;;; FITNESS is a procedure that takes an organism as argument and returns ;;; an exact real number indicating its degree of fitness (i.e., its ;;; success as a solution to the problem). ;;; NUMBER-OF-GENERATIONS is the number of cycles of natural selection that ;;; the algorithm will simulate before delivering an answer. Again, its ;;; value is a trade-off between processing time and the likelihood of ;;; obtaining a near-optimal solution. (define (genetic-algorithm random-organism population-size splice mutate brood-size fitness number-of-generations) (let ((initial-population (generate-vector (lambda (index) (random-organism)) population-size)) (simulate-generation (pipe (breed (pipe splice mutate) brood-size population-size) (cull fitness population-size)))) (vector-ref (fold-natural number-of-generations simulate-generation initial-population) (predecessor population-size)))) ;;; The algorithm for creating a brood is straightforward: Repeatedly ;;; choose parents and apply the ``maker'' -- the splice-and-mutate ;;; algorithm by which a new organism is produced. Collect the results in ;;; a bag. (define (breed maker brood-size population-size) (lambda (population) (fold-natural brood-size (lambda (brood) (put-into-bag (maker (vector-ref population (random population-size)) (vector-ref population (random population-size))) brood)) (bag)))) ;;; To cull the brood, evaluate the fitness of each new organism and insert ;;; it into a heap, ordered by fitness. Then pull out of the heap just ;;; enough of the organisms to produce the new population. (define (cull fitness population-size) (let ((fitter? (tweak-each cdr >=))) (let ((putter (initial-section put-into-heap fitter?)) (taker (initial-section take-extreme-out-of-heap fitter?))) (lambda (brood) (let ((fitness-heap (fold-bag brood (lambda (individual pile) (putter (cons individual (fitness individual)) pile)) (null-heap))) (preserve-fittest (lambda (new-population rest-of-heap) (receive (chosen others) (taker rest-of-heap) (values (prepend (car chosen) new-population) others))))) (receive (survivors discards) (fold-natural population-size preserve-fittest (list) fitness-heap) (list->vector survivors))))))) ;;; generate-list: Construct a list of a given size by applying a given ;;; procedure GENERATOR successively to each natural number up to, but ;;; not including, the size. (define (generate-list generator size) (unfold-list (next-section = size) generator successor 0)) ;;; generate-vector: Construct a vector of a given size by applying a given ;;; procedure GENERATOR successively to each natural number up to, but not ;;; including, the size. (define generate-vector (pipe generate-list list->vector)) ;;; As a small example of the genetic algorithm, let's train our organisms ;;; to leave the nest, find food, and bring it back to the nest. We'll ;;; model this pretty abstractly: The nest is at the origin of a ;;; two-dimensional lattice of points with integer coordinates. We'll ;;; place food at points (5, 2), (7, 5), and (1, 9). Each organism will ;;; leave the nest and follow the path determined by its genes, moving at ;;; each step one unit either north, east, south, or west. If it reaches ;;; one of the food points, it collects the food and carries it along. ;;; The organism takes 64 steps and then halts. If it is back in the nest ;;; when it stops, it is awarded 20 points; otherwise, it is penalized its ;;; Manhattan distance from the nest (that is, the penalty is the sum of ;;; the absolute values of the coordinates of the point it is on). It is ;;; awarded a bonus of 30 points for each piece of food that it is ;;; carrying. ;;; The genes for an organism will be represented as a vector of 64 ;;; direction symbols ('N, 'E, 'S, or 'W). (define gene-count 64) (define random-direction-symbol (let ((directions (vector 'N 'E 'S 'W))) (lambda () (vector-ref directions (random 4))))) ;;; Our splicing procedure will choose a subvector of the father (selecting ;;; the end-points randomly) and replace the corresponding subvector of the ;;; mother with it. (define (splicer mother father) (let ((one-end (random (successor gene-count))) (other-end (random (successor gene-count)))) (let ((start (min one-end other-end)) (finish (max one-end other-end))) (generate-vector (lambda (index) (vector-ref (if (or (< index start) (<= finish index)) mother father) index)) gene-count)))) ;;; Our mutation procedure passes over each element of the vector ;;; occasionally (with probability 1/64) replacing it with a randomly ;;; selected direction symbol. (define (mutator organism) (generate-vector (lambda (index) (if (zero? (random 64)) (random-direction-symbol) (vector-ref organism index))) gene-count)) ;;; The fitness evaluator simulates the organism's trip and awards bonuses ;;; and penalties. (define (run organism) (lower-ply-natural gene-count (lambda (step x y food-sack) (let ((dir (vector-ref organism step))) (let ((new-x (if (eq? dir 'E) (successor x) (if (eq? dir 'W) (predecessor x) x))) (new-y (if (eq? dir 'N) (successor y) (if (eq? dir 'S) (predecessor y) y)))) (if (point=? new-x new-y 5 2) (values new-x new-y (safe-adjoin-to-set 'food-1 food-sack)) (if (point=? new-x new-y 7 5) (values new-x new-y (safe-adjoin-to-set 'food-2 food-sack)) (if (point=? new-x new-y 1 9) (values new-x new-y (safe-adjoin-to-set 'food-3 food-sack)) (values new-x new-y food-sack))))))) 0 0 (set))) (define (point=? x0 y0 x1 y1) (and (= x0 x1) (= y0 y1))) (define (fitness organism) (receive (last-x last-y food-sack) (run organism) (+ (* (cardinality food-sack) 30) (if (point=? last-x last-y 0 0) 20 (- 0 (abs last-x) (abs last-y)))))) ;;; Filling in the rest of the parameters with plausible values (population ;;; size 200, brood size 1000, number of generations 20), let's see what ;;; we can come up with: (begin (display (genetic-algorithm (lambda () (generate-vector (lambda (index) (random-direction-symbol)) gene-count)) 200 splicer mutator 1000 fitness 20)) (newline)) ;;; In six sample runs, organisms that achieved a perfect score of 110 ;;; (picking up all three pieces of food and returning to the nest) were ;;; produced four times. In the other two runs, the optimal scores in the ;;; twentieth generation were 88 and 86.