Everyone is talking about neural networks these days. I'm glad for it. When I started working in nature-inspired algorithms nearly 20 years ago, it was an extremely niche area. In fact, few saw a future for any of these approaches outside academia. But the deep learning revolution that started in 2006 (or 2012, or 1989, depending on who you ask -- I'll have a full post on the history of neural networks later) is going strong, and shows no signs of letting up. It turns out that people like being able to interact with computers using their voice, having images automatically tagged, finding objects in pictures and movies, having TV shows and brands of toilet paper recommended to them based on their viewing and purchasing history, and riding in self-driving cars. Who knew?
I plan to write entire posts on all of the different nature-inspired techniques (neural networks, artificial immune systems, ant colony optimization, flocking/swarm algorithms, and simulated annealing) in the future. This time, however, I'm just going to cover the basics of the genetic algorithm (GA).
How a Genetic Algorithm Works
Genetic algorithms use natural selection, DNA, and phenotype expression as inspiration for solving complex problems, many of which can be expressed as combinatorial optimization problems. These problems are often NP-hard, meaning that no known algorithm can guarantee an optimal solution in a reasonable amount of time. The classic example of this type of problem is the traveling salesperson problem (TSP). In this problem, a salesperson needs to visit each of n cities, and you must determine the order in which to visit them (and return home) that produces the shortest total route.
Encoding Individuals
Genetic algorithms work by encoding potential solutions (called individuals, or chromosomes) in a simple format. Binary strings are one common choice, but any linear representation works well, such as a record, structure, or array. Each encoding maps to a specific representation of a solution to the given problem (hence the phenotype expression inspiration). Check out my post on Encoding Individuals.
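As a quick sketch of what an encoding can look like: for the TSP mentioned above, a natural linear representation is a permutation of city indices, where each individual is one complete tour. The city count and population size below are made-up values for illustration.

```python
import random

# Hypothetical 5-city TSP: an individual (chromosome) is a
# permutation of city indices, i.e. one complete tour.
NUM_CITIES = 5

def random_individual():
    """Create one individual: a random tour visiting every city exactly once."""
    tour = list(range(NUM_CITIES))
    random.shuffle(tour)
    return tour

# The initial population is just a collection of random individuals.
population = [random_individual() for _ in range(10)]
print(population[0])  # e.g. [3, 0, 4, 1, 2]
```

Note that the encoding guarantees validity by construction: every tour visits each city exactly once, so no decoding step can produce an illegal solution.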
Determining Fitness
For each individual, you have to determine that individual's "fitness." Since GAs are inspired by natural selection, this is where we determine the fittest individuals. This is completely problem specific, so I won't get into details here.
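To make "problem specific" concrete, here is what a fitness function might look like for the TSP example, using a hypothetical set of city coordinates. Since shorter tours are better but fitness conventionally means "higher is better," one common trick is to invert the tour length.

```python
import math

# Hypothetical (x, y) coordinates for a small TSP instance.
CITIES = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 0)]

def tour_length(tour):
    """Total Euclidean distance of the tour, including the return home."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = CITIES[tour[i]]
        x2, y2 = CITIES[tour[(i + 1) % len(tour)]]  # wraps back to start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def fitness(tour):
    """Shorter tours are fitter, so invert the length."""
    return 1.0 / tour_length(tour)
```

For a different problem, you would replace this entirely: the GA machinery doesn't care what fitness measures, only that it can compare individuals.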
Selection
Once the fitness of individuals has been determined, the GA must select which individuals should mate, or possibly which individuals should survive in the case of culling algorithms. You almost never want to sort individuals by fitness value and take the top X%. This leads to a major problem: premature convergence, which means convergence to a local optimum (roughly analogous to "overfitting" in machine learning). I'll talk about different selection methods and avoiding premature convergence in other posts. Check out my post on Selection.
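One widely used scheme that avoids the "top X%" trap is tournament selection (shown here as an illustrative example, not the only option): pick a few individuals at random and let the fittest of that small group win. Weaker individuals still reproduce occasionally, which preserves the diversity that prevents premature convergence.

```python
import random

def tournament_select(population, fitnesses, k=3):
    """Pick k individuals uniformly at random; the fittest of them wins.

    Less-fit individuals can still be selected when no strong
    contender lands in their tournament, preserving diversity.
    """
    contenders = random.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

The tournament size k is a knob: larger tournaments mean stronger selection pressure and faster (possibly premature) convergence.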
Crossover
Crossover is the mating operation, and its purpose is to mix the interesting or positive traits of different individuals into new individuals. Think of artificial selection, like horse or dog breeding. Pick two (or more) of the individuals chosen during selection and mix them together to make new individuals. Check out my post on different crossover techniques.
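The simplest version for a binary (or other linear) encoding is single-point crossover, sketched below: cut both parents at the same random position and swap the tails. (Permutation encodings like the TSP tour need specialized operators to stay valid, which I'll cover in the crossover post.)

```python
import random

def one_point_crossover(parent_a, parent_b):
    """Single-point crossover for linear encodings such as bit strings.

    Cut both parents at the same random point and swap the tails,
    producing two children that each inherit from both parents.
    """
    point = random.randint(1, len(parent_a) - 1)
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2
```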
Mutation
Mutation is used to inject a little randomness into the individuals. After being produced through crossover, new individuals are changed in random (or semi-random) positions with a small probability called the mutation rate. This helps to ensure that the algorithm doesn't converge prematurely.
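For a binary encoding, the canonical form is bit-flip mutation: each position flips independently with probability equal to the mutation rate. A minimal sketch:

```python
import random

def mutate(individual, rate=0.01):
    """Flip each bit independently with probability `rate`.

    A small rate keeps most of the individual intact while still
    injecting enough novelty to escape local optima.
    """
    return [1 - bit if random.random() < rate else bit
            for bit in individual]
```

The mutation rate is typically kept low (often around 1/length of the individual); too high and the search degenerates into a random walk.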
Generations
A GA runs through these operations over and over, with each iteration called a generation. GAs typically run until convergence (the point at which variation in the population is below a specified limit), or for a specific number of generations. In the second case, running for a set number of generations makes it easy to bound total runtime, which is useful for time-sensitive applications. You may not get the best answer, but you'll likely get something that is good enough.
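Putting all the pieces together, the whole loop fits in a few lines. The sketch below runs a fixed number of generations on a toy problem (OneMax: maximize the number of 1-bits), with made-up parameter values; a real application would swap in its own encoding and fitness function.

```python
import random

# Toy problem (OneMax): maximize the number of 1-bits in a string.
LENGTH, POP_SIZE, GENERATIONS = 20, 30, 50

def fitness(ind):
    return sum(ind)  # count of 1-bits

def tournament(pop, k=3):
    """Tournament selection: fittest of k random individuals."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """Single-point crossover, returning one child."""
    p = random.randint(1, LENGTH - 1)
    return a[:p] + b[p:]

def mutate(ind, rate=0.02):
    """Bit-flip mutation at a low rate."""
    return [1 - g if random.random() < rate else g for g in ind]

# Random initial population.
population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP_SIZE)]

# One generation = select parents, cross them, mutate the children.
for gen in range(GENERATIONS):
    population = [mutate(crossover(tournament(population),
                                   tournament(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))  # typically at or near LENGTH
```

Fixing GENERATIONS up front is what makes the runtime predictable, exactly as described above: the loop does the same amount of work regardless of how hard the problem instance turns out to be.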
Conclusion
That is all there is to genetic algorithms. A few simple operations, a creative encoding strategy, and a little bit of computational time. That's it! I'll have more posts soon going into details about each of these operations and how to tailor GAs to be as efficient and effective as possible. Until then, please share this post and provide feedback in the comments!