Genetic Algorithms: Mutation

So far in my series on Genetic Algorithms I've covered Basic Concepts, Encoding, Selection, and Crossover. Those operations will get you most of the way to a functioning optimizer. However, all of the operations we've discussed exert convergence pressure on the population. That is, selection and crossover have a tendency to draw the population together. The last operator I want to talk about - mutation - is different. It wants to split the population apart. And as counterintuitive as it seems, divergence can be even more important than convergence.



Divergence in a genetic algorithm is crucial if you want to end up with quality solutions. While the population itself provides some level of inherent space exploration (each individual represents a point in solution space), selection and crossover bring all of the individuals close together, essentially limiting the solution space that is being explored. Mutation allows the algorithm to break out of this rut, exploring more of the solution space and avoiding local minima.

Safe vs. Unsafe Mutation


There are many different ways to mutate individuals in a population, and they fall into two basic categories: safe and unsafe. Safe mutations are those which change an individual while ensuring that the changed individual still falls within the constraints established for the population. Unsafe mutations create individuals which are not viable - they do not meet the constraints that you've established for individuals in the population.

It's important that you ensure safe mutations so that you are not randomly eliminating individuals in your population during the mutation phase. There are many ways to accomplish this. I'll cover point mutations (single- and multi-point) first, and then more exotic forms of mutation which ensure safety for more complex individuals (like neural networks).
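To make the distinction concrete, here's a minimal sketch in Python. The [0, 1] gene constraint, the function names, and the clamp-based repair are all illustrative assumptions; resampling or reflecting out-of-range genes would work just as well for keeping a mutation safe.

```python
import random

# Hypothetical toy encoding: a list of floats, each constrained to [0.0, 1.0].
GENE_MIN, GENE_MAX = 0.0, 1.0

def unsafe_mutate(individual, sigma=0.5):
    """Perturbs one gene with Gaussian noise; the result may land outside
    the constraints, leaving a non-viable individual."""
    i = random.randrange(len(individual))
    individual[i] += random.gauss(0.0, sigma)
    return individual

def safe_mutate(individual, sigma=0.5):
    """Applies the same perturbation, then clamps the gene back into the
    valid range, so the mutated individual is always viable."""
    i = random.randrange(len(individual))
    value = individual[i] + random.gauss(0.0, sigma)
    individual[i] = min(GENE_MAX, max(GENE_MIN, value))
    return individual
```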

Single-point vs. Multi-point Mutation


The two basic forms of mutation are single-point and multi-point mutation. Single-point mutation changes a single gene in the individual. If you are optimizing the weights in a neural network, for example, this would be changing a single weight at random. In a typical image classification network containing 4-12 million trainable parameters, this is a drop in the bucket, and can mean that meaningful change to your individual will be slow to occur.
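As a rough sketch (the Gaussian perturbation and its scale are assumptions; any small random change to the chosen gene works), single-point mutation on a flattened weight vector might look like this:

```python
import random

def single_point_mutate(weights, sigma=0.1):
    """Mutates exactly one randomly chosen gene - here, one weight in a
    flattened list of network weights."""
    i = random.randrange(len(weights))
    weights[i] += random.gauss(0.0, sigma)
    return weights
```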

Multi-point mutation, on the other hand, modifies many genes at once. How many is up to you, but like other operations in a genetic algorithm it's typically determined randomly. However, for complex individuals (again think neural networks) this can still take a long time to meaningfully change an individual.
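Extending the sketch above, a multi-point version draws the number of mutated genes at random. Drawing it uniformly, as here, is just one choice; another common approach is to flip a biased coin for each gene independently.

```python
import random

def multi_point_mutate(weights, sigma=0.1):
    """Mutates a randomly chosen number of genes in a single pass."""
    n_points = random.randint(1, len(weights))          # how many genes to change
    for i in random.sample(range(len(weights)), n_points):
        weights[i] += random.gauss(0.0, sigma)
    return weights
```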

Exotic Mutation Operations


Many genetic algorithms which operate on complex individuals use more exotic forms of mutation. These are all problem specific, and depend on what you are trying to optimize. A more recent example of an exotic mutation scheme is the safe mutation Uber is using for its neuroevolution code. In Uber's scheme, the size of each weight's random perturbation is scaled by how sensitive the network's outputs are to that weight, with the sensitivities computed by backpropagation. This ensures that the individual is safely mutated while still having a meaningful impact on the expressed solution that the individual represents. You can check out Uber's work on this and other neuroevolution strategies here.
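To give a flavor of the idea (this is my own rough sketch, not Uber's implementation), here it is on a toy linear model, where the output gradients can be written down by hand instead of being computed by backpropagation:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy stand-in for a network: a single linear layer, y = W @ x.
W = rng.normal(size=(3, 5))    # the "individual" being mutated
X = rng.normal(size=(8, 5))    # a batch of reference inputs

def gradient_scaled_mutate(W, X, sigma=0.1, eps=1e-8):
    """Scales each weight's random perturbation by the inverse of the
    outputs' sensitivity to that weight, so influential weights move less.
    For y = W @ x, d y_k / d W_kj = x_j, so the sensitivity of column j is
    the mean |x_j| over the batch; a real network would get these
    gradients from backpropagation instead."""
    sensitivity = np.abs(X).mean(axis=0)               # shape (5,)
    delta = rng.normal(scale=sigma, size=W.shape)      # raw perturbation
    return W + delta / (sensitivity[None, :] + eps)    # damp sensitive weights

W_mutated = gradient_scaled_mutate(W, X)
```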

There are lots of other ways to perform mutation, and this is by no means an exhaustive list. Many genetic algorithms will require custom operations because they use custom individual encodings. It's all a matter of trying things out, seeing what the effects are, and trying again if necessary.

If you enjoy these posts, feel free to follow me on Twitter and LinkedIn, and subscribe using the widget in the sidebar to get notifications straight to your email.
