
Teaching AI is a Lot Like Teaching People: Never Start from Scratch

When was the last time you tried to teach a one-year-old to decipher x-rays? My guess is never, and with good reason: it makes no sense.



Instead, you teach toddlers letters, numbers, and shapes. Then you teach them to identify pictures of dogs, cats, airplanes, and other real-world objects. Only after many years of cumulative learning are they skilled enough to become radiologists.

The same goes for artificial intelligence. Neural networks have made tremendous strides in recent years, learning to accurately classify all types of images, video, and audio. Not only are neural network-based AI systems now detecting diseases in x-rays; they are also managing inventory, talking to us as customer service representatives, and driving cars.

But it wasn't always that way. When neural network research began in the 1950s, the networks couldn't do much of anything. In fact, early neural network research essentially collapsed after Marvin Minsky and Seymour Papert posed the XOR (exclusive or) problem, showing that the single-layer perceptrons of the 1960s couldn't learn even that most basic logical function.

It took considerable time, and considerably more complexity, for things to change. By the 1980s neural networks were more than a single perceptron: they were deeper, wider, and far more computationally demanding. Neuron weights were no longer set by searching; they were set by training.

Like many other computer algorithms, including neural networks themselves, the method for training these biologically inspired graphs came directly from nature: backpropagation. In the mammalian brain, electrical backpropagation occurs through dendritic voltage-gated calcium channels. In artificial neural networks, backpropagation is a mathematical algorithm that distributes the error between the realized and expected outputs across the network's connections, adjusting each weight in proportion to its contribution to that error.
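To make that concrete, here is a minimal sketch (my own illustration, not code from the original research) of backpropagation training a tiny two-layer network on XOR, the very function a single perceptron cannot learn. It uses a sigmoid activation and squared error, with the gradients worked out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: push the output error back through the network
    err_out = (out - y) * out * (1 - out)     # error at the output layer
    err_hid = (err_out @ W2.T) * h * (1 - h)  # error propagated to the hidden layer

    # Gradient-descent weight updates
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [0, 1, 1, 0]
```

Modern frameworks compute these gradients automatically, but the idea is the same: push the error backward through the graph and nudge every weight a little.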

Backpropagation vastly reduced the time needed to optimize a network's weights and allowed network complexity to explode. By the late 1980s, neural networks could read numbers and letters. Twenty years later, they could identify pictures of dogs, cats, airplanes, and other real-world objects. Today, just five years after that, neural networks have reached human-level capability in a wide range of areas, from conversation to logistics, and from driving to medicine.

This is not just a story of evolution, and it isn't just about network complexity (in fact, many modern networks are likely more complex than they need to be to achieve the same results). It's about the accumulation of knowledge. Like an infant growing into a toddler and progressing through primary school, secondary school, university, and graduate school, neural networks can take advantage of accumulated knowledge, because we don't tend to train them from infancy every time.

We train them from the best capability we've achieved so far. When we build a network to detect diseases in x-rays, we start with a model that already knows how to identify dogs, cats, airplanes, and other real-world objects. We then train it on the domain-specific data it needs to complete the new task.
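In practice this is transfer learning. Here is a minimal Keras sketch of the idea (assuming TensorFlow is installed; the x-ray datasets and the single-output head are placeholders of my own, not the author's setup): load a model pretrained on everyday images, freeze it, and attach a small head for the new task.

```python
import tensorflow as tf

# Start from a model that already "knows" everyday objects (ImageNet weights).
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the accumulated knowledge frozen at first

# Add a new head for the domain-specific task, e.g. disease vs. no disease.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# model.fit(xray_train_ds, validation_data=xray_val_ds, epochs=5)
# xray_train_ds / xray_val_ds are hypothetical tf.data datasets of x-ray images.
```

Once the new head has converged, some or all of the base layers can be unfrozen and fine-tuned at a lower learning rate.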

Does it make the network better at the domain-specific task? Not necessarily. But it does reduce the time to train by giving the network a better starting point: the initial loss is lower than it would be when training from scratch, and the loss curve is smoother and less erratic.

Artificial neural networks may be inspired by the human brain, but in many ways their learning is completely reversed. While the human brain easily learns to recognize numbers, letters, dogs, cats, airplanes, and other real-world objects, it took artificial neural networks decades to do these things well. And while humans need decades of study to go from toddler to radiologist, artificial neural networks made that leap in just five years.

What will neural networks be able to do in the next five years? I am not entirely sure, but I'm excited to find out.



Follow me on Twitter and LinkedIn
