
Teaching AI is a Lot Like Teaching People: Never Start from Scratch

When was the last time you tried to teach a one-year-old to decipher x-rays? My guess is never, and with good reason: it makes no sense.



Instead, you teach toddlers letters, numbers, and shapes. Then you teach them to identify pictures of dogs, cats, airplanes, and other real-world objects. Only after many years of cumulative learning are they skilled enough to become radiologists.

The same goes for artificial intelligence. Neural networks have made tremendous strides in recent years, learning to accurately classify all types of images, video, and audio. Not only are neural network-based AI systems now detecting diseases in x-rays; they are also managing inventory, talking to us as customer service representatives, and driving cars.

But it wasn't always that way. When neural network research began in the 1950s, the networks couldn't do much of anything. In fact, early neural network research essentially collapsed after Marvin Minsky and Seymour Papert posed the XOR (exclusive or) problem, showing that the single-layer perceptrons of the 1960s couldn't learn even that most basic logical function.

It took considerable time, and considerably more complexity, for things to change. By the 1980s neural networks were more than a single perceptron: they were deeper, wider, and far more computationally demanding. Neuron weights were no longer set by searching; they were set by training.

Like many other computer algorithms, including neural networks themselves, the method for training these biologically inspired graphs came directly from nature: backpropagation. In the mammalian brain, electrical backpropagation occurs through dendritic voltage-gated calcium channels. In artificial neural networks, backpropagation is a mathematical algorithm that distributes the error between the realized and expected outputs across the network's connections, adjusting each weight in proportion to its contribution to that error.
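To make that concrete, here is a minimal sketch (my own illustration, not code from the original research) of backpropagation training a tiny two-layer network on XOR, the very function a single perceptron cannot learn. It uses a sigmoid activation and squared error, with the gradients worked out by hand.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights: 2 inputs -> 4 hidden units -> 1 output
W1 = rng.normal(size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # Backward pass: push the output error back through the network
    err_out = (out - y) * out * (1 - out)     # error at the output layer
    err_hid = (err_out @ W2.T) * h * (1 - h)  # error propagated to the hidden layer

    # Gradient-descent weight updates
    W2 -= lr * h.T @ err_out
    b2 -= lr * err_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ err_hid
    b1 -= lr * err_hid.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [0, 1, 1, 0]
```

Modern frameworks compute these gradients automatically, but the idea is the same: push the error backward through the graph and nudge every weight a little.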

Backpropagation vastly reduced the time needed to optimize a network's weights and allowed network complexity to explode. By the late 1980s, neural networks could read numbers and letters. Twenty years later, they could identify pictures of dogs, cats, airplanes, and other real-world objects. Today, just five years after that, neural networks have reached human-level capability in a wide range of areas, from conversation to logistics, and from driving to medicine.

This is not just a story of evolution, and it isn't just about network complexity (in fact, many modern networks are likely more complex than they need to be to achieve the same results). It's about the accumulation of knowledge. Like an infant growing into a toddler and progressing through primary school, secondary school, university, and graduate school, neural networks can take advantage of accumulated knowledge, because we don't tend to train them from infancy every time.

We train them from the best capability we've achieved so far. When we build a network to detect diseases in x-rays, we start with a model that already knows how to identify dogs, cats, airplanes, and other real-world objects. We then train it on the domain-specific data it needs to complete the new task.
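In practice this is transfer learning. Here is a minimal Keras sketch of the idea (assuming TensorFlow is installed; the x-ray datasets and the single-output head are placeholders of my own, not the author's setup): load a model pretrained on everyday images, freeze it, and attach a small head for the new task.

```python
import tensorflow as tf

# Start from a model that already "knows" everyday objects (ImageNet weights).
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the accumulated knowledge frozen at first

# Add a new head for the domain-specific task, e.g. disease vs. no disease.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# model.fit(xray_train_ds, validation_data=xray_val_ds, epochs=5)
# xray_train_ds / xray_val_ds are hypothetical tf.data datasets of x-ray images.
```

Once the new head has converged, some or all of the base layers can be unfrozen and fine-tuned at a lower learning rate.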

Does it make the network better at the domain-specific task? Not necessarily. But it does reduce the time to train by giving the network a better starting point: the initial loss is lower than it would be when training from scratch, and the loss curve is smoother and less erratic.

Artificial neural networks may be inspired by the human brain, but in many ways their learning is completely reversed. While the human brain easily learns to recognize numbers, letters, dogs, cats, airplanes, and other real-world objects, it took artificial neural networks decades to do these things well. And while humans need decades of study to go from toddler to radiologist, artificial neural networks made that leap in just five years.

What will neural networks be able to do in the next five years? I am not entirely sure, but I'm excited to find out.



Follow me on Twitter and LinkedIn
