Neural Network Dense Layers

Neural network dense layers (or fully connected layers) are the foundation of nearly all neural networks. If you look closely at almost any topology, somewhere there is a dense layer lurking.

This post will cover the history behind dense layers, what they are used for, and how to use them by walking through the "Hello, World!" of neural networks: digit classification.


The Problem with the Perceptron


Check out my previous post, What are Neural Networks?

Neural networks come in many different variations these days, from convolutional and recurrent, to homogeneous and heterogeneous, to linear and branching.

But the original neural network was a single neuron: the perceptron. Perceptrons showed some promise, but came up short when attempting to handle even simple logical operations such as XOR. Unfortunately, perceptrons didn't have enough complexity to approximate many of the functions that neural networks can approximate today.

The solution was to add more neurons. The question was how. Should neurons be placed in series, building a deeper network of single neurons; in parallel, creating a wider network; or both?

The next iteration of neural networks was both. By adding width, the network could simultaneously approximate more functions, expanding the solution space. By adding depth, the network could use those parallel approximations to make more informed decisions. The multi-layer perceptron, or MLP, was born.

Multi-layer Perceptrons

An example of a Multi-layer Perceptron

The MLP used a layer of neurons, each of which took input from every component of the input. Each of these neurons was itself a perceptron, and each fed its result into the perceptrons of the following layer. In Keras, and many other frameworks, this layer type is referred to as the dense (or fully connected) layer.

Neural network dense layers map each neuron in one layer to every neuron in the next layer. This allows for the largest potential function approximation within a given layer width. It also means that there are a lot of parameters to tune, so training very wide and very deep dense networks is computationally expensive.
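
Concretely, a dense layer computes a matrix multiplication, adds a bias vector, and applies an activation function. Here's a minimal NumPy sketch of that forward pass (the relu helper and the random weights are purely illustrative, not part of the Keras example later in this post):

import numpy as np

def relu(z):
    # Element-wise rectified linear unit: max(0, z)
    return np.maximum(0.0, z)

def dense_forward(x, W, b):
    # x: (batch, n_inputs), W: (n_inputs, n_neurons), b: (n_neurons,)
    # Every input is connected to every neuron, hence "fully connected"
    return relu(x @ W + b)

# Shapes matching the first layer of the MNIST model below: 784 inputs, 512 neurons
x = np.random.rand(32, 784)           # a batch of 32 unrolled images
W = 0.01 * np.random.randn(784, 512)  # one weight per input-neuron pair
b = np.zeros(512)                     # one bias per neuron
print(dense_forward(x, W, b).shape)   # (32, 512)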

But for limited function approximations in a limited input space, it was an ideal system. One of the first uses of this type of network was digit identification for ZIP codes. Yann LeCun (then at Bell Labs) developed an MLP-based model to identify digits from low-resolution black and white images (like those that would have been available from digital cameras of the late 1980s).

A Digit Classifier with Neural Network Dense Layers


We'll be using Keras to build a digit classifier based on neural network dense layers. Keras is a high-level abstraction for designing neural networks in a layer-wise fashion.

Keras also has a set of convenient dataset loader functions to download common datasets. For this example, we'll be using the MNIST dataset, which consists of examples of handwritten digits as 28x28 grayscale images.

For this post, I'll actually be stepping through Keras' mnist_mlp.py example code on GitHub. It's a great resource if you are just learning to use Keras for constructing neural networks, so I recommend checking it out.

Providing Inputs to the Network


Each digit is a 2-dimensional image of 28x28 pixels, 784 pixels in total. Each pixel will be an input to the network, provided as an unrolled 1-dimensional array (or tensor) of 784 values.

Keras provides an MNIST dataset loader which downloads the dataset of handwritten digits and separates it into training and testing sets, each with images and ground-truth labels.

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

# Download (on first use) and load MNIST, already split into training and testing sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Preparing the Dataset for Training


Although the dataset is easily loaded, the images have to be reshaped into 1-dimensional pixel arrays. In addition, the grayscale images have pixels represented as integer intensities from 0 (black) to 255 (white). During training, however, it's easier to work with values between 0 and 1, because large input values can lead to problems: neurons become saturated and error information cannot be properly backpropagated, or gradients grow uncontrollably (exploding gradients). To fix this, we simply divide each pixel value by 255, normalizing it to a real value between 0 and 1.

We also need to convert the training and testing labels (the y arrays) to categorical, one-hot form. This means that each image can belong to one (and only one) category; in this case, there are 10 categories, one for each digit (0 through 9).

Read my post on Output Layers
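
As a quick standalone illustration (not part of the example script), the to_categorical call used below turns an integer label into a one-hot vector:

# The label 3 becomes a 1x10 array with a 1 in position 3 and 0s elsewhere
print(keras.utils.to_categorical([3], num_classes=10))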

batch_size = 128
num_classes = 10
epochs = 20

# Unroll each 28x28 image into a 784-element vector
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)

# Convert integer pixel intensities to floats in the range [0, 1]
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

# Convert the integer labels to one-hot categorical vectors
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Constructing the Network


The construction of the actual neural network requires remarkably few lines of code. This particular network topology consists of only a few layers.

  1. A Dense layer of 512 neurons which accepts 784 inputs (the input image)

  2. A Dropout layer (with a rate of 0.2), which randomly zeroes a fraction of activations during training to help prevent overfitting to the training data

  3. A second Dense layer of 512 neurons

  4. A second Dropout layer

  5. A third Dense layer of 10 neurons, which will provide the final classification


model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))
model.summary()
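
The model.summary() call prints each layer's output shape and parameter count, which is worth checking with a quick hand calculation: the first Dense layer has 784 × 512 weights plus 512 biases (401,920 parameters), the second has 512 × 512 + 512 (262,656), and the output layer has 512 × 10 + 10 (5,130). The Dropout layers add no parameters, so the whole network has 669,706 trainable parameters.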

Training the Network


It's finally time to train the network to identify handwritten digits. The first task is to compile the neural network description, at which point we provide a loss function (in this case, the built-in categorical_crossentropy), an optimizer (RMSprop), and a metric to report during training and evaluation (accuracy).

Next, the model is fit to the training data, and validated against the test data. Once the fit is complete (in this case we train for 20 epochs, or 20 passes through the training data), we evaluate the model to determine its accuracy on the testing data.

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
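
Once training and evaluation are complete, the same model object can be used to classify individual images. Here's a minimal sketch (it assumes NumPy imported as np, which the example script above doesn't include):

import numpy as np

# Class probabilities for the first test image; x_test[:1] keeps the batch dimension
probabilities = model.predict(x_test[:1])
predicted_digit = np.argmax(probabilities, axis=1)[0]
actual_digit = np.argmax(y_test[:1], axis=1)[0]
print('Predicted:', predicted_digit, 'Actual:', actual_digit)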

Dense Layers are Just the Beginning


In the evolution of neural networks, dense layers were the first to appear after the original perceptron. However, a whole zoo of new neurons, layers, and topologies has been developed in the decades since the MLP was first shown to be a universal function approximator. In future posts I'll cover how these advancements have improved the capabilities and reduced the training time of deeper, more complex neural networks that can translate languages, identify diseases, make complex predictions, and drive automobiles.
