Bright Wire

Getting a neural net to learn the rules of binary addition and how to use its memory to store carry bits as appropriate.

Motivation

Sometimes we want the computer to be able to make predictions based on a sequence of inputs. Each single item in the sequence doesn't contain enough information by itself to make a successful prediction. Instead, the nearby context is needed as well.

One solution is to give a classifier not just the current item but a window of surrounding items as well. The problem is that the window size must be fixed in advance, and the relevant context can sometimes fall outside that window.

A better solution is a recurrent neural network (RNN), which maintains a simple form of memory. This lets it learn from arbitrarily long sequences and change its predictions based on what it has seen earlier in the sequence.

Binary Addition

An example of this is binary addition. The rules are simple enough, but a bit needs to "carry" over from a previous addition if there is an overflow. A carry can propagate across many subsequent additions, so passing a nearby context window isn't always going to work.

Rules

0+0 = 0
0+1 or 1+0 = 1
1+1 = 0 (with a carry of 1)
1+1 (with carry) = 1 (with another carry of 1)

For example, adding:

01011111101010000101010001100100
11010011100101000110001101101000

Equals

10100011111111000000111100111100
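These rules can be sanity-checked with a few lines of Python (illustrative only; the actual model is built with Bright Wire below). The sketch adds two bit strings, propagating the carry in the reading direction used by the worked example:

```python
def add_bit_strings(a, b):
    """Add two equal-length bit strings, carrying in reading order."""
    carry, out = 0, []
    for x, y in zip(a, b):
        total = int(x) + int(y) + carry
        out.append(str(total % 2))   # 1+1 = 0 with a carry of 1
        carry = total // 2           # 1+1 (with carry) = 1, with another carry
    return "".join(out)

# reproduces the example above
result = add_bit_strings(
    "01011111101010000101010001100100",
    "11010011100101000110001101101000")
# result == "10100011111111000000111100111100"
```

The loop is exactly the state machine the network has to learn: the carry is the one bit of memory that must survive from step to step.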

Data Format

The sequences that we're going to train on pair the two input bits at each position with the single output bit at that position of the sum. For example, the training sequence for the above addition is (reading left to right):

{0,1} => {1}
{1,1} => {0}
{0,0} => {1}
{1,1} => {0}
{1,0} => {0}
{1,0} => {0}
{1,1} => {1}
{1,1} => {1}
{1,1} => {1}
etc.
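In other words, each step of a sequence has a two-element input (one bit from each number) and a one-element output (the corresponding bit of the sum). A hypothetical Python generator for this format, roughly what Bright Wire's BinaryIntegers.Addition helper produces in C#, might look like:

```python
import random

def addition_sequence(a, b, bits=32):
    """Return a list of ((a_bit, b_bit), sum_bit) pairs, least significant bit first."""
    carry, seq = 0, []
    for i in range(bits):
        x, y = (a >> i) & 1, (b >> i) & 1
        total = x + y + carry
        seq.append(((x, y), total % 2))  # input pair => output bit
        carry = total // 2               # carry feeds the next step
    return seq

# one random training example
a, b = random.getrandbits(31), random.getrandbits(31)
sequence = addition_sequence(a, b)
```

Repeating this for 1000 random pairs and splitting the result gives the training and test sets used below.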

Generating Data

Bright Wire includes a helper class to generate random sequences of binary integers in the above data format.

We generate 1000 such random integer additions and then split them into training and test sets.

// generate 1000 random integer additions (split into training and test sets)
var data = BinaryIntegers.Addition(1000, false).Split(0);

Network Design

For this example we create a simple recurrent neural network (RNN) with a memory size of 32 and ReLU activation. A second feed forward layer with ReLU activation converts the output down to the target size (a single bit per step).

The graph uses Adam gradient descent optimisation and Gaussian weight initialisation. The error metric is binary classification in which each output is rounded to either 0 or 1.
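That binary classification metric simply rounds each network output to 0 or 1 before comparing it with the target. A minimal sketch of that scoring (hypothetical, not the Bright Wire implementation):

```python
def binary_classification_score(output, target):
    """Fraction of outputs that, once rounded to 0 or 1, match the target bits."""
    rounded = [1 if o >= 0.5 else 0 for o in output]
    matches = sum(1 for r, t in zip(rounded, target) if r == t)
    return matches / len(target)

# e.g. one confident-but-wrong bit out of four lowers the score to 0.75
score = binary_classification_score([0.9, 0.1, 0.7, 0.4], [1, 0, 0, 0])
```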

The initial "memory" for the network is initialised to zero. The network will learn an improved initial memory that is saved into the model when training has completed.

The network is trained using backpropagation through time, with a batch size of 16 and a learning rate of 0.01.

using (var lap = BrightWireProvider.CreateLinearAlgebra(false)) {
    var graph = new GraphFactory(lap);

    // binary classification rounds each output to either 0 or 1
    var errorMetric = graph.ErrorMetric.BinaryClassification;

    // configure the network properties
    graph.CurrentPropertySet
        .Use(graph.GradientDescent.Adam)
        .Use(graph.GaussianWeightInitialisation(false, 0.1f, GaussianVarianceCalibration.SquareRoot2N))
    ;

    // create the engine
    var trainingData = graph.CreateDataSource(data.Training);
    var testData = trainingData.CloneWith(data.Test);
    var engine = graph.CreateTrainingEngine(trainingData, learningRate: 0.01f, batchSize: 16);

    // build the network
    const int HIDDEN_LAYER_SIZE = 32, TRAINING_ITERATIONS = 30;
    var memory = new float[HIDDEN_LAYER_SIZE];
    var network = graph.Connect(engine)
        .AddSimpleRecurrent(graph.ReluActivation(), memory)
        .AddFeedForward(engine.DataSource.OutputSize)
        .Add(graph.ReluActivation())
        .AddBackpropagationThroughTime(errorMetric)
    ;

Under the Hood

The SimpleRecurrent layer is composed of several sub-nodes that use the supplied memory buffer as follows:

First, the input is fed into a feed forward layer. At the same time, the supplied memory is loaded into a buffer and then fed into another feed forward layer. The outputs of these two layers are added together and then activated. The output from the activation is then saved back to the buffer.

As the next item in the sequence arrives it is fed into the feed forward layer as above, but instead of loading the initial memory buffer as before, the second feed forward layer gets the output that was saved from the previous time around.

During backpropagation this all happens in reverse; however, the supplied memory is only updated when backpropagating (through time) the first item in the sequence.
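The forward pass described above can be sketched in a few lines of plain Python (a hand-rolled illustration with toy weights, not Bright Wire internals): a feed forward step on the input, a feed forward step on the memory, a sum, an activation, and the result threaded through as the next step's memory.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def recurrent_step(x, memory, Wx, Wm, bias):
    # feed forward on the input plus feed forward on the memory, then activate
    combined = [a + b + c for a, b, c in zip(matvec(Wx, x), matvec(Wm, memory), bias)]
    return relu(combined)  # saved back as the memory for the next item

# unroll over a two-item sequence, threading the memory through each step
memory = [0.0, 0.0]                # initial memory (learned during training)
Wx = [[1.0, 0.0], [0.0, 1.0]]      # toy weights, purely for illustration
Wm = [[0.5, 0.0], [0.0, 0.5]]
bias = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):
    memory = recurrent_step(x, memory, Wx, Wm, bias)
# memory == [0.5, 1.0]
```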

Training and Evaluation

The best graph found while training for 30 epochs is saved. Simple recurrent neural networks can be a bit fiddly - they sometimes crash and burn while learning and other times they perform brilliantly. If your network fails to learn, try running it again ;)

After training, a graph execution engine is created with the highest scoring set of parameters and executed against 8 unseen integer addition pairs. The output is grouped and written to the console.

// train the network for thirty iterations, saving the model on each improvement
Models.ExecutionGraph bestGraph = null;
engine.Train(TRAINING_ITERATIONS, testData, errorMetric, bn => bestGraph = bn.Graph);

// export the graph and verify it against some unseen integers on the best model
var executionEngine = graph.CreateEngine(bestGraph ?? engine.Graph);
var testData2 = graph.CreateDataSource(BinaryIntegers.Addition(8, true));
var results = executionEngine.Execute(testData2);

// group the output
var groupedResults = new (FloatVector[] Input, FloatVector[] Target, FloatVector[] Output)[8];
for(var i = 0; i < 8; i++) {
    var input = new FloatVector[32];
    var target = new FloatVector[32];
    var output = new FloatVector[32];
    for(var j = 0; j < 32; j++) {
        input[j] = results[j].Input[0][i];
        target[j] = results[j].Target[i];
        output[j] = results[j].Output[i];
    }
    groupedResults[i] = (input, target, output);
}

// write the results
foreach (var result in groupedResults) {
    Console.Write("First:     ");
    foreach (var item in result.Input)
        _WriteAsBinary(item.Data[0]);
    Console.WriteLine();

    Console.Write("Second:    ");
    foreach (var item in result.Input)
        _WriteAsBinary(item.Data[1]);
    Console.WriteLine();
    Console.WriteLine("           --------------------------------");

    Console.Write("Expected:  ");
    foreach (var item in result.Target)
        _WriteAsBinary(item.Data[0]);
    Console.WriteLine();

    Console.Write("Predicted: ");
    foreach (var item in result.Output)
        _WriteAsBinary(item.Data[0]);
    Console.WriteLine();
    Console.WriteLine();
}

Output

Complete Source Code

View the complete source on GitHub.
