Bright Wire

Learning to classify sentences as containing either positive or negative sentiment with Naive Bayes and Neural Networks.

Motivation

Sentiment classification is the task of getting a computer to decide whether a sentence expresses a positive or negative opinion about the things it describes.

This is how aggregate review scores can be built that capture the majority sentiment about particular aspects of products and services.

This tutorial shows how you can use Bright Wire to train three different classifiers on a simple sentiment classification data set, stack them into a fourth, and then evaluate all four on text that you type into the console.

Getting Started

You'll need to download the sentiment labelled sentences data set (from the UCI Machine Learning Repository) and unzip it to a directory on your machine.

Create a new .NET 4.6 console application and include Bright Wire.

Install-Package BrightWire.Net4

If you have an NVIDIA GPU then you can also install the CUDA version to speed up training.

Install-Package BrightWire.CUDA.Net4.x64

Loading the Data

There are three data files in this data set. Each is a list of tab-separated sentences and classification labels. For this tutorial we combine them into one large data set and then split it into training and test sets.
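Each line is a sentence, a tab, and a label (1 for positive, 0 for negative), along the lines of this illustrative sample:

Wow... Loved this place.	1
Crust is not good.	0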

// the three data files (dataFilesPath is the directory you unzipped the data set into)
var files = new[] {
    "amazon_cells_labelled.txt",
    "imdb_labelled.txt",
    "yelp_labelled.txt"
};
var LINE_SEPARATOR = "\n".ToCharArray();
var SEPARATOR = "\t".ToCharArray();
var stringTable = new StringTableBuilder();

// parse each line into (tokens, label), discard empty lines, and shuffle
var sentimentData = files.SelectMany(f => File.ReadAllText(dataFilesPath + f)
    .Split(LINE_SEPARATOR)
    .Where(l => !String.IsNullOrWhiteSpace(l))
    .Select(l => l.Split(SEPARATOR))
    .Select(s => Tuple.Create(_Tokenise(s[0]), s[1][0] == '1' ? "positive" : "negative"))
    .Where(d => d.Item1.Any())
).Shuffle(0).ToList();

// split into training and test sets
var splitSentimentData = sentimentData.Split();

Bright Wire includes a simple tokeniser that is called for each line in the combined data set.

The function below splits the line into lowercase tokens and then joins negations within each list of tokens. Joining negations is a simple technique in which the sentence "this is not good" becomes "this is not_good", so that "good" and "not_good" are treated as distinct tokens as we build the models.

static string[] _Tokenise(string str)
{
    // lowercase each token, then join negated tokens (e.g. "not" + "good" => "not_good")
    return SimpleTokeniser.JoinNegations(SimpleTokeniser.Tokenise(str).Select(s => s.ToLower())).ToArray();
}
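For example (a hypothetical call - the exact token boundaries depend on SimpleTokeniser):

var tokens = _Tokenise("This is not good");
// tokens => ["this", "is", "not_good"]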

Naive Bayes Baseline

Bright Wire includes two separate Naive Bayes classifiers that are designed expressly to classify text: Multinomial Naive Bayes and Bernoulli Naive Bayes. They are both simple, fast and give good results on a range of text classification tasks.

These classifiers both work on a tuple of a string and an IndexList - a list of unsigned integers, each of which is the string index of one token. IndexLists can repeat their indices.

// documents are encoded as a tuple of string and IndexList
(string Classification, IndexList Data)
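For example, a document labelled "positive" whose tokens map to the string indexes 0, 3, 3 and 7 (note the repeated index) could be encoded as follows (hypothetical index values):

var example = ("positive", IndexList.Create(new uint[] { 0, 3, 3, 7 }));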

Bright Wire includes a StringTableBuilder that converts words to string indexes (each unique word gets a unique string index starting from zero). This is done in the following function:

static IReadOnlyList<(string Classification, IndexList Data)> _BuildIndexedClassifications(
    IReadOnlyList<Tuple<string[], string>> data,
    StringTableBuilder stringTable
)
{
    return data
        .Select(d => (d.Item2, new IndexList {
            Index = d.Item1.Select(str => stringTable.GetIndex(str)).ToArray()
        }))
        .ToList()
    ;
}
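A quick sketch of what this produces for a single document (the actual index values depend on the order in which the string table first sees each word):

var example = _BuildIndexedClassifications(
    new[] { Tuple.Create(new[] { "this", "is", "not_good" }, "negative") },
    stringTable
);
// example[0].Classification == "negative"
// example[0].Data.Index holds one string index per token, e.g. [0, 1, 2]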

Once we have the training and test sets we can train and evaluate the two Naive Bayes classifiers.

// build training and test classification bag
var trainingClassificationBag = _BuildIndexedClassifications(splitSentimentData.Training, stringTable);
var testClassificationBag = _BuildIndexedClassifications(splitSentimentData.Test, stringTable);

// train a bernoulli naive bayes classifier
var bernoulli = trainingClassificationBag.TrainBernoulliNaiveBayes();
Console.WriteLine("Bernoulli accuracy: {0:P}", testClassificationBag
    .Classify(bernoulli.CreateClassifier())
    .Average(r => r.Score)
);

// train a multinomial naive bayes classifier
var multinomial = trainingClassificationBag.TrainMultinomialNaiveBayes();
Console.WriteLine("Multinomial accuracy: {0:P}", testClassificationBag
    .Classify(multinomial.CreateClassifier())
    .Average(r => r.Score)
);

The Bernoulli and Multinomial Naive Bayes classifiers achieve accuracies of around 83% and 84% respectively on the test data set.

Stepping up to a Neural Network

It turns out that we can do a little better than Naive Bayes with a simple feed forward neural network.

First we need to convert the (string, IndexList) tuples to a DataTable.

Each sentence will become a sparse vector of size max(string-index). The value of each entry will be the count of that string index in the document. Since there are over 5000 unique strings and most documents contain far fewer than 5000 unique words, most of the vector entries will be zero.

Bright Wire will create a DataSource that efficiently converts sparse vectors to these dense vectors on demand.
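A minimal sketch of that sparse-to-dense conversion (illustrative only - Bright Wire's IIndexListEncoder does the equivalent internally):

static float[] _EncodeDense(IndexList indexList, int inputSize)
{
    // each entry counts how often that string index occurs in the document
    var vector = new float[inputSize];
    foreach (var group in indexList.Index.GroupBy(i => i))
        vector[group.Key] = group.Count();
    return vector;
}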

// convert the index lists to vectors and normalise along the way
var sentimentDataBag = _BuildIndexedClassifications(sentimentData, stringTable);
var sentimentDataTable = sentimentDataBag.ConvertToTable();
var normalisedDataTable = sentimentDataTable.Normalise(NormalisationType.Standard);
var vectoriser = normalisedDataTable.GetVectoriser();
var sentimentDataSet = normalisedDataTable.Split(0);
var dataTableAnalysis = normalisedDataTable.GetAnalysis();

using (var lap = BrightWireGpuProvider.CreateLinearAlgebra()) { // or BrightWireProvider.CreateLinearAlgebra() to run on the CPU
    var graph = new GraphFactory(lap);

    // create graph data sources from the training and test tables
    var trainingData = graph.CreateDataSource(sentimentDataSet.Training, vectoriser);
    var testData = graph.CreateDataSource(sentimentDataSet.Test, vectoriser);

    // the data source can also encode raw index lists (used when evaluating below)
    var indexListEncoder = trainingData as IIndexListEncoder;

Training the Network

ReLU activation and RMSProp gradient descent seem to work well on this data. A dropout layer is inserted between the first and second feed forward layers.

While training, the highest-scoring set of network parameters is saved; after all epochs have been processed the engine is reloaded with those best parameters.

// use a one hot encoding error metric, rmsprop gradient descent and xavier weight initialisation
var errorMetric = graph.ErrorMetric.OneHotEncoding;
var propertySet = graph.CurrentPropertySet
    .Use(graph.GradientDescent.RmsProp)
    .Use(graph.WeightInitialisation.Xavier)
;

var engine = graph.CreateTrainingEngine(trainingData, 0.3f, 128);
engine.LearningContext.ScheduleLearningRate(5, 0.1f);
engine.LearningContext.ScheduleLearningRate(11, 1f);
engine.LearningContext.ScheduleLearningRate(15, 0.3f);

// train a neural network classifier
var neuralNetworkWire = graph.Connect(engine)
    .AddFeedForward(512, "layer1")
    .Add(graph.ReluActivation())
    .AddDropOut(0.5f)
    .AddFeedForward(trainingData.OutputSize, "layer2")
    .Add(graph.ReluActivation())
    .AddBackpropagation(errorMetric, "first-network")
;

// train the network
Console.WriteLine("Training neural network classifier...");
const int TRAINING_ITERATIONS = 10;
GraphModel bestNetwork = null;
engine.Train(TRAINING_ITERATIONS, testData, errorMetric, network => bestNetwork = network);
if (bestNetwork != null)
    engine.LoadParametersFrom(bestNetwork.Graph);
var firstClassifier = graph.CreateEngine(engine.Graph);

The neural network ends up with a slightly higher accuracy than the Naive Bayes classifiers.

Stacking the Classifiers

We can get better accuracy by stacking the three classifiers and training a fourth classifier on their combined output.

In this case we want to show that the fourth classifier is smarter than the sum of its parts, so we first disable updates to the neural network layers that have already been trained. Otherwise those layers would continue to learn from the backpropagation signals of the fourth classifier and we wouldn't know where the gain came from.

Next we add the Bernoulli and Multinomial Naive Bayes classifiers to the graph along two separate wires. These two wires are joined with the first neural network's output and the joined output is extended with additional neural network layers to learn an output that considers the output of all three classifiers.

// freeze the first neural network's layers (gradients still flow through, but no weight updates are applied)
engine.LearningContext.EnableNodeUpdates(neuralNetworkWire.Find("layer1"), false);
engine.LearningContext.EnableNodeUpdates(neuralNetworkWire.Find("layer2"), false);

// create the bernoulli classifier wire
var bernoulliClassifier = bernoulli.CreateClassifier();
var bernoulliWire = graph.Connect(engine)
    .AddClassifier(bernoulliClassifier, sentimentDataSet.Training, dataTableAnalysis)
;

// create the multinomial classifier wire
var multinomialClassifier = multinomial.CreateClassifier();
var multinomialWire = graph.Connect(engine)
    .AddClassifier(multinomialClassifier, sentimentDataSet.Training, dataTableAnalysis)
;

// join the bernoulli, multinomial and neural network classification outputs
var firstNetwork = neuralNetworkWire.Find("first-network");
var joined = graph.Join(multinomialWire, graph.Join(bernoulliWire, graph.Connect(trainingData.OutputSize, firstNetwork)));

// train an additional classifier on the output of the previous three classifiers
joined
    .AddFeedForward(outputSize: 64)
    .Add(graph.ReluActivation())
    .AddDropOut(dropOutPercentage: 0.5f)
    .AddFeedForward(trainingData.OutputSize)
    .Add(graph.ReluActivation())
    .AddBackpropagation(errorMetric)
;

// train the network again
Console.WriteLine("Training stacked neural network classifier...");
GraphModel bestStackedNetwork = null;
engine.Train(10, testData, errorMetric, network => bestStackedNetwork = network);
if (bestStackedNetwork != null)
    engine.LoadParametersFrom(bestStackedNetwork.Graph);

The fourth classifier ends up with an accuracy of 86.4% - higher than any of its input classifiers, showing that it has learnt how to combine their outputs to get better results.

Evaluating the Classifiers

We've trained four different classifiers; now we can set them loose in the wild!

Enter some text and see what they think.

Console.WriteLine("Enter some text to test the classifiers...");
while (true) {
    Console.Write(">");
    var line = Console.ReadLine();
    if (String.IsNullOrWhiteSpace(line))
        break;

    var tokens = _Tokenise(line);
    var indexList = new List<uint>();
    foreach (var token in tokens) {
        if (stringTable.TryGetIndex(token, out uint stringIndex))
            indexList.Add(stringIndex);
    }
    if (indexList.Any()) {
        var queryTokens = indexList.GroupBy(d => d).Select(g => Tuple.Create(g.Key, (float)g.Count())).ToList();
        var vector = new float[trainingData.InputSize];
        foreach (var token in queryTokens)
            vector[token.Item1] = token.Item2;
        var indexList2 = IndexList.Create(indexList.ToArray());
        var encodedInput = indexListEncoder.Encode(indexList2);

        Console.WriteLine("Bernoulli classification: " + bernoulliClassifier.Classify(indexList2).First().Label);
        Console.WriteLine("Multinomial classification: " + multinomialClassifier.Classify(indexList2).First().Label);
        var result = firstClassifier.Execute(encodedInput);
        var classification = vectoriser.GetOutputLabel(1, (result.Output[0].Data[0] > result.Output[0].Data[1]) ? 0 : 1);
        Console.WriteLine("Neural network classification: " + classification);

        var stackedResult = engine.Execute(encodedInput);
        var stackedClassification = vectoriser.GetOutputLabel(1, (stackedResult.Output[0].Data[0] > stackedResult.Output[0].Data[1]) ? 0 : 1);
        Console.WriteLine("Stack classification: " + stackedClassification);
    } else
        Console.WriteLine("Sorry, none of those words have been seen before.");
    Console.WriteLine();
}

Results

On the test set, the Bernoulli and Multinomial Naive Bayes classifiers score around 83% and 84% respectively, the neural network slightly higher, and the stacked classifier 86.4%.

Complete Source Code

View the complete source on GitHub.
