Bright Wire

Learning to classify sentences as containing either positive or negative sentiment with Naive Bayes and Neural Networks.

Motivation

Sentiment classification is the task of getting a computer to decide whether a sentence expresses positive or negative sentiment by building a model of how words are used and how they relate to sentiment.

This tutorial shows how you can use Bright Wire to train three different models on a simple sentiment classification data set, and then evaluate those models on arbitrary text that you type into the console.

Loading the Data

There are three data files in this data set. Each is a list of tab-separated sentences and classification labels. For this tutorial we combine them into one large data set and then split that combined data set into training and test sets.
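
Each line pairs a sentence with a binary label - 1 for positive, 0 for negative - separated by a tab. A representative line looks like this:

Good case, excellent value.	1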

var files = new[]
{
    "amazon_cells_labelled.txt",
    "imdb_labelled.txt",
    "yelp_labelled.txt"
};
var lineSeparator = "\n".ToCharArray();
var separator = "\t".ToCharArray();
_stringTable = new StringTableBuilder();

var sentences = new List<(string[] Sentence, string Classification)>();
// read each file, split into lines and tab-separated fields, then tokenise each sentence
foreach (var path in files.Select(f => Path.Combine(directory.FullName, "sentiment labelled sentences", f)))
{
    var data = File.ReadAllText(path)
        .Split(lineSeparator)
        .Where(l => !String.IsNullOrWhiteSpace(l))
        .Select(l => l.Split(separator))
        .Select(s => (Sentence: Tokenise(s[0]), Classification: s[1][0] == '1' ? "positive" : "negative"))
        .Where(d => d.Sentence.Any());
    sentences.AddRange(data);
}

Bright Wire includes a simple tokeniser that is called on each line in the combined data set.

This function splits the line into lowercase tokens and then joins negations within each list of tokens. Joining negations is a simple technique whereby the sentence "this is not good" becomes "this is not_good", so that "good" and "not_good" are treated as distinct tokens as we build the models.

static string[] Tokenise(string str) => SimpleTokeniser.JoinNegations(SimpleTokeniser.Tokenise(str).Select(s => s.ToLower())).ToArray();
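
For example, calling Tokenise on the sentence above joins the negation into a single token (the exact token boundaries depend on SimpleTokeniser):

var tokens = Tokenise("this is not good");
// tokens: "this", "is", "not_good"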

Naive Bayes Baseline

Bright Wire includes two separate Naive Bayes classifiers that are designed expressly to classify text: Multinomial Naive Bayes and Bernoulli Naive Bayes. Both are simple and fast, and both give good results on a range of text classification tasks.

These classifiers both work on a tuple of a string and an IndexList - a list of unsigned integers representing string indexes. An IndexList can repeat its indices (one per occurrence of a word).
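
For illustration, a sentence that uses the same word twice produces a repeated index. The index values below are hypothetical - they depend on the order in which words were added to the string table:

// "good food really good" - the index for "good" appears twice
var example = (Classification: "positive", Data: context.CreateIndexList(new uint[] { 12, 7, 3, 12 }));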

// shuffle and split the combined data set into training and test sets
var (training, test) = sentences.Shuffle(context.Random).ToArray().Split();
_indexedSentencesTraining = BuildIndexedClassifications(context, training, _stringTable);
_indexedSentencesTest = BuildIndexedClassifications(context, test, _stringTable);
_maxIndex = _indexedSentencesTraining.Concat(_indexedSentencesTest).Max(d => d.Data.Indices.Max());
_context = context;

var bernoulli = _indexedSentencesTraining.TrainBernoulliNaiveBayes();
Console.WriteLine("Bernoulli accuracy: {0:P}", _indexedSentencesTest
    .Classify(bernoulli.CreateClassifier())
    .Average(r => r.Score)
);

var multinomial = _indexedSentencesTraining.TrainMultinomialNaiveBayes();
Console.WriteLine("Multinomial accuracy: {0:P}", _indexedSentencesTest
    .Classify(multinomial.CreateClassifier())
    .Average(r => r.Score)
);

Bright Wire includes a StringTableBuilder that converts words to string indexes (each unique word gets a unique string index starting from zero). This is done in the following function:

static (string Classification, IndexList Data)[] BuildIndexedClassifications(IBrightDataContext context, (string[], string)[] data, StringTableBuilder stringTable)
{
    return data
        .Select(d => (d.Item2, context.CreateIndexList(d.Item1.Select(stringTable.GetIndex).ToArray())))
        .ToArray()
    ;
}

The Bernoulli and Multinomial Naive Bayes classifiers achieve around 83% and 84% accuracy respectively on the test data set.

BiLSTM for the Win

LSTM layers are another, more sophisticated type of recurrent neural network layer that can be used interchangeably with GRU layers.

A bidirectional recurrent neural network uses two such recurrent layers that each read the entire sentence, but in opposite directions - one reads forwards while the other reads backwards. The outputs from both are concatenated for each word.

The layer that is reading forwards has the last few words in its internal memory, while the layer that is reading backwards has the next few words in its internal memory.

Thus each word is seen within an (automatically determined) context of surrounding words that may influence its meaning, given that words can mean different things in different contexts.

When two LSTM layers are combined into a bidirectional recurrent neural network, the result is abbreviated as BiLSTM.

Word Embeddings

One way to represent a word in the recurrent neural network is to use a pretrained embedding, such as GloVe.

The Bright Wire Data assembly contains a limited vocabulary of embeddings for use in this sentiment training example.

The complete set of embeddings can be downloaded.
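
Embeddings are retrieved with Data.Embeddings.Get, which is used in GetInputVector below. A minimal sketch - the 100-dimensional size and the null result for unknown words are inferred from the code that follows:

var known = Data.Embeddings.Get("good");      // a float[100], assuming "good" is in the bundled vocabulary
var unknown = Data.Embeddings.Get("qwerty");  // null - not in the vocabulary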

Augmented Training Data

Another trick is to give the model as much information as possible about the classification domain. In this case we have already trained two simple Naive Bayes models, so we can feed their predictions into the data that the BiLSTM model will consider.

Each word vector is embedding-size + 2 in length (102 here, with 100-dimensional embeddings), with the additional two float values formed by the output of each of the two previous models.

float[]? GetInputVector(float c1, float c2, string word)
{
    var embedding = Data.Embeddings.Get(word);
    if (embedding != null) {
        // copy the embedding and append the two Naive Bayes outputs
        var ret = new float[embedding.Length + 2];
        Array.Copy(embedding, ret, embedding.Length);
        ret[embedding.Length] = c1;
        ret[embedding.Length + 1] = c2;
        return ret;
    }

    // no embedding for this word
    return null;
}
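
For example, if the Bernoulli model classified a sentence as positive and the multinomial model classified it as negative, every word in that sentence would share the same two trailing values (a hypothetical call):

var vector = GetInputVector(1f, 0f, "good");
// if "good" has an embedding: vector.Length == 102 - 100 embedding values, then 1f (Bernoulli) and 0f (multinomial)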

IRowOrientedDataTable CreateTable((string Classification, IndexList Data)[] data, IIndexListClassifier bernoulli, IIndexListClassifier multinomial)
{
    var builder = _context.BuildTable();
    builder.AddColumn(ColumnType.Matrix);
    builder.AddColumn(ColumnType.Matrix).SetTarget(true);

    // zero vector for words without an embedding (100 embedding values + 2 classifier outputs)
    var empty = new float[102];
    foreach (var row in data) {
        // classify the sentence with both Naive Bayes models
        var c1 = bernoulli.Classify(row.Data).First().Label == "positive" ? 1f : 0f;
        var c2 = multinomial.Classify(row.Data).First().Label == "positive" ? 1f : 0f;
        // one row per sentence: a matrix of word vectors and a matrix that repeats the one-hot encoded classification for each word
        var input = row.Data.Indices.Select(i => _context.CreateVector(GetInputVector(c1, c2, _stringTable.GetString(i)) ?? empty)).ToArray();
        var output = _context.CreateMatrix((uint)input.Length, 2, (i, j) => GetOutputValue(j, row.Classification == "positive"));

        builder.AddRow(_context.CreateMatrixFromRows(input), output);
    }

    return builder.BuildRowOriented();
}
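
GetOutputValue is not shown in the listing above; a minimal sketch that one-hot encodes the expected classification, assuming column 0 represents "positive" (consistent with the evaluation code further down):

static float GetOutputValue(uint column, bool isPositive)
{
    // column 0 is "positive", column 1 is "negative"
    return (column == 0) == isPositive ? 1f : 0f;
}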

Training the Network

A bidirectional network is created with a bidirectional join, which concatenates the outputs of the forward and reverse layers.

var graph = _context.CreateGraphFactory();
var trainingTable = CreateTable(_indexedSentencesTraining, bernoulli, multinomial);
var testTable = CreateTable(_indexedSentencesTest, bernoulli, multinomial);
var training = graph.CreateDataSource(trainingTable);
var test = training.CloneWith(testTable);
var errorMetric = graph.ErrorMetric.OneHotEncoding;
var engine = graph.CreateTrainingEngine(training, errorMetric, learningRate: 0.1f, batchSize: 128);

graph.CurrentPropertySet
    .Use(graph.Adam())
    .Use(graph.WeightInitialisation.Xavier)
;

// build the network
const int HIDDEN_LAYER_SIZE = 100;

var forward = graph.Connect(engine)
    .AddLstm(HIDDEN_LAYER_SIZE, "forward")
;
var reverse = graph.Connect(engine)
    .ReverseSequence()
    .AddLstm(HIDDEN_LAYER_SIZE, "backward")
;
graph.BidirectionalJoin(forward, reverse)
    .AddFeedForward(engine.DataSource.GetOutputSizeOrThrow(), "joined")
    .Add(graph.SigmoidActivation())
    .AddBackpropagationThroughTime()
;
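
The listing above builds the graph but stops short of the training loop itself; the engine still needs to be trained for a number of epochs against the test data. The exact Train signature varies between Bright Wire versions, so treat the call below as an assumption and check the complete source linked at the end:

// train the network and report test accuracy as training proceeds (signature is an assumption)
engine.Train(10, test);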

The neural network ends up with an accuracy slightly higher than 87% - about the same as the accuracy in the published paper from which the training data was obtained, and without applying the more sophisticated approach from that paper.

Evaluating the Classifiers

We've trained three different classifiers - now we can see what they think.

// zero vector for words without an embedding (100 embedding values + 2 classifier outputs)
var empty = new float[102];
Console.WriteLine("Enter some text to test the classifiers...");
while (true)
{
    Console.Write(">");
    var line = Console.ReadLine();
    if (String.IsNullOrWhiteSpace(line))
        break;

    var tokens = Tokenise(line);
    var indices = new List<uint>();
    var embeddings = new List<float[]>();
    foreach (var token in tokens)
    {
        // only words seen during training have a string index
        if (_stringTable.TryGetIndex(token, out uint stringIndex))
            indices.Add(stringIndex);
        // the two classifier outputs are placeholders here; they are filled in below
        embeddings.Add(GetInputVector(0, 0, token) ?? empty);
    }
    if (indices.Any())
    {
        var indexList = _context.CreateIndexList(indices);
        var bc = bernoulli.Classify(indexList).First().Label;
        var mc = multinomial.Classify(indexList).First().Label;
        Console.WriteLine("Bernoulli classification: " + bc);
        Console.WriteLine("Multinomial classification: " + mc);

        // add the other classifier results into the embedding
        foreach (var word in embeddings) {
            word[100] = bc == "positive" ? 1f : 0f;
            word[101] = mc == "positive" ? 1f : 0f;
        }

        foreach (var (token, result) in tokens.Zip(neuralNetwork.ExecuteSequential(embeddings.ToArray()), (t, r) => (Token: t, Result: r.Output.Single()))) {
            var label = result.Softmax().MaximumIndex() == 0 ? "positive" : "negative";
            Console.WriteLine($"{token}: {label}");
        }
    }
    else
        Console.WriteLine("Sorry, none of those words have been seen before.");
    Console.WriteLine();
}

Complete Source Code

View the complete source on GitHub
