Bright Wire

Training Naive Bayes, Decision Tree, Random Forest, KNN, Multinomial Logistic Regression and Neural Network classifiers on the Iris data-set.

Installing from NuGet

The easiest way to get started with Bright Wire is to create a new .NET 4.6 console application and add the Bright Wire NuGet package to it.

Something to Classify

The machine learning equivalent of Hello World is probably the Iris data-set, so let's use that.

// download the iris data set
byte[] data;
using (var client = new WebClient()) {
    data = client.DownloadData("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data");
}

Bright Wire includes a data table that can be created directly from a StreamReader (assuming the underlying stream contains valid CSV).

// parse the iris CSV into a data table
var dataTable = new StreamReader(new MemoryStream(data)).ParseCSV(',');

The Iris dataset contains five columns of data. The first four are sepal and petal measurements and the last column is the Iris class (Iris Setosa, Iris Versicolour or Iris Virginica).
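
In the raw file, a row from each class looks something like this:

5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica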

We'll use all four measurements as features and the class label as the classification target.

// the last column is the classification target ("Iris-setosa", "Iris-versicolor", or "Iris-virginica")
var targetColumnIndex = dataTable.TargetColumnIndex = dataTable.ColumnCount - 1;

As usual, the best way to guard against over-fitting is to hold out a test set, so we split the data into training and test tables. The default split in Bright Wire is 80% for training and 20% for test, applied after a random shuffle. Here we pass 0 as the random seed so the shuffle will always be the same.

// split the data table into training and test tables
var split = dataTable.Split(0);

Evaluating some Classifiers

Now that we have our data, let's do some machine learning! We'll start with a Naive Bayes classifier.

We'll train the classifier on the training data and evaluate it using the test data.

// train and evaluate a naive bayes classifier
var naiveBayes = split.Training.TrainNaiveBayes();
Console.WriteLine("Naive bayes accuracy: {0:P}", split.Test
    .Classify(naiveBayes.CreateClassifier())
    .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
);

The classification accuracy is around 97% from one of the simplest machine learning classifiers. Not bad. Let's move on to a decision tree classifier.

// train and evaluate a decision tree classifier
var decisionTree = split.Training.TrainDecisionTree();
Console.WriteLine("Decision tree accuracy: {0:P}", split.Test
    .Classify(decisionTree.CreateClassifier())
    .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
);

It turns out that both the Naive Bayes classifier and the Decision Tree get exactly the same accuracy - around 97%. The exact accuracy depends on which samples end up in the training and test data sets after the shuffle.
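
To get a feel for how much the split matters, we can re-run the same experiment with a few different seeds. This is a quick sketch that only reuses the calls shown above and assumes Split accepts any integer seed:

// re-train naive bayes on several different random splits
foreach (var seed in new[] { 0, 1, 2, 3, 4 }) {
    var resplit = dataTable.Split(seed);
    var classifier = resplit.Training.TrainNaiveBayes().CreateClassifier();
    Console.WriteLine("Seed {0} accuracy: {1:P}", seed, resplit.Test
        .Classify(classifier)
        .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
    );
}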

It might be tempting to skip the training and test splits and train on the entire data set, which would easily take the accuracy to 100%. But That Would Be Bad: our goal is not to teach the computer the training data, but to generalise to data it hasn't seen yet.

Let's try a random forest...

// train and evaluate a random forest classifier
var randomForest = split.Training.TrainRandomForest(500);
Console.WriteLine("Random forest accuracy: {0:P}", split.Test
    .Classify(randomForest.CreateClassifier())
    .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
);

The random forest accuracy is the same - around 97%. Is there a pattern here somewhere? ; )

Adding Linear Algebra

Naive Bayes, Decision Trees and Random Forests aren't linear algebra based machine learning algorithms (they don't use vectors or matrices). To use the remaining algorithms we will need to create a linear algebra provider.

Bright Wire supports CPU based linear algebra. If you have an NVIDIA GPU you can also add GPU based linear algebra with the CUDA add-on. Note that the standard version can take advantage of CPU optimisation libraries such as the Math.Net Numerics MKL library.
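
For example, the standard CPU provider can take advantage of Math.Net Numerics' native MKL backend. The sketch below enables it before the linear algebra provider is created, and assumes the MathNet.Numerics.MKL.Win package has been added to the project:

// optional: ask Math.Net Numerics to use the native Intel MKL provider
MathNet.Numerics.Control.UseNativeMKL();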

However, this is such a small data set that CPU based linear algebra will be fine.

// fire up some linear algebra on the CPU
using (var lap = BrightWireProvider.CreateLinearAlgebra(false)) {

K Nearest Neighbours

K Nearest Neighbours compares each test example with the entire training set. Let's see how it goes...

// train and evaluate k nearest neighbours
var knn = split.Training.TrainKNearestNeighbours();
Console.WriteLine("K nearest neighbours accuracy: {0:P}", split.Test
    .Classify(knn.CreateClassifier(lap, 10))
    .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
);

The accuracy is the same - around 97%.

Multinomial Logistic Regression

Multinomial Logistic Regression is a similar story, reaching the same 97% accuracy.

// train and evaluate a multinomial logistic regression classifier
var logisticRegression = split.Training.TrainMultinomialLogisticRegression(lap, 500, 0.1f);
Console.WriteLine("Multinomial logistic regression accuracy: {0:P}", split.Test
    .Classify(logisticRegression.CreateClassifier(lap))
    .Average(d => d.Row.GetField<string>(targetColumnIndex) == d.Classification ? 1.0 : 0.0)
);

Neural Network

To train a neural network we'll first need to convert our data table into vectors. The default conversion turns continuous features (such as the first four columns of our data) into a Single (float) and one-hot encodes categorical features (such as the class label) into one value per category (in this case 001, 010 or 100 depending on the class label).

The converter will convert each row in the table into two vectors, one for the network input and the other for the expected output - in this case the first vector will be length 4 (for the four features) and the second vector will be length 3 (for the three possible class labels).
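
Purely as an illustration (the converter decides the actual index order, and this dictionary is not part of the Bright Wire API), the three class labels end up encoded along these lines:

// illustrative only - requires System.Collections.Generic
var oneHotExample = new Dictionary<string, float[]> {
    { "Iris-setosa",     new float[] { 1, 0, 0 } },
    { "Iris-versicolor", new float[] { 0, 1, 0 } },
    { "Iris-virginica",  new float[] { 0, 0, 1 } }
};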

A basic neural network design is to have two feed forward layers separated by a drop out layer. The drop out layer helps to regularise the network.

In this case our first layer has 8 neurons with RELU activation and the final (output) layer has three neurons (one per class) with sigmoid activation.

The network is trained with RMSProp gradient descent, a learning rate of 0.01 and a mini batch size of 8.

// create a neural network graph factory
var graph = new GraphFactory(lap);

// the default data table -> vector conversion uses one hot encoding of the classification labels, so create a corresponding error metric
var errorMetric = graph.ErrorMetric.OneHotEncoding;

// create the property set (use rmsprop gradient descent optimisation)
graph.CurrentPropertySet
    .Use(graph.RmsProp())
;

// create the training and test data sources
var trainingData = graph.CreateDataSource(split.Training);
var testData = trainingData.CloneWith(split.Test);

// create a 4x8x3 neural network with relu and sigmoid activations
const int HIDDEN_LAYER_SIZE = 8;
var engine = graph.CreateTrainingEngine(trainingData, 0.01f, 8);
graph.Connect(engine)
    .AddFeedForward(HIDDEN_LAYER_SIZE)
    .Add(graph.ReluActivation())
    .AddDropOut(0.5f)
    .AddFeedForward(engine.DataSource.OutputSize)
    .Add(graph.SigmoidActivation())
    .AddBackpropagation(errorMetric)
;

The output vector corresponds to the network's classification of the input vector: the predicted class is the index of the maximum value in the output vector.

The one-hot error metric checks whether that maximum index matches the labelled training data, and the network adjusts its weights accordingly.
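
For example, turning a raw output vector back into a class label is just an arg-max (illustrative only - the label ordering here is an assumption):

// pick the label with the highest output value (uses System.Linq)
var labels = new[] { "Iris-setosa", "Iris-versicolor", "Iris-virginica" };
var output = new[] { 0.1f, 0.7f, 0.2f };
var predicted = labels[Array.IndexOf(output, output.Max())]; // "Iris-versicolor"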

// train the network
Console.WriteLine("Training a 4x8x3 neural network...");
engine.Train(500, testData, errorMetric, null, 50);

After 500 epochs of training the neural network is just as accurate as the other classifiers.

(To see a deep neural network that can reach a perfect score on this training set, see the tutorial on deep feed forward networks with batch normalization and SELU. Be warned that this is a case of textbook over-fitting: the network has simply memorised the input.)

Output

Each classifier prints its accuracy to the console - around 97% in every case.

Summary

All classifiers did really well on this data set. But that's only because the data set is trivially easy to classify. With only 150 rows it's also very small. We can't say which classifier is "best" under these conditions.

The only valid conclusion is that Bright Wire makes it easy to try a suite of different classifiers in .NET ; )

Complete Source Code

View the complete source on GitHub.
