Jack Dermody

Data set

Text Clustering Four Ways

The data set is useful as it contains multiple levels of keywords.

Sentiment Analysis

For this tutorial we are combining them into one large data set and then splitting that combined data set into training and test data sets.

There are three data files in this data set.

Bright Wire includes a simple tokeniser that is being called for each line in the combined data set.

Convolutional Neural Networks

Further accuracy should be obtained by augmenting the training data set, as described in Neural Networks and Deep Learning.

Recognising Handwritten Digits (MNIST)

The MNIST data set is a classic handwritten digit recognition data set.

Sequence to Sequence with LSTM

We can create training and test data sets with a SequenceClassification helper class in Bright Wire.

A one to many training data set is created with a data table that contains a vector input column and matrix output column.

Classification Overview with Bright Wire

The Iris dataset contains five columns of data.

The exact accuracy depends on which samples are split into in the training and test data sets after the shuffle.

It might be tempting to train on the entire data set which would easily take the accuracy to 100% rather than using the training and test splits.

But that's only because the data set is trivially easy to classify.

All classifiers did really well on this data set.