Jack Dermody

Gradient descent

GRU Recurrent Neural Networks

Next, we connect a GRU layer and a feed forward layer with sigmoid activation, and train the neural network with RMSProp gradient descent, a learning rate of 0.003 and a batch size of 32 for 30 epochs.
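Bright Wire is a C# library, but the same configuration can be sketched in Python with Keras. The sketch below wires a GRU layer into a dense sigmoid output and trains with RMSProp at the stated learning rate and batch size; the sequence length, feature count, hidden size and dummy data are illustrative assumptions, not values from the article.

```python
import numpy as np
import tensorflow as tf

timesteps, features, hidden = 10, 8, 32  # assumed dimensions, not from the article
x = np.random.rand(256, timesteps, features).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),
    tf.keras.layers.GRU(hidden),                     # recurrent GRU layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # feed forward layer with sigmoid
])
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.003),  # RMSProp, lr 0.003
    loss="binary_crossentropy",
)
model.fit(x, y, batch_size=32, epochs=30)  # batch size 32, 30 epochs
```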

Sentiment Analysis

ReLU activation and Adam gradient descent optimisation seem to work well on this data.
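A minimal Keras sketch of that combination, again as an illustration rather than Bright Wire's own API: a ReLU hidden layer feeding a sigmoid output for positive/negative sentiment, optimised with Adam. The vocabulary size, hidden width, epoch count and random data are all assumptions.

```python
import numpy as np
import tensorflow as tf

vocab = 2000  # assumed bag-of-words vocabulary size
x = np.random.rand(512, vocab).astype("float32")
y = np.random.randint(0, 2, size=(512, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(vocab,)),
    tf.keras.layers.Dense(128, activation="relu"),   # ReLU hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # positive/negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, batch_size=32, epochs=10)  # epoch count assumed
```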

Convolutional Neural Networks

The network is trained with Adam gradient descent and Gaussian weight initialisation for 20 epochs.
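In Keras terms, Gaussian weight initialisation corresponds to a random-normal kernel initialiser on each layer, combined with the Adam optimiser for 20 epochs. The sketch below assumes MNIST-like 28x28 greyscale images, a single convolutional layer and a standard deviation of 0.1; none of those specifics come from the article.

```python
import numpy as np
import tensorflow as tf

gaussian = tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.1)  # Gaussian weight init

x = np.random.rand(256, 28, 28, 1).astype("float32")  # assumed MNIST-like images
y = np.random.randint(0, 10, size=(256,))             # ten digit classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 5, activation="relu", kernel_initializer=gaussian),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax", kernel_initializer=gaussian),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=20)  # Adam gradient descent for 20 epochs
```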

Sequence to Sequence with LSTM

Next, the LSTM network is trained for 10 iterations using RMSProp gradient descent and a hidden memory size of 128.
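One simple way to express a sequence to sequence LSTM in Keras is an encoder that compresses the input into a single state, repeated across the output timesteps and decoded by a second LSTM. This is only one common seq2seq form and may differ from the article's exact architecture; the sequence lengths, feature count and data are assumptions, with "10 iterations" read as 10 training epochs.

```python
import numpy as np
import tensorflow as tf

in_steps, out_steps, features, hidden = 8, 8, 16, 128  # hidden memory size of 128
x = np.random.rand(256, in_steps, features).astype("float32")
y = np.random.rand(256, out_steps, features).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(in_steps, features)),
    tf.keras.layers.LSTM(hidden),                          # encoder LSTM
    tf.keras.layers.RepeatVector(out_steps),               # feed encoding to each decoder step
    tf.keras.layers.LSTM(hidden, return_sequences=True),   # decoder LSTM
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(features)),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss="mse")  # RMSProp gradient descent
model.fit(x, y, epochs=10)
```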

Classification Overview with Bright Wire

The network is trained with RMSProp gradient descent and a mini-batch size of 8.
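For completeness, a small Keras classifier trained the same way: RMSProp with mini-batches of 8. The feature count, class count, network width, epoch count and random data are assumed here purely to make the sketch runnable.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(150, 4).astype("float32")  # assumed 4-feature tabular data
y = np.random.randint(0, 3, size=(150,))      # assumed three classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(),  # RMSProp gradient descent
              loss="sparse_categorical_crossentropy")
model.fit(x, y, batch_size=8, epochs=20)  # mini-batch size of 8; epoch count assumed
```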