## Gradient descent

### GRU Recurrent Neural Networks

Next, we connect a GRU layer to a feed-forward layer with sigmoid activation, and train the network with RMSProp gradient descent, a learning rate of 0.003 and a batch size of 32 for 30 epochs.
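To make the recurrent part concrete, here is a minimal NumPy sketch of a single GRU step, the update that the trained layer applies at each position in the sequence. The input and hidden dimensions (4 and 8) are placeholders, not values from the article, and this is an illustration of the standard GRU equations rather than the library's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x, h, params):
    """One GRU step: gates blend the old state with a candidate state."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(x @ Wz + h @ Uz + bz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)  # candidate state
    return (1.0 - z) * h + z * h_tilde

# Hypothetical sizes: input dim 4, hidden dim 8 (not from the article).
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
params = [rng.normal(0, 0.1, s) for s in
          [(n_in, n_hid), (n_hid, n_hid), (n_hid,)] * 3]
h = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run five time steps
    h = gru_cell(x, h, params)
print(h.shape)  # (8,)
```

The final hidden state would then feed the sigmoid-activated feed-forward layer described above.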

### Sentiment Analysis

ReLU activation and Adam gradient-descent optimisation seem to work well on this data.
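For reference, the Adam update that this setup relies on can be sketched in a few lines of NumPy. The weights and gradient below are made-up values, and the hyperparameters shown are the common defaults, not settings taken from the article.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; returns new weights and moment estimates."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)      # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([0.5, -0.5])
m = np.zeros_like(w)
v = np.zeros_like(w)
g = np.array([0.1, -0.2])          # a made-up gradient
w, m, v = adam_step(w, g, m, v, t=1)
print(w)
```

Because the first step's bias correction makes the update roughly `lr * sign(g)`, Adam takes similar-sized steps for both weights despite their different gradient magnitudes.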

### Convolutional Neural Networks

The network is trained with Adam gradient descent and Gaussian weight initialisation for 20 epochs.
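Gaussian weight initialisation simply draws each starting weight from a normal distribution. A minimal sketch, with made-up layer sizes and a standard deviation of 0.1 (the article does not state either):

```python
import numpy as np

# Hypothetical layer sizes; the article's network dimensions are not given.
rng = np.random.default_rng(42)
fan_in, fan_out = 64, 32
W = rng.normal(loc=0.0, scale=0.1, size=(fan_in, fan_out))  # N(0, 0.1^2)
b = np.zeros(fan_out)   # biases are commonly initialised to zero
print(W.mean(), W.std())
```

Small random weights break the symmetry between units so that gradient descent can push them in different directions.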

### Sequence to Sequence with LSTM

Next, the LSTM network is trained for 10 iterations using RMSProp gradient descent and a hidden memory size of 128.
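The "hidden memory size of 128" refers to the width of the LSTM's hidden and cell states. A compact NumPy sketch of one LSTM step with that width (the input dimension of 16 is a placeholder, and this shows the standard LSTM equations, not the library's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h, c, W, U, b):
    """One LSTM step; W, U, b pack the four gates along the last axis."""
    n = h.shape[0]
    gates = x @ W + h @ U + b
    f = sigmoid(gates[:n])            # forget gate
    i = sigmoid(gates[n:2 * n])       # input gate
    o = sigmoid(gates[2 * n:3 * n])   # output gate
    g = np.tanh(gates[3 * n:])        # candidate cell state
    c = f * c + i * g
    return o * np.tanh(c), c

# Hidden memory size of 128 as in the article; input dim 16 is made up.
rng = np.random.default_rng(1)
n_in, n_hid = 16, 128
W = rng.normal(0, 0.1, (n_in, 4 * n_hid))
U = rng.normal(0, 0.1, (n_hid, 4 * n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(10, n_in)):   # ten time steps
    h, c = lstm_cell(x, h, c, W, U, b)
print(h.shape)  # (128,)
```

The separate cell state `c` is what lets the LSTM carry information across long sequences, which matters for sequence-to-sequence tasks.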

### Classification Overview with Bright Wire

The network is trained with RMSProp gradient descent and a mini-batch size of 8.
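Mini-batch training means each gradient step uses a small slice of the data rather than the full set. A sketch of how a data set might be shuffled and split into batches of 8, using made-up data (not the article's data set or the library's batching API):

```python
import numpy as np

def mini_batches(X, y, batch_size, rng):
    """Shuffle the rows once per epoch and yield batches."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        take = idx[start:start + batch_size]
        yield X[take], y[take]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 made-up samples, 3 features
y = rng.integers(0, 2, size=50)
batches = list(mini_batches(X, y, batch_size=8, rng=rng))
print(len(batches))                    # 7: six full batches, one of 2
```

Smaller batches give noisier but more frequent updates, which is why a mini-batch size of 8 can converge well on small data sets.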