Next, we connect a GRU and a feed-forward layer with sigmoid activation and train the neural network with RMSProp, a learning rate of 0.003, and a batch size of 32 for 30 epochs.
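The architecture above can be sketched as a plain NumPy forward pass; this is a minimal illustration, not the trained model, and the input size (16), hidden size (32), sequence length, and initialisation scale are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Single GRU cell (forward pass only)."""
    def __init__(self, input_size, hidden_size):
        s = 1.0 / np.sqrt(hidden_size)
        # Stacked weights for the update (z), reset (r) and candidate gates.
        self.W = rng.uniform(-s, s, (3 * hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3 * hidden_size, hidden_size))
        self.b = np.zeros(3 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h):
        H = self.hidden_size
        Wx, Uh = self.W @ x, self.U @ h
        z = sigmoid(Wx[:H] + Uh[:H] + self.b[:H])            # update gate
        r = sigmoid(Wx[H:2*H] + Uh[H:2*H] + self.b[H:2*H])   # reset gate
        h_tilde = np.tanh(Wx[2*H:] + r * Uh[2*H:] + self.b[2*H:])
        return (1 - z) * h + z * h_tilde

# Hypothetical dimensions for illustration.
gru = GRUCell(input_size=16, hidden_size=32)
W_out = rng.normal(0.0, 0.1, (1, 32))   # feed-forward output layer

h = np.zeros(32)
for t in range(5):                      # run over a toy 5-step sequence
    h = gru.step(rng.normal(size=16), h)
y = sigmoid(W_out @ h)                  # sigmoid activation on the output
```

In a framework such as Keras this whole stack reduces to a `GRU` layer followed by a `Dense(1, activation="sigmoid")` layer, with the RMSProp settings passed to the optimiser.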
ReLU activations and the Adam optimiser seem to work well on this data.
The network is trained with the Adam optimiser and Gaussian weight initialisation for 20 epochs.
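For reference, the Adam update rule combined with Gaussian initialisation looks as follows on a toy quadratic loss; the learning rate, standard deviation, and target are assumptions for the sketch (the beta and epsilon values are Adam's standard defaults).

```python
import numpy as np

rng = np.random.default_rng(1)

# Gaussian weight initialisation, as in the text (std 0.1 is an assumption).
w = rng.normal(0.0, 0.1, size=3)
target = np.array([1.0, -2.0, 0.5])    # hypothetical optimum of the toy loss

# Adam state and hyperparameters (standard defaults for the betas and eps).
m, v = np.zeros_like(w), np.zeros_like(w)
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8

for t in range(1, 2001):
    grad = 2.0 * (w - target)          # gradient of the quadratic loss
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

The bias-corrected moment estimates are what distinguish Adam from plain RMSProp with momentum.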
Next, the LSTM network is trained for 10 iterations using RMSProp and a hidden memory size of 128.
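An LSTM cell with a hidden memory size of 128 can be sketched as below; this shows only the forward pass, and the input dimension, gate ordering, and initialisation are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates stacked as input, forget, output, candidate."""
    H = h.shape[0]
    a = W @ x + U @ h + b
    i = sigmoid(a[:H])             # input gate
    f = sigmoid(a[H:2*H])          # forget gate
    o = sigmoid(a[2*H:3*H])        # output gate
    g = np.tanh(a[3*H:])           # candidate cell state
    c = f * c + i * g              # update the cell memory
    return o * np.tanh(c), c

H, D = 128, 10                     # hidden memory size 128, as in the text
s = 1.0 / np.sqrt(H)
W = rng.uniform(-s, s, (4 * H, D))
U = rng.uniform(-s, s, (4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for t in range(10):                # 10 toy time steps
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)
```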
The network is trained with RMSProp and a mini-batch size of 8.
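Mini-batch training with RMSProp can be sketched on a toy linear-regression problem; the data, learning rate, and decay factor are assumptions for the illustration, while the mini-batch size of 8 comes from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy regression data; the true weights are an assumption for illustration.
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w

w = np.zeros(2)
sq_avg = np.zeros(2)
lr, decay, eps = 0.01, 0.9, 1e-8
batch_size = 8                     # mini-batch size from the text

for epoch in range(50):
    perm = rng.permutation(len(X))             # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
        sq_avg = decay * sq_avg + (1 - decay) * grad**2   # running mean of squared grads
        w -= lr * grad / (np.sqrt(sq_avg) + eps)          # RMSProp update
```

Dividing the gradient by the root of the running squared-gradient average gives each parameter its own effective step size, which is what makes RMSProp robust to poorly scaled features.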