Sequence to Sequence with LSTM

Next, the LSTM network is trained for 10 iterations using RMSProp gradient descent with a hidden memory size of 128.
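
The article's own implementation is not reproduced here; as a neutral illustration, the following is a minimal PyTorch sketch of the same setup: an LSTM with a hidden memory size of 128, trained for 10 iterations with RMSProp. The input dimensions, learning rate, and loss function are placeholder assumptions, not values from the article.

```python
import torch
import torch.nn as nn

# Illustrative placeholders: 8 sequences of length 5 with 16 features each.
inputs = torch.randn(8, 5, 16)
targets = torch.randn(8, 5, 16)

# LSTM with a hidden memory size of 128, as described above.
lstm = nn.LSTM(input_size=16, hidden_size=128, batch_first=True)
head = nn.Linear(128, 16)  # project the hidden state back to the output size
optimizer = torch.optim.RMSprop(
    list(lstm.parameters()) + list(head.parameters()), lr=0.01
)
loss_fn = nn.MSELoss()

# Train for 10 iterations with RMSProp gradient descent.
for iteration in range(10):
    optimizer.zero_grad()
    hidden_states, _ = lstm(inputs)
    loss = loss_fn(head(hidden_states), targets)
    loss.backward()
    optimizer.step()
    print(f"iteration {iteration}: loss={loss.item():.4f}")
```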

The encoder is connected to the decoder by writing the encoder's memory state to a named memory slot on every iteration and then joining that memory with the encoder's output data in the decoder.
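
As a hedged sketch of that wiring (again in PyTorch rather than the article's own framework): here the encoder's final memory state stands in for the named memory slot, and it is joined with the encoder's per-step output by concatenation before being fed to the decoder. The class name, the concatenation strategy, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder/decoder pair in which the encoder's memory is joined with
    its output before being fed to the decoder, mirroring the
    named-memory-slot wiring described above."""

    def __init__(self, input_size=16, memory_size=128, output_size=16):
        super().__init__()
        self.encoder = nn.LSTM(input_size, memory_size, batch_first=True)
        # The decoder consumes the encoder output concatenated with the
        # stored memory, hence the doubled input width.
        self.decoder = nn.LSTM(memory_size * 2, memory_size, batch_first=True)
        self.head = nn.Linear(memory_size, output_size)

    def forward(self, x):
        # Encoder output at every step, plus its final memory state
        # (the stand-in for the named memory slot).
        encoder_output, (hidden, _) = self.encoder(x)
        memory = hidden[-1]  # shape: (batch, memory_size)

        # Join the stored memory with the encoder's output at each step.
        memory_per_step = memory.unsqueeze(1).expand_as(encoder_output)
        decoder_input = torch.cat([encoder_output, memory_per_step], dim=-1)

        decoder_output, _ = self.decoder(decoder_input)
        return self.head(decoder_output)

model = Seq2Seq()
output = model(torch.randn(8, 5, 16))
print(output.shape)  # torch.Size([8, 5, 16])
```

Concatenation is only one way to join the memory with the encoder's output; the essential point is that the decoder sees both signals at every step of the sequence.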