A sequence to sequence (STS) architecture divides the network into two separate networks, one called the encoder and the other the decoder.
The encoder is tasked with learning to generate a single embedding that effectively summarises the input, and the decoder is tasked with learning to generate an output sequence from that single embedding.
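The division of labour between the two halves can be sketched in a few lines. This is a minimal illustration in plain Python, not Bright Wire's API: the function names (`encode`, `decode`, `make_matrix`, `matvec`, `one_hot`), the simple tanh recurrence, and the random untrained weights are all assumptions, chosen only to show the shapes involved.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def encode(sequence, w_in, w_rec):
    """Fold a sequence of input vectors into one fixed-size hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
    return hidden  # the single embedding that summarises the input

def decode(embedding, steps, w_rec, w_out):
    """Unroll the embedding into `steps` output vectors."""
    hidden = embedding
    outputs = []
    for _ in range(steps):
        hidden = [math.tanh(p) for p in matvec(w_rec, hidden)]
        outputs.append(matvec(w_out, hidden))
    return outputs

rng = random.Random(0)
vocab_size, hidden_size = 4, 3  # hypothetical sizes
enc_in = make_matrix(hidden_size, vocab_size, rng)
enc_rec = make_matrix(hidden_size, hidden_size, rng)
dec_rec = make_matrix(hidden_size, hidden_size, rng)
dec_out = make_matrix(vocab_size, hidden_size, rng)

inputs = [one_hot(i, vocab_size) for i in [0, 2, 1, 3, 2]]
embedding = encode(inputs, enc_in, enc_rec)
outputs = decode(embedding, steps=3, w_rec=dec_rec, w_out=dec_out)
```

Whatever the length of `inputs`, `embedding` always has `hidden_size` values, and the number of decoder steps is chosen independently of the input length. A real network would learn the weights by backpropagation rather than use random ones.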
The first part of an STS architecture is the encoder, which learns how to encode the most relevant parts of an input sequence into a single embedding.
In this case the encoder just needs to keep track of all the characters that it has seen and write them into the output vector.
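As a rough sketch of what such an output vector might look like, assume it is a multi-hot vector over the alphabet, with a 1 at each position whose character has appeared in the sequence. The function name `seen_characters_vector` and the five-letter alphabet are hypothetical and are not taken from Bright Wire.

```python
def seen_characters_vector(sequence, alphabet):
    """Multi-hot target: 1.0 at each position whose character occurs in `sequence`."""
    seen = set(sequence)
    return [1.0 if ch in seen else 0.0 for ch in alphabet]

# 'a', 'b' and 'd' occur in the sequence; 'c' and 'e' do not.
target = seen_characters_vector("badab", alphabet="abcde")
# target == [1.0, 1.0, 0.0, 1.0, 0.0]
```

The vector records which characters were seen, not their order or how often they occurred.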
In this scenario, the encoder is learning to encode an input sequence into an embedding and the decoder is learning to decode that embedding into the same input sequence.
The simplest type of STS network is a recurrent autoencoder.
In a recurrent autoencoder the input and output sequences are necessarily the same length. Reproducing the input is not the real goal, though: what we are using is the encoder's ability to find the relevant discriminative features of the input as it compresses the input sequence into a single embedding.
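To make the training setup concrete, the following sketch builds training pairs in which the target sequence is simply a copy of the input sequence. It assumes one-hot encoded characters; the helper name `autoencoder_pairs` and the alphabet are hypothetical and are not part of Bright Wire.

```python
def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def autoencoder_pairs(strings, alphabet):
    """Build (input, target) pairs in which the target is the input itself."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    pairs = []
    for s in strings:
        encoded = [one_hot(index[ch], len(alphabet)) for ch in s]
        target = [row[:] for row in encoded]  # target is a copy of the input
        pairs.append((encoded, target))
    return pairs

pairs = autoencoder_pairs(["abca", "cb"], alphabet="abc")
```

Each pair has the same number of time steps on both sides, which is why the input and output lengths always match in this setup.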
Once the network has converged, we can throw the decoder away and use the encoder to create sequence embeddings.
In Bright Wire, the decoder and encoder are defined in two separate graphs that are stitched together to create the sequence to sequence architecture.