The encoder learns to generate a single embedding that effectively summarises the input, and the decoder learns to generate an output sequence from that single embedding.
The first part of an STS architecture is the encoder, which learns to encode the most relevant parts of an input sequence into a single embedding.
In this scenario, the encoder is learning to encode an input sequence into an embedding and the decoder is learning to decode that embedding into the same input sequence.
In a recurrent autoencoder, the input and output sequences necessarily have the same length. Reconstructing the input is not the goal in itself: what we are exploiting is the encoder's ability to find the relevant discriminative features of the input as it builds the single embedding from the input sequence.
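To make the data flow concrete, here is a minimal sketch of a recurrent autoencoder's forward pass in plain Python. It assumes a simple Elman-style recurrent cell, a decoder that is seeded only with the embedding, and random untrained weights. All function and variable names are hypothetical, and training (adjusting the weights to minimise the reconstruction error) is left out.

```python
import math
import random

def make_matrix(rows, cols, rng):
    # Random weights stand in for parameters a trained model would learn.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def encode(sequence, w_in, w_h, hidden_size):
    # Fold the whole input sequence into one hidden state, step by step:
    # h' = tanh(W_in x + W_h h). The final state is the single embedding.
    h = [0.0] * hidden_size
    for x in sequence:
        a = matvec(w_in, x)
        b = matvec(w_h, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h

def decode(embedding, steps, w_h, w_out):
    # Start from the embedding and unroll for `steps` time steps,
    # emitting one output vector per step.
    h = list(embedding)
    outputs = []
    for _ in range(steps):
        h = [math.tanh(v) for v in matvec(w_h, h)]
        outputs.append(matvec(w_out, h))
    return outputs

def reconstruction_error(sequence, outputs):
    # Mean squared error between the input and its reconstruction.
    total = sum((x - y) ** 2
                for xs, ys in zip(sequence, outputs)
                for x, y in zip(xs, ys))
    return total / (len(sequence) * len(sequence[0]))

rng = random.Random(0)
input_size, hidden_size = 3, 4  # illustrative sizes only

enc_w_in = make_matrix(hidden_size, input_size, rng)
enc_w_h = make_matrix(hidden_size, hidden_size, rng)
dec_w_h = make_matrix(hidden_size, hidden_size, rng)
dec_w_out = make_matrix(input_size, hidden_size, rng)

# A toy input sequence of five one-hot vectors.
sequence = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

embedding = encode(sequence, enc_w_in, enc_w_h, hidden_size)
reconstruction = decode(embedding, len(sequence), dec_w_h, dec_w_out)
error = reconstruction_error(sequence, reconstruction)
```

However long the input is, `embedding` always has `hidden_size` entries, while `reconstruction` has one output vector per input step. With untrained weights the error is arbitrary; training would drive it down, and the embedding is the part we keep afterwards.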
This is how we might build a single embedding from a sequence of words (the document) for the purposes of document comparison.
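Once each document has been reduced to a single embedding, comparison is a vector operation. The sketch below uses cosine similarity, which is one common choice and an assumption here rather than something the text prescribes. The three vectors are made-up placeholders for what a trained encoder might produce, and the names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embeddings: close to 1 means they point
    # the same way, close to -1 means they point in opposite directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for a zero vector
    return dot / (norm_a * norm_b)

# Placeholder embeddings (not real encoder output).
doc_a = [0.9, 0.1, -0.3, 0.5]
doc_b = [0.8, 0.2, -0.2, 0.4]
doc_c = [-0.5, 0.7, 0.6, -0.1]

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

With these placeholder values, `sim_ab` is larger than `sim_ac`, so `doc_a` would be judged closer to `doc_b` than to `doc_c`.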