Limitations of Using MLPs for Sequence Prediction
This approach can work well on some problems, but it suffers from five critical limitations.
Stateless.
MLPs learn a fixed function approximation. Any outputs that are conditional on the context of the input sequence must be generalized and frozen into the network weights.
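As a concrete sketch (a toy univariate series and a window size of 3, both chosen purely for illustration), this is the kind of fixed framing imposed on the data; any context outside the window can only reach the model by being frozen into its weights:

```python
import numpy as np

def sliding_window(series, window=3):
    """Frame a univariate series as (window of lags -> next value) samples."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])  # lag values become plain input features
        y.append(series[i + window])    # the next value becomes the target
    return np.array(X), np.array(y)

series = np.arange(10)        # toy sequence: 0, 1, ..., 9
X, y = sliding_window(series)
print(X.shape, y.shape)       # (7, 3) (7,)
```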
Unaware of Temporal Structure.
Time steps are modeled as input features, meaning that the network has no explicit handling or understanding of the temporal structure or order between observations.
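A small sketch of this point (toy windowed samples and an arbitrary permutation, chosen for illustration): to a fully connected layer, the lag values are just a flat, unordered feature vector, whereas a recurrent layer's input retains an explicit time axis:

```python
import numpy as np

# Each row is a window of lags: [x(t-3), x(t-2), x(t-1)].
X = np.array([[0, 1, 2],
              [1, 2, 3],
              [2, 3, 4]])

# A dense layer sees 3 interchangeable features: permuting the columns the
# same way in every sample yields an equivalent learning problem.
perm = [2, 0, 1]
X_permuted = X[:, perm]

# A recurrent layer instead expects an explicit time axis:
# (samples, time steps, features) rather than (samples, features).
X_rnn = X.reshape((X.shape[0], X.shape[1], 1))
print(X.shape, X_rnn.shape)  # (3, 3) (3, 3, 1)
```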
Messy Scaling.
For problems that require modeling multiple parallel input sequences, the number of input features grows as a multiple of the sliding window size, with no explicit separation between the time steps of the different series.
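The arithmetic is simple but unforgiving; a quick sketch with hypothetical numbers:

```python
n_series = 5       # parallel input series, e.g. 5 sensors (hypothetical)
window = 50        # lag observations kept per series (hypothetical)

# Flattened for an MLP, every (series, time step) pair becomes one feature:
n_features = n_series * window
print(n_features)  # 250 inputs, with nothing marking which series or
                   # which time step a given feature came from
```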
Fixed-Sized Inputs.
The size of the sliding window is fixed and must be imposed on all inputs to the network.
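For example, a minimal sketch of forcing variable-length inputs into the fixed window (zero-padding and the keep-the-most-recent-values truncation rule are both assumptions here):

```python
import numpy as np

def fit_to_window(seq, window=4, pad_value=0.0):
    """Truncate or left-pad a sequence to the fixed window size."""
    seq = list(seq)[-window:]                    # keep only the most recent values
    padding = [pad_value] * (window - len(seq))  # pad sequences that are too short
    return np.array(padding + seq)

print(fit_to_window([1, 2, 3, 4, 5, 6]))  # [3. 4. 5. 6.]  -- truncated
print(fit_to_window([7, 8]))              # [0. 0. 7. 8.]  -- padded
```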
Fixed-Sized Outputs.
The size of the output is also fixed, and any outputs that do not conform must be forced.
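A sketch of the same forcing on the output side (the fixed horizon and the pad value are assumptions):

```python
import numpy as np

def fit_forecast(target, horizon=3, pad_value=np.nan):
    """Force a variable-length target sequence to a fixed output size."""
    target = list(target)[:horizon]                  # drop steps beyond the horizon
    target += [pad_value] * (horizon - len(target))  # pad targets that are too short
    return np.array(target)

print(fit_forecast([10, 20, 30, 40]))  # [10. 20. 30.]  -- clipped
print(fit_forecast([10]))              # [10. nan nan]  -- padded
```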
MLPs do offer great capability for sequence prediction, but they still suffer from the key limitation of having to specify the scope of the temporal dependence between observations explicitly, upfront, in the design of the model.

Sequences pose a challenge for [deep neural networks] because they require that the dimensionality of the inputs and outputs is known and fixed.

— Sequence to Sequence Learning with Neural Networks, 2014

MLPs are a good starting point for modeling sequence prediction problems, but we now have better options.