... a naive method that splits the 1,000-long sequence into 50 sequences (say) each of length 20 and treats each sequence of length 20 as a separate training case. This is a sensible approach that can work well in practice, but it is blind to temporal dependencies that span more than 20 time steps.
— Training Recurrent Neural Networks, 2013
This means that, as part of framing your problem, you must split long sequences into subsequences that are both long enough to capture [...] for making predictions, yet short enough to train the network efficiently.
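The naive splitting approach from the quote can be sketched as follows. This is a minimal illustration, not code from the source; the sequence values, function name, and lengths are chosen only to mirror the 1,000-step example above.

```python
def split_sequence(sequence, subseq_len):
    """Split a sequence into non-overlapping subsequences of length
    subseq_len, dropping any trailing steps that do not fill a full
    subsequence. Each subsequence is treated as a separate training case."""
    n_full = len(sequence) // subseq_len
    return [sequence[i * subseq_len:(i + 1) * subseq_len]
            for i in range(n_full)]

# Example: a 1,000-step sequence becomes 50 training cases of length 20.
long_sequence = list(range(1000))
cases = split_sequence(long_sequence, 20)
print(len(cases), len(cases[0]))  # 50 20
```

Note the trade-off this encodes: any dependency spanning the boundary between two subsequences (more than 20 time steps apart) is invisible to the network during training.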