For natural language processing, an RNN would encode the sentence “A black cat jumped on the table” as a sequence of seven vectors (x₁, x₂, …, x₇), where each word is represented by a single non-zero value in a sparse vector (Goodfellow et al., 2016). For instance, if we train a model with a vocabulary of 100,000 words, the first word, “A”, would be encoded as a sparse vector of 100,000 numerical values, all equal to 0 except the first (corresponding to the word “A”), which would be equal to 1. The word “black” would be encoded as a sparse vector of 100,000 zeros, except the 12,853rd element (corresponding to the word “black”), which would be equal to 1, and so on.
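This encoding can be sketched in a few lines of Python. The toy vocabulary and its indices below are illustrative assumptions (a real model would use the full 100,000-word vocabulary, so "black" would sit at index 12,852 rather than 1):

```python
# Hypothetical 7-word vocabulary mapping each word to its index.
# In a real model this dictionary would hold 100,000 entries.
vocab = {"a": 0, "black": 1, "cat": 2, "jumped": 3, "on": 4, "the": 5, "table": 6}

def one_hot(word, vocab):
    """Return a vector of zeros with a single 1 at the word's index."""
    vec = [0.0] * len(vocab)
    vec[vocab[word.lower()]] = 1.0
    return vec

# Encode the sentence as a sequence of seven vectors (x1, ..., x7).
sentence = "A black cat jumped on the table".split()
x = [one_hot(w, vocab) for w in sentence]

print(x[1])  # "black" -> [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Each vector has exactly one non-zero entry, which is why this representation is called one-hot (and why, at realistic vocabulary sizes, it is extremely sparse).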