#recurrent-neural-networks #rnn
The name, often shortened to seq2seq, comes from the fact that these models translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to "fill in the blanks". If we always blank only the last element of a historical sequence, the model effectively learns to predict the most likely future, conditioned on the observed past. Applying this idea to customer transaction records, we can forecast sequences that predict future behavior. We next present our model architecture in detail.
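As a concrete illustration (a minimal sketch with made-up counts, not the paper's data), blanking only the last element of a history turns it into a self-supervised training pair of observed past and future target:

```python
# Hypothetical sketch: a weekly transaction history becomes a seq2seq
# training pair by blanking the last observed element, so the model
# learns to predict the future conditioned on the observed past.
history = [1, 0, 1, 1, 0, 2, 1]   # transaction counts per discrete period

observed_past = history[:-1]      # what the model is shown
blanked_target = history[-1]      # the "blank" it must fill in

print(observed_past, "->", blanked_target)
```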


2.1. Model architecture

To forecast future customer behavior, our model is trained on individual sequences of past transaction events, i.e., chronological accounts of a customer's lifetime. The example in Table 2 describes one such customer's transaction history over seven consecutive discrete time periods. This particular individual makes a transaction in the first week, followed by one week of inactivity, then transacts for two consecutive weeks, and so on; in weeks 3 and 4 they also receive some form of marketing appeal. The two calendar components – the month and week indicators – represent time-varying contextual information which is shared across the individuals within a given cohort. In addition, in this example we also include an individual time-invariant covariate (gender) and a time-varying, individual-level covariate (marketing appeals). This particular customer history can then be represented as a sequence of vectors with five elements: the input variable plus the four covariates. Individual-level covariates are strictly optional – in our empirical study, the Base model is built without any such variables. Whenever individual covariates are included, we label the model Extended. Note that the model is completely agnostic about further extensions: all individual-level, cohort-level, time-varying, or time-invariant covariates are simply encoded as categorical input variables and are handled identically by the model. This property makes our model extremely flexible in dealing with diverse customer behaviors observed across multiple contexts and platforms.
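Since Table 2 itself is not reproduced here, the following sketch uses illustrative values consistent with the description above; each week becomes a five-element vector (transactions, month, week-of-month, gender, marketing appeal):

```python
# Illustrative reconstruction of a customer history in the spirit of
# Table 2 (values are assumptions, not the paper's actual table): each
# time period is one five-element vector. Gender "F" is time-invariant;
# a marketing appeal arrives in weeks 3 and 4.
weeks = [
    (1, "January",  1, "F", 0),
    (0, "January",  2, "F", 0),
    (1, "January",  3, "F", 1),
    (1, "January",  4, "F", 1),
    (0, "February", 1, "F", 0),
    (2, "February", 2, "F", 0),
    (1, "February", 3, "F", 0),
]

print(len(weeks), "periods,", len(weeks[0]), "elements each")
```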


A schematic high-level representation of the proposed model architecture is shown in Fig. 2. The model begins with input layers for (i) the input variable (i.e., transaction counts) and (ii) optional covariates (time-invariant or time-varying inputs). These inputs enter the model through dedicated input layers at the top of the architecture and are combined by simply concatenating them into a single long vector. This input signal then propagates through a series of intermediate layers, including a specialized Long Short-Term Memory (LSTM) recurrent neural network component.
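The concatenate-then-recur idea can be sketched in plain NumPy; this is a minimal illustration with random weights and made-up sizes, not the authors' implementation:

```python
import numpy as np

# Sketch: embed each categorical input, concatenate the embeddings into
# one long vector, and push it through a single LSTM step.
rng = np.random.default_rng(0)
emb_dim, hidden = 4, 8

# One (illustrative) embedding table per input variable.
tables = {
    "count": rng.normal(size=(5, emb_dim)),    # transaction counts 0..4
    "month": rng.normal(size=(12, emb_dim)),   # calendar month
    "week":  rng.normal(size=(5, emb_dim)),    # week of month
}

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input/forget/output gates and candidate from x, h."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c_new = sig(f) * c + sig(i) * np.tanh(g)
    h_new = sig(o) * np.tanh(c_new)
    return h_new, c_new

# Concatenate the embedded inputs for one time step: 1 transaction,
# January, week 1 of the month.
x = np.concatenate([tables["count"][1], tables["month"][0], tables["week"][1]])

W = rng.normal(size=(4 * hidden, x.size))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(x, h, c, W, U, b)
print(x.shape, "->", h.shape)
```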

The output of the softmax layer at any given time step t is a k-tuple giving the probability distribution across the k neurons of the output layer. We set the number of neurons k in the softmax layer to reflect the transaction counts observed across all individuals in the training data: as with any "forward-looking" approach, the model can only learn from events that are observed at some point during estimation; i.e., if in the calibration period individuals only make between zero and three transactions during any of the discrete time periods, then a softmax layer with four neurons is sufficient: the neurons' respective outputs represent the inferred probability of zero, one, two and three transactions. With each vector read as input, the model's training objective is to predict the target variable, which in this self-supervised training setup is simply the input variable shifted by a single time step. Using the example from Table 2, given the sequence of input vectors starting with the first week of January, i.e. [1,January,1,F,0], [0,January,2,F,0], [1,January,3,F,1], ..., we train the model to output the target sequence 0, 1, 1, ..., equal to the rightmost column in Table 2. With each input vector processed by the network, the internal memory component is trained to update a real-valued cell state vector to reflect the sequence of events thus far. We estimate the model parameters by minimizing the error between the predicted output and the actual target values over stochastic mini-batches. At prediction time, we fix the model parameters – the weights and biases between the individual neurons of the deep neural network – but the cell state vector built into the LSTM "memory" component is still updated at each step with parts of the latest input, which helps the model learn very long-term transaction patterns.
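A minimal sketch of the output sizing and the shifted-target setup (the counts are illustrative, and the untrained softmax here simply returns a uniform distribution):

```python
import numpy as np

# Sizing the softmax layer: k = (largest observed per-period count) + 1,
# so each neuron represents the probability of one count value. The
# self-supervised target is the input count sequence shifted by one step.
counts = [1, 0, 1, 1, 0, 2, 1]     # illustrative per-period history
k = max(counts) + 1                # neurons for counts 0..max

inputs = counts[:-1]               # what the model reads
targets = counts[1:]               # the same sequence, shifted by one

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

probs = softmax(np.zeros(k))       # untrained layer -> uniform k-tuple
print(k, probs)
```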
Each prediction is generated by drawing a sample from the multinomial output distribution calculated by the bottom network layer; our model therefore does not produce point or interval estimates – each output is a simulated draw. Each time a draw from this multinomial distribution is made, the observation is fed back into the model as the new transaction variable input in order to generate the prediction for the following time step, and so on, until we create a sequence of predicted time steps of the desired length. This so-called autoregressive mechanism, in which an output value always becomes the new input, is illustrated in Fig. 2 by the dotted arrow bending from the output layer back to the input. Fig. 2 also shows that we first feed each input into a dedicated embedding layer (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). Using embeddings is not critical to our approach, but by creating efficient, dense (real-valued) vector representations of all variables they serve to better separate useful signals from noise and to condense the information even before it reaches the memory component (see also Chamberlain et al. (2017) for a similar approach). It should be highlighted that this setup of inputs with associated embeddings is completely flexible and allows for the inclusion of any time-varying context or customer-specific static variables by simply adding more inputs together with their respective embedding layers.
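The autoregressive feedback loop might look as follows; `predict_probs` is a hypothetical placeholder standing in for the trained network's softmax output, not a function from the paper:

```python
import numpy as np

# Sketch of autoregressive prediction: each multinomial draw is fed back
# as the next step's transaction input until the horizon is reached.
rng = np.random.default_rng(42)
k = 4                                  # counts 0..3

def predict_probs(last_count):
    # Placeholder for the network: a toy distribution that depends
    # slightly on the fed-back input.
    p = np.ones(k)
    p[last_count] += 1.0
    return p / p.sum()

last, horizon, path = 1, 5, []
for _ in range(horizon):
    probs = predict_probs(last)
    last = int(rng.choice(k, p=probs))  # a simulated draw, not a point estimate
    path.append(last)

print(path)
```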

Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, both to reduce noise and to limit the number of model parameters required.
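For instance (with illustrative sizes), a 12-way categorical variable such as a month indicator can be compressed from a 12-dimensional one-hot vector into a 3-dimensional dense one, and the lookup is equivalent to multiplying the one-hot vector by the embedding table:

```python
import numpy as np

# Illustration of the compression an embedding layer performs: 12-way
# one-hot input -> dense 3-dimensional vector, shrinking both the input
# size and the downstream parameter count.
n_categories, emb_dim = 12, 3
embedding = np.random.default_rng(1).normal(size=(n_categories, emb_dim))

month_index = 0                             # e.g. "January" as a code
one_hot = np.eye(n_categories)[month_index]
dense = embedding[month_index]              # lookup == one_hot @ embedding

print(one_hot.size, "->", dense.size)
```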


To make the predicted transaction sequences robust against sampling noise, we repeat this simulation for each customer several times and take the mean number of transactions in a given time step as our final result. We describe how this benefits prediction accuracy in the Appendix.
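A sketch of this averaging step, with an assumed fixed output distribution standing in for the trained model:

```python
import numpy as np

# Repeat the simulated forecast many times per customer and report the
# mean count per future period, smoothing the noise of individual
# multinomial draws. The distribution below is illustrative.
rng = np.random.default_rng(7)
n_runs, horizon, k = 100, 4, 4
probs = np.array([0.5, 0.3, 0.15, 0.05])   # assumed softmax output

draws = rng.choice(k, size=(n_runs, horizon), p=probs)  # simulated paths
expected = draws.mean(axis=0)              # mean transactions per step

print(expected)
```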
