#recurrent-neural-networks #rnn
The name, often shortened to seq2seq, comes from the fact that these models translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to "fill in the blanks". If we always blank only the last element of a historical sequence, the model effectively learns to predict the most likely future, conditioned on the observed past. Applying this idea to customer transaction records, we can forecast sequences that predict future behavior. We next present our model architecture in detail.
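As a concrete illustration (a minimal sketch with made-up counts, not the paper's data), blanking only the last element of a history turns it into a self-supervised training pair of observed past and future target:

```python
# Hypothetical sketch: a weekly transaction history becomes a seq2seq
# training pair by blanking the last observed element, so the model
# learns to predict the future conditioned on the observed past.
history = [1, 0, 1, 1, 0, 2, 1]   # transaction counts per discrete period

observed_past = history[:-1]      # what the model is shown
blanked_target = history[-1]      # the "blank" it must fill in

print(observed_past, "->", blanked_target)
```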


2.1. Model architecture

To forecast future customer behavior, our model is trained on individual sequences of past transaction events, i.e., chronological accounts of a customer's lifetime. The example in Table 2 describes one such customer's transaction history over seven consecutive discrete time periods. This particular individual makes a transaction in the first week, followed by one week of inactivity, then transacts for two consecutive weeks, and so on; in weeks 3 and 4 they also receive some form of marketing appeal. The two calendar components – the month and week indicators – represent time-varying contextual information which is shared across the individuals within a given cohort. In addition, in this example we also include an individual time-invariant covariate (gender) and a time-varying, individual-level covariate (marketing appeals). This particular customer history can then be represented as a sequence of vectors with five elements: the input variable plus the four covariates. Individual-level covariates are strictly optional – in our empirical study, the Base model is built without any such variables. Whenever individual covariates are included, we label the model Extended. Note that the model is completely agnostic about further extensions: all individual-level, cohort-level, time-varying, or time-invariant covariates are simply encoded as categorical input variables and are handled identically by the model. This property makes our model extremely flexible in dealing with diverse customer behaviors observed across multiple contexts and platforms.
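Since Table 2 itself is not reproduced here, the following sketch uses illustrative values consistent with the description above; each week becomes a five-element vector (transactions, month, week-of-month, gender, marketing appeal):

```python
# Illustrative reconstruction of a customer history in the spirit of
# Table 2 (values are assumptions, not the paper's actual table): each
# time period is one five-element vector. Gender "F" is time-invariant;
# a marketing appeal arrives in weeks 3 and 4.
weeks = [
    (1, "January",  1, "F", 0),
    (0, "January",  2, "F", 0),
    (1, "January",  3, "F", 1),
    (1, "January",  4, "F", 1),
    (0, "February", 1, "F", 0),
    (2, "February", 2, "F", 0),
    (1, "February", 3, "F", 0),
]

print(len(weeks), "periods,", len(weeks[0]), "elements each")
```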


A schematic high-level representation of the proposed model architecture is shown in Fig. 2. The model begins with input layers for (i) the input variable (i.e., transaction counts) and (ii) optional covariates (time-invariant or time-varying inputs). These inputs enter the model through dedicated input layers at the top of the architecture and are combined by simply concatenating them into a single long vector. This input signal then propagates through a series of intermediate layers, including a specialized Long Short-Term Memory (LSTM) recurrent neural network component.
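The concatenate-then-recur idea can be sketched in plain NumPy; this is a minimal illustration with random weights and made-up sizes, not the authors' implementation:

```python
import numpy as np

# Sketch: embed each categorical input, concatenate the embeddings into
# one long vector, and push it through a single LSTM step.
rng = np.random.default_rng(0)
emb_dim, hidden = 4, 8

# One (illustrative) embedding table per input variable.
tables = {
    "count": rng.normal(size=(5, emb_dim)),    # transaction counts 0..4
    "month": rng.normal(size=(12, emb_dim)),   # calendar month
    "week":  rng.normal(size=(5, emb_dim)),    # week of month
}

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input/forget/output gates and candidate from x, h."""
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    c_new = sig(f) * c + sig(i) * np.tanh(g)
    h_new = sig(o) * np.tanh(c_new)
    return h_new, c_new

# Concatenate the embedded inputs for one time step: 1 transaction,
# January, week 1 of the month.
x = np.concatenate([tables["count"][1], tables["month"][0], tables["week"][1]])

W = rng.normal(size=(4 * hidden, x.size))
U = rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(x, h, c, W, U, b)
print(x.shape, "->", h.shape)
```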

The output of the softmax layer at any given time step t is a k-tuple giving the probability distribution across the k neurons of the output layer. We set the number of neurons k in the softmax layer to reflect the transaction counts observed across all individuals in the training data: as with any "forward-looking" approach, the model can only learn from events that are observed at some point during estimation; i.e., if in the calibration period individuals only make between zero and three transactions during any of the discrete time periods, then a softmax layer with four neurons is sufficient: the neurons' respective outputs represent the inferred probability of zero, one, two and three transactions. With each vector read as input, the model's training objective is to predict the target variable, which in this self-supervised training setup is simply the input variable shifted by a single time step. Using the example from Table 2, given the sequence of input vectors starting with the first week of January, i.e. [1,January,1,F,0], [0,January,2,F,0], [1,January,3,F,1], ..., we train the model to output the target sequence 0, 1, 1, ..., equal to the rightmost column in Table 2. With each input vector processed by the network, the internal memory component is trained to update a real-valued cell state vector to reflect the sequence of events thus far. We estimate the model parameters by minimizing the error between the predicted output and the actual target values over stochastic mini-batches. At prediction time, we fix the model parameters – the weights and biases between the individual neurons of the deep neural network – but the cell state vector built into the LSTM "memory" component is still updated at each step with parts of the latest input, which helps the model learn very long-term transaction patterns.
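A minimal sketch of the output sizing and the shifted-target setup (the counts are illustrative, and the untrained softmax here simply returns a uniform distribution):

```python
import numpy as np

# Sizing the softmax layer: k = (largest observed per-period count) + 1,
# so each neuron represents the probability of one count value. The
# self-supervised target is the input count sequence shifted by one step.
counts = [1, 0, 1, 1, 0, 2, 1]     # illustrative per-period history
k = max(counts) + 1                # neurons for counts 0..max

inputs = counts[:-1]               # what the model reads
targets = counts[1:]               # the same sequence, shifted by one

def softmax(logits):
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

probs = softmax(np.zeros(k))       # untrained layer -> uniform k-tuple
print(k, probs)
```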
Each prediction is generated by drawing a sample from the multinomial output distribution calculated by the bottom network layer; our model therefore does not produce point or interval estimates – each output is a simulated draw. Each time a draw from this multinomial distribution is made, the observation is fed back into the model as the new transaction variable input in order to generate the prediction for the following time step, and so on, until we create a sequence of predicted time steps of the desired length. This so-called autoregressive mechanism, in which an output value always becomes the new input, is illustrated in Fig. 2 by the dotted arrow bending from the output layer back to the input. Fig. 2 also shows that we first feed each input into a dedicated embedding layer (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013). Using embeddings is not critical to our approach, but by creating efficient, dense (real-valued) vector representations of all variables they serve to better separate useful signals from noise and to condense the information even before it reaches the memory component (see also Chamberlain et al. (2017) for a similar approach). It should be highlighted that this setup of inputs with associated embeddings is completely flexible and allows for the inclusion of any time-varying context or customer-specific static variables by simply adding more inputs together with their respective embedding layers.
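The autoregressive feedback loop might look as follows; `predict_probs` is a hypothetical placeholder standing in for the trained network's softmax output, not a function from the paper:

```python
import numpy as np

# Sketch of autoregressive prediction: each multinomial draw is fed back
# as the next step's transaction input until the horizon is reached.
rng = np.random.default_rng(42)
k = 4                                  # counts 0..3

def predict_probs(last_count):
    # Placeholder for the network: a toy distribution that depends
    # slightly on the fed-back input.
    p = np.ones(k)
    p[last_count] += 1.0
    return p / p.sum()

last, horizon, path = 1, 5, []
for _ in range(horizon):
    probs = predict_probs(last)
    last = int(rng.choice(k, p=probs))  # a simulated draw, not a point estimate
    path.append(last)

print(path)
```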

Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, both to reduce noise and to limit the number of model parameters required.
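For instance (with illustrative sizes), a 12-way categorical variable such as a month indicator can be compressed from a 12-dimensional one-hot vector into a 3-dimensional dense one, and the lookup is equivalent to multiplying the one-hot vector by the embedding table:

```python
import numpy as np

# Illustration of the compression an embedding layer performs: 12-way
# one-hot input -> dense 3-dimensional vector, shrinking both the input
# size and the downstream parameter count.
n_categories, emb_dim = 12, 3
embedding = np.random.default_rng(1).normal(size=(n_categories, emb_dim))

month_index = 0                             # e.g. "January" as a code
one_hot = np.eye(n_categories)[month_index]
dense = embedding[month_index]              # lookup == one_hot @ embedding

print(one_hot.size, "->", dense.size)
```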


To make the predicted transaction sequences robust against sampling noise, we repeat this simulation for each customer several times and take the mean number of transactions in a given time step as our final result. We describe how this benefits prediction accuracy in the Appendix.
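A sketch of this averaging step, with an assumed fixed output distribution standing in for the trained model:

```python
import numpy as np

# Repeat the simulated forecast many times per customer and report the
# mean count per future period, smoothing the noise of individual
# multinomial draws. The distribution below is illustrative.
rng = np.random.default_rng(7)
n_runs, horizon, k = 100, 4, 4
probs = np.array([0.5, 0.3, 0.15, 0.05])   # assumed softmax output

draws = rng.choice(k, size=(n_runs, horizon), p=probs)  # simulated paths
expected = draws.mean(axis=0)              # mean transactions per step

print(expected)
```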
