Edited, memorised or added to reading queue on 01-Jun-2024 (Sat)


#feature-engineering #lstm #recurrent-neural-networks #rnn

The learning mechanism of the recurrent neural network thus involves:

(1) the forward propagation step where the cross-entropy loss is calculated;

(2) the backpropagation step where the gradient of the parameters with respect to the loss is calculated; and finally,

(3) the optimization algorithm, which changes the parameters of the RNN based on the gradient.
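As a rough illustration of these three steps (not the paper's actual model; the data shapes and layer sizes below are made-up), here is a minimal Keras sketch in which model.fit runs, for every batch, the forward pass that computes the cross-entropy loss, the backpropagation that computes the parameter gradients, and the optimizer update:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data: 100 sequences of length 10 with 8 features, 3 output classes (illustrative shapes)
X = np.random.rand(100, 10, 8).astype("float32")
y = keras.utils.to_categorical(np.random.randint(0, 3, size=100), num_classes=3)

model = keras.Sequential([
    layers.SimpleRNN(16, input_shape=(10, 8)),  # recurrent layer over the sequence
    layers.Dense(3, activation="softmax"),      # class probabilities
])

# (1) forward pass computes the cross-entropy loss,
# (2) backpropagation computes the gradients of the parameters w.r.t. the loss,
# (3) the optimizer (here Adam) updates the RNN parameters from those gradients.
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)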





3.1.3 Practical Considerations When Scaling
#deep-learning #keras #lstm #python #sequence


There are some practical considerations when scaling sequence data.

Estimate Coefficients
You can estimate coefficients (min and max values for normalization or mean and standard deviation for standardization) from the training data. Inspect these first-cut estimates and use domain knowledge or domain experts to help improve these estimates so that they will be usefully correct on all data in the future.

Save Coefficients
You will need to scale new data in the future in exactly the same way as the data used to train your model. Save the coefficients used to file and load them later when you need to scale new data when making predictions.

Data Analysis
Use data analysis to help you better understand your data. For example, a simple histogram can help you quickly get a feeling for the distribution of quantities to see if standardization would make sense.

Scale Each Series
If your problem has multiple series, treat each as a separate variable and in turn scale them separately. Here, scale refers to a choice of scaling procedure such as normalization or standardization.

Scale At The Right Time
It is important to apply any scaling transforms at the right time. For example, if you have a series of quantities that is non-stationary, it may be appropriate to scale after first making your data stationary. It would not be appropriate to scale the series after it has been transformed into a supervised learning problem as each column would be handled differently, which would be incorrect.

Scale if in Doubt
You probably do need to rescale your input and output variables. If in doubt, at least normalize your data.





Flashcard 7627469360396

Tags
#tensorflow #tensorflow-certificate
Question

Preprocessing data

ct = make_column_transformer(([...](dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged
ct.fit(X_train) 
X_train_transformed = ct.transform(X_train)
X_test_transformed = ct.transform(X_test)
Answer
OneHotEncoder


Parent (intermediate) annotation

Preprocessing data ct = make_column_transformer((OneHotEncoder(dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged ct.fit(X_train) X_train_transformed = ct.transform(X_train) X_test_transformed = ct.transform(X_test)

Original toplevel document

TfC_01_ADDITIONAL_01_Abalone.ipynb
Preprocessing data ct = make_column_transformer((OneHotEncoder(dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged ct.fit(X_train) X_train_transformed = ct.transform(X_train) X_test_transformed = ct.transform(X_test) Predictions valuation_predicts = model.predict(X_valuation_transformed) (array([[ 9.441547], [10.451973], [10.48082 ], ..., [10.401164], [13.13452 ], [ 8.081818]], dtype=float32), (6041
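For context, a self-contained sketch of the same preprocessing pattern (the toy DataFrame below is an assumption standing in for the Abalone data; only the 'Sex' column mirrors the card):

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training and test frames with one categorical and one numeric column
X_train = pd.DataFrame({"Sex": ["M", "F", "I", "M"], "Length": [0.45, 0.35, 0.53, 0.44]})
X_test = pd.DataFrame({"Sex": ["F", "I"], "Length": [0.33, 0.47]})

# One-hot encode 'Sex' as int32 columns; pass the remaining columns through unchanged
ct = make_column_transformer(
    (OneHotEncoder(dtype="int32"), ["Sex"]),
    remainder="passthrough",
)
ct.fit(X_train)                              # learn the categories from the training set only
X_train_transformed = ct.transform(X_train)  # shape (4, 4): three one-hot columns + Length
X_test_transformed = ct.transform(X_test)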







Flashcard 7628313726220

Tags
#DAG #causal #edx #has-images #inference
Question
As you may have already noticed, the case-control design selects individuals based on their outcome. Women who did develop cancer are [...] to be included in the study than women who did not develop cancer. Therefore, our causal graph will include a node for selection -- C -- an arrow from the outcome Y to C, and a box around C to indicate that the analysis is conditional on having been selected into the study, which means that we are only one arrow away from selection bias.
Answer
much more likely


Parent (intermediate) annotation

As you may have already noticed, the case-control design selects individuals based on their outcome. Women who did develop cancer are much more likely to be included in the study than women who did not develop cancer. Therefore, our causal graph will include a node for selection -- C -- an arrow from the outcome Y to C, and a box around








Flashcard 7628315823372

Tags
#causality #statistics
Question
Given that we have tools to measure association, how can we isolate causation? In other words, how can we ensure that the association we measure is causation, say, for measuring the causal effect of 𝑋 on 𝑌 ? Well, we can do that by ensuring that there is [...] association flowing between 𝑋 and 𝑌
Answer
no non-causal


Parent (intermediate) annotation

solate causation? In other words, how can we ensure that the association we measure is causation, say, for measuring the causal effect of 𝑋 on 𝑌 ? Well, we can do that by ensuring that there is no non-causal association flowing between 𝑋 and 𝑌








#recurrent-neural-networks #rnn

non-contractual settings

The specific challenge in such settings is to accurately and timely inform managers on the subtle distinction between a pending defection event (i.e., a customer stops doing business with the focal firm) and an extended period of inactivity of their customers, because possible marketing implications are completely different in each of these situations.



Parent (intermediate) annotation

n-contractual business settings is by definition unobserved by the firm and thus needs to be indirectly inferred from past transaction behavior (Reinartz & Kumar, 2000; Gupta et al., 2006). The specific challenge in such settings is to accurately and timely inform managers on the subtle distinction between a pending defection event (i.e., a customer stops doing business with the focal firm) and an extended period of inactivity of their customers, because possible marketing implications are completely different in each of these situations.





#feature-engineering #lstm #recurrent-neural-networks #rnn
The HMM has N discrete hidden states (where N is typically small) and, therefore, has only log2(N) bits of information available to capture the sequence history (Brown & Hinton, 2001)


Parent (intermediate) annotation

The HMM has N discrete hidden states (where N is typically small) and, therefore, has only log2(N) bits of information available to capture the sequence history (Brown & Hinton, 2001). On the other hand, the RNN has distributed hidden states, which means that each input generally results in changes across all the hidden units of the RNN (Ming et al., 2017). RNNs comb





#recurrent-neural-networks #rnn
Extended variants of the original (‘‘Buy ’Till You Die” (BTYD)) models (e.g., Zhang, Bradlow, & Small (2015), Platzer & Reutterer (2016), Reutterer, Platzer, & Schröder (2021)) improve predictive accuracy by incorporating more hand-crafted summary statistics of customer behavior. However, including customer covariates is cumbersome and an approach to account for time-varying covariates has only just recently been introduced by Bachmann, Meierer, and Näf (2021) at the cost of manual labeling and slower performance.


Parent (intermediate) annotation

f BTYD models – while ”alive”, customers make purchases until they drop out – gives these models robust predictive power, especially on the aggregate cohort level, and over a long time horizon. Extended variants of the original models (e.g., Zhang, Bradlow, & Small (2015), Platzer & Reutterer (2016), Reutterer, Platzer, & Schröder (2021)) improve predictive accuracy by incorporating more hand-crafted summary statistics of customer behavior. However, including customer covariates is cumbersome and an approach to account for time-varying covariates has only just recently been introduced by Bachmann, Meierer, and Näf (2021) at the cost of manual labeling and slower performance. Even advanced BTYD models can be too restrictive to adequately capture diverse customer behaviors in different contexts and the derived forecasts present customer future in an oftentime





#deep-learning #keras #lstm #python #sequence
Unfortunately, the range of contextual information that standard RNNs can access is in practice quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections.
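A tiny numerical sketch of that effect (the recurrent weights 0.5 and 1.5 are purely illustrative): the same factor is applied at every time step, so a unit contribution either vanishes toward zero or explodes exponentially as it cycles through the recurrent connections.

steps = 50
for w in (0.5, 1.5):          # recurrent weight below vs. above 1
    signal = 1.0
    for _ in range(steps):
        signal *= w           # the influence is re-multiplied at each recurrent step
    print(f"w={w}: contribution after {steps} steps = {signal:.3e}")
# w=0.5 -> ~8.9e-16 (decays), w=1.5 -> ~6.4e+08 (blows up)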


Parent (intermediate) annotation

Unfortunately, the range of contextual information that standard RNNs can access is in practice quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections. This shortcoming ... referred to in the literature as the vanishing gradient problem ... Long Short-Term Memory (LSTM) is an RNN architecture specifically designed to address the vanish





#feature-engineering #lstm #recurrent-neural-networks #rnn

The learning mechanism of the recurrent neural network thus involves:

(1) the forward propagation step where the cross-entropy loss is calculated;



Parent (intermediate) annotation

The learning mechanism of the recurrent neural network thus involves: (1) the forward propagation step where the cross-entropy loss is calculated; (2) the backpropagation step where the gradient of the parameters with respect to the loss is calculated; and finally, (3) the optimization algorithm, that changes the parameters of the





Flashcard 7628365106444

Tags
#deep-learning #keras #lstm #python #sequence
Question
Sequence-to-sequence prediction involves predicting an [...] given an input sequence. For example: Input Sequence: 1, 2, 3, 4, 5 Output Sequence: 6, 7, 8, 9, 10
Answer
output sequence


Parent (intermediate) annotation

Sequence-to-sequence prediction involves predicting an output sequence given an input sequence. For example: Input Sequence: 1, 2, 3, 4, 5 Output Sequence: 6, 7, 8, 9, 10
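A minimal sketch of framing a series as such input/output pairs (the helper name and the window lengths n_in and n_out are arbitrary choices for illustration):

import numpy as np

def to_seq2seq_pairs(series, n_in, n_out):
    """Slide a window over the series; each window is split into an input and an output sequence."""
    X, y = [], []
    for i in range(len(series) - n_in - n_out + 1):
        X.append(series[i:i + n_in])
        y.append(series[i + n_in:i + n_in + n_out])
    return np.array(X), np.array(y)

series = np.arange(1, 11)                 # 1 .. 10
X, y = to_seq2seq_pairs(series, n_in=5, n_out=5)
print(X[0], "->", y[0])                   # [1 2 3 4 5] -> [ 6  7  8  9 10]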








Flashcard 7628366679308

Question
The algorithms generate predictive scores for each customer based on journey features. These scores allow the company to predict individual customer [...] and value outcomes such as revenue, loyalty, and cost to serve. More broadly, they allow CX leaders to assess the ROI for particular CX investments and directly tie CX initiatives to business outcomes
Answer
satisfaction


Parent (intermediate) annotation

The algorithms generate predictive scores for each customer based on journey features. These scores allow the company to predict individual customer satisfaction and value outcomes such as revenue, loyalty, and cost to serve. More broadly, they allow CX leaders to assess the ROI for particular CX investments and directly tie CX initiatives to bu








Flashcard 7628368252172

Tags
#feature-engineering #lstm #recurrent-neural-networks #rnn
Question
The RNN processes the entire sequence of available data without having to [...] it into features.
Answer
summarize


Parent (intermediate) annotation

The RNN processes the entire sequence of available data without having to summarize it into features.








Flashcard 7628370087180

Tags
#recurrent-neural-networks #rnn
Question
Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, to both [...] and limit the number of model parameters required
Answer
reduce noise


Parent (intermediate) annotation

Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, to both reduce noise and limit the number of model parameters required
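A minimal Keras sketch of that compression (vocabulary size, embedding dimension, and the token ids are made-up numbers): integer ids from a space of 10,000 possible values are mapped to dense 16-dimensional vectors, so downstream layers need far fewer parameters than with one-hot inputs.

import numpy as np
from tensorflow.keras import layers

vocab_size, embed_dim = 10_000, 16        # illustrative sizes
embedding = layers.Embedding(input_dim=vocab_size, output_dim=embed_dim)

ids = np.array([[3, 917, 42, 7]])         # one sequence of four token/category ids
vectors = embedding(ids)
print(vectors.shape)                      # (1, 4, 16): sparse high-dimensional ids -> small dense vectors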








Flashcard 7628371922188

Tags
#deep-learning #keras #lstm #python #sequence
Question
For a multiclass classification problem, the results may be in the form of an array of probabilities (assuming a one hot encoded output variable) that may need to be converted to a single class output prediction using the [...]() NumPy function.
Answer
argmax


Parent (intermediate) annotation

ion problem, the results may be in the form of an array of probabilities (assuming a one hot encoded output variable) that may need to be converted to a single class output prediction using the argmax() NumPy function.
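A minimal NumPy illustration of that conversion (the probability values are made-up):

import numpy as np

# Softmax-style class probabilities for three samples and three classes
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
])
classes = np.argmax(probs, axis=1)   # index of the most probable class per sample
print(classes)                       # [1 0 2]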








Flashcard 7628373232908

Tags
#DAG #causal #edx
Question
So all these methods for confounding adjustment -- stratification, matching, inverse probability weighting, G-formula, G-estimation -- have two things in common. First, they require data on the [...] that block the backdoor path. If those data are available, then the choice of one of these methods over the others is often a matter of personal taste. Unless the treatment is time-varying -- then we have to go to G-methods
Answer
confounders


Parent (intermediate) annotation

So all these methods for confounding adjustment -- stratification, matching, inverse probability weighting, G-formula, G-estimation -- have two things in common. First, they require data on the confounders that block the backdoor path. If those data are available, then the choice of one of these methods over the others is often a matter of personal taste. Unless the treatment is time-vary








Flashcard 7628374543628

Tags
#recurrent-neural-networks #rnn
Question
In this paper, we offer marketing analysts an alternative to these models by developing a deep learning based approach that does not rely on any ex-ante data [...] or feature engineering, but instead automatically detects behavioral dynamics like seasonality or changes in inter-event timing patterns by learning directly from the prior transaction history
Answer
labelling


Parent (intermediate) annotation

In this paper, we offer marketing analysts an alternative to these models by developing a deep learning based approach that does not rely on any ex-ante data labelling or feature engineering, but instead automatically detects behavioral dynamics like seasonality or changes in inter-event timing patterns by learning directly from the prior transaction








#recurrent-neural-networks #rnn
Sarkar and De Bruyn (2021) demonstrate that a special RNN type can help marketing response modelers to benefit from the multitude of inter-temporal customer-firm interactions accompanying observed transaction flows for predicting the most likely next customer action. However, their approach is limited to single point, next-step predictions and to continue with such forecasts into the long-run one must estimate the new model repeatedly with each additional future time step


Parent (intermediate) annotation

ing purchasing intent. In a similar context, Toth, Tan, Di Fabbrizio, and Datta (2017) have shown that a mixture of RNNs can approximate several complex functions simultaneously. More recently, Sarkar and De Bruyn (2021) demonstrate that a special RNN type can help marketing response modelers to benefit from the multitude of inter-temporal customer-firm interactions accompanying observed transaction flows for predicting the most likely next customer action. However, their approach is limited to single point, next-step predictions and to continue with such forecasts into the long-run one must estimate the new model repeatedly with each additional future time step





Flashcard 7628378213644

Tags
#DAG #causal #edx #has-images
Question
In those cases, it is generally better [...] L, because even though adjusting for L will not eliminate all confounding by U, it will typically eliminate some of the confounding by U
Answer
to adjust for


Parent (intermediate) annotation

In those cases, it is generally better to adjust for L, because even though adjusting for L will not eliminate all confounding by U, it will typically eliminate some of the confounding by U








#recurrent-neural-networks #rnn
The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs.


Parent (intermediate) annotation

The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is mi





#recurrent-neural-networks #rnn
Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to ‘‘fill in the blanks”.


Parent (intermediate) annotation

The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to ‘‘fill in the blanks”. If we always blank only the last element in a historical sequence, the model effectively learns to predict the most likely future, conditioned on the observed past. Applying this idea t
