# on 01-Jun-2024 (Sat)


#### Annotation 7103908547852

 #feature-engineering #lstm #recurrent-neural-networks #rnn The learning mechanism of the recurrent neural network thus involves: (1) the forward propagation step, where the cross-entropy loss is calculated; (2) the backpropagation step, where the gradient of the parameters with respect to the loss is calculated; and finally, (3) the optimization algorithm, which changes the parameters of the RNN based on the gradient.
status not read
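The three steps named in this annotation can be sketched end-to-end for a toy vanilla RNN in plain NumPy (a minimal illustration only, not the paper's model; the sizes, data, and learning rate are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 4, 8  # toy vocabulary size and hidden size (arbitrary)
Wxh = rng.normal(0.0, 0.1, (H, V))  # input-to-hidden weights
Whh = rng.normal(0.0, 0.1, (H, H))  # hidden-to-hidden (recurrent) weights
Why = rng.normal(0.0, 0.1, (V, H))  # hidden-to-output weights
bh, by = np.zeros(H), np.zeros(V)

def train_step(inputs, targets, lr=0.1):
    # (1) forward propagation: accumulate the cross-entropy loss
    xs, hs, ps = {}, {-1: np.zeros(H)}, {}
    loss = 0.0
    for t, (i, j) in enumerate(zip(inputs, targets)):
        xs[t] = np.zeros(V); xs[t][i] = 1.0             # one-hot input token
        hs[t] = np.tanh(Wxh @ xs[t] + Whh @ hs[t - 1] + bh)
        logits = Why @ hs[t] + by
        p = np.exp(logits - logits.max()); ps[t] = p / p.sum()
        loss -= np.log(ps[t][j])
    # (2) backpropagation through time: gradient of the parameters w.r.t. the loss
    dWxh, dWhh, dWhy = map(np.zeros_like, (Wxh, Whh, Why))
    dbh, dby, dhnext = np.zeros(H), np.zeros(V), np.zeros(H)
    for t in reversed(range(len(inputs))):
        dy = ps[t].copy(); dy[targets[t]] -= 1.0        # softmax + cross-entropy gradient
        dWhy += np.outer(dy, hs[t]); dby += dy
        draw = (1 - hs[t] ** 2) * (Why.T @ dy + dhnext)  # back through tanh
        dWxh += np.outer(draw, xs[t]); dWhh += np.outer(draw, hs[t - 1]); dbh += draw
        dhnext = Whh.T @ draw
    # (3) optimization algorithm: plain SGD update of the parameters
    for P, dP in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy), (bh, dbh), (by, dby)):
        P -= lr * dP
    return loss

# repeating the three steps drives the loss down on a fixed toy sequence
losses = [train_step([0, 1, 2], [1, 2, 3]) for _ in range(50)]
```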

#### pdf

cannot see any pdfs

#### Annotation 7103996890380

 #deep-learning #keras #lstm #python #sequence 3.1.3 Practical Considerations When Scaling

There are some practical considerations when scaling sequence data.

- **Estimate Coefficients.** You can estimate coefficients (min and max values for normalization, or mean and standard deviation for standardization) from the training data. Inspect these first-cut estimates and use domain knowledge or domain experts to help improve them so that they will be usefully correct on all data in the future.
- **Save Coefficients.** You will need to scale new data in the future in exactly the same way as the data used to train your model. Save the coefficients to file and load them later when you need to scale new data when making predictions.
- **Data Analysis.** Use data analysis to help you better understand your data. For example, a simple histogram can help you quickly get a feeling for the distribution of quantities, to see if standardization would make sense.
- **Scale Each Series.** If your problem has multiple series, treat each as a separate variable and in turn scale them separately. Here, scale refers to a choice of scaling procedure such as normalization or standardization.
- **Scale At The Right Time.** It is important to apply any scaling transforms at the right time. For example, if you have a series of quantities that is non-stationary, it may be appropriate to scale after first making your data stationary. It would not be appropriate to scale the series after it has been transformed into a supervised learning problem, as each column would be handled differently, which would be incorrect.
- **Scale If In Doubt.** You probably do need to rescale your input and output variables. If in doubt, at least normalize your data.
status not read
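The estimate-save-reload workflow above can be sketched with NumPy and the standard library (the file name `scale_coeffs.json` and the data values are made up for the example):

```python
import json
import numpy as np

train = np.array([10.0, 20.0, 30.0, 40.0])

# Estimate coefficients from the training data only
coeffs = {"min": float(train.min()), "max": float(train.max())}

# Save the coefficients so future data is scaled in exactly the same way
with open("scale_coeffs.json", "w") as f:
    json.dump(coeffs, f)

def normalize(x, c):
    # min-max normalization using the saved training-set coefficients
    return (x - c["min"]) / (c["max"] - c["min"])

# Later, at prediction time: load the coefficients and apply them to new data
with open("scale_coeffs.json") as f:
    loaded = json.load(f)

new_data = np.array([15.0, 50.0])  # note: 50 lies outside the training range
scaled = normalize(new_data, loaded)
```

Because the coefficients come from the training data, values outside the training range scale outside [0, 1], which is one reason the text recommends sanity-checking first-cut estimates against domain knowledge.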

#### pdf

cannot see any pdfs

#### Flashcard 7627469360396

Tags
#tensorflow #tensorflow-certificate
Question

Preprocessing data

ct = make_column_transformer(([...](dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged
ct.fit(X_train)
X_train_transformed = ct.transform(X_train)
X_test_transformed = ct.transform(X_test)
Answer
OneHotEncoder

status measured difficulty not learned 37% [default] 0
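The full pipeline behind this card can be run end-to-end on a hypothetical Abalone-like frame (the column names and values below are invented for illustration; the transformer calls follow the card's own code):

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the Abalone data: one categorical and one numeric column
X_train = pd.DataFrame({"Sex": ["M", "F", "I", "M"], "Length": [0.45, 0.35, 0.30, 0.50]})
X_test = pd.DataFrame({"Sex": ["F", "M"], "Length": [0.40, 0.55]})

ct = make_column_transformer(
    (OneHotEncoder(dtype="int32"), ["Sex"]),  # one-hot encode the 'Sex' column
    remainder="passthrough",                  # other columns unchanged
)
ct.fit(X_train)  # learn the categories from the training data only
X_train_transformed = ct.transform(X_train)
X_test_transformed = ct.transform(X_test)
```

Fitting on `X_train` and merely transforming `X_test` ensures both sets share the same category-to-column mapping; 'Sex' expands to three one-hot columns (F, I, M) plus the passed-through 'Length'.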

#### Parent (intermediate) annotation

Open it
Preprocessing data ct = make_column_transformer((OneHotEncoder(dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged ct.fit(X_train) X_train_transformed = ct.transform(X_train) X_test_transformed = ct.transform(X_test)

#### Original toplevel document

TfC_01_ADDITIONAL_01_Abalone.ipynb
Preprocessing data ct = make_column_transformer((OneHotEncoder(dtype="int32"), ['Sex']), remainder="passthrough") #other columns unchanged ct.fit(X_train) X_train_transformed = ct.transform(X_train) X_test_transformed = ct.transform(X_test) Predictions valuation_predicts = model.predict(X_valuation_transformed) (array([[ 9.441547], [10.451973], [10.48082 ], ..., [10.401164], [13.13452 ], [ 8.081818]], dtype=float32), (6041

#### Flashcard 7628313726220

Tags
#DAG #causal #edx #has-images #inference
[unknown IMAGE 7096178707724]
Question
As you may have already noticed, the case-control design selects individuals based on their outcome. Women who did develop cancer are [...] to be included in the study than women who did not develop cancer. Therefore, our causal graph will include a node for selection-- C-- an arrow from the outcome Y to C, and a box around C to indicate that the analysis is conditional on having been selected into the study, which means that we are only one arrow away from selection bias.
Answer
much more likely

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
As you may have already noticed, the case-control design selects individuals based on their outcome. Women who did develop cancer are much more likely to be included in the study than women who did not develop cancer. Therefore, our causal graph will include a node for selection-- C-- an arrow from the outcome Y to C, and a box around

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628315823372

Tags
#causality #statistics
Question
Given that we have tools to measure association, how can we isolate causation? In other words, how can we ensure that the association we measure is causation, say, for measuring the causal effect of 𝑋 on 𝑌 ? Well, we can do that by ensuring that there is [...] association flowing between 𝑋 and 𝑌
Answer
no non-causal

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
solate causation? In other words, how can we ensure that the association we measure is causation, say, for measuring the causal effect of 𝑋 on 𝑌 ? Well, we can do that by ensuring that there is no non-causal association flowing between 𝑋 and 𝑌

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628317658380

 #recurrent-neural-networks #rnn non-contractual settings The specific challenge in such settings is to accurately and timely inform managers on the subtle distinction between a pending defection event (i.e., a customer stops doing business with the focal firm) and an extended period of inactivity of their customers, because possible marketing implications are completely different in each of these situations.
status not read

#### Parent (intermediate) annotation

Open it
n-contractual business settings is by definition unobserved by the firm and thus needs to be indirectly inferred from past transaction behavior (Reinartz & Kumar, 2000; Gupta et al., 2006). The specific challenge in such settings is to accurately and timely inform managers on the subtle distinction between a pending defection event (i.e., a customer stops doing business with the focal firm) and an extended period of inactivity of their customers, because possible marketing implications are completely different in each of these situations.

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628322114828

 #feature-engineering #lstm #recurrent-neural-networks #rnn The HMM has N discrete hidden states (where N is typically small) and, therefore, has only log₂(N) bits of information available to capture the sequence history (Brown & Hinton, 2001).
status not read

#### Parent (intermediate) annotation

Open it
The HMM has N discrete hidden states (where N is typically small) and, therefore, has only log₂(N) bits of information available to capture the sequence history (Brown & Hinton, 2001). On the other hand, the RNN has distributed hidden states, which means that each input generally results in changes across all the hidden units of the RNN (Ming et al., 2017). RNNs comb

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628357504268

 #recurrent-neural-networks #rnn Extended variants of the original (“Buy ’Till You Die”, BTYD) models (e.g., Zhang, Bradlow, & Small (2015), Platzer & Reutterer (2016), Reutterer, Platzer, & Schröder (2021)) improve predictive accuracy by incorporating more hand-crafted summary statistics of customer behavior. However, including customer covariates is cumbersome, and an approach to account for time-varying covariates has only just recently been introduced by Bachmann, Meierer, and Näf (2021), at the cost of manual labeling and slower performance.
status not read

#### Parent (intermediate) annotation

Open it
f BTYD models – while ”alive”, customers make purchases until they drop out – gives these models robust predictive power, especially on the aggregate cohort level, and over a long time horizon. <span>Extended variants of the original models (e.g., Zhang, Bradlow, & Small (2015), Platzer & Reutterer (2016), Reutterer, Platzer, & Schröder (2021)) improve predictive accuracy by incorporating more hand-crafted summary statistics of customer behavior. However, including customer covariates is cumbersome and an approach to account for time-varying covariates has only just recently been introduced by Bachmann, Meierer, and Näf (2021) at the cost of manual labeling and slower performance. Even advanced BTYD models can be too restrictive to adequately capture diverse customer behaviors in different contexts and the derived forecasts present customer future in an oftentime

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628360125708

 #deep-learning #keras #lstm #python #sequence Unfortunately, the range of contextual information that standard RNNs can access is in practice quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections.
status not read
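The decay-or-blow-up behavior described here can be illustrated with a one-line recurrence: a scalar stand-in for the gradient signal is multiplied by the same recurrent weight once per time step (a deliberately simplified sketch; real RNNs multiply by a Jacobian, not a scalar):

```python
def gradient_after(w, steps):
    # Repeatedly multiply a unit gradient signal by the recurrent weight w,
    # once per time step it cycles around the recurrent connection
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

vanished = gradient_after(0.5, 20)  # ~9.5e-07: the signal all but disappears
exploded = gradient_after(1.5, 20)  # ~3.3e+03: the signal blows up
```

A weight magnitude below 1 shrinks the signal exponentially in the number of steps; above 1, it grows exponentially, which is exactly the limited "range of contextual information" the annotation refers to.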

#### Parent (intermediate) annotation

Open it
Unfortunately, the range of contextual information that standard RNNs can access is in practice quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections. This shortcoming ... referred to in the literature as the vanishing gradient problem ... Long Short-Term Memory (LSTM) is an RNN architecture specifically designed to address the vanish

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628363009292

 #feature-engineering #lstm #recurrent-neural-networks #rnn The learning mechanism of the recurrent neural network thus involves: (1) the forward propagation step where the cross-entropy loss is calculated;
status not read

#### Parent (intermediate) annotation

Open it
The learning mechanism of the recurrent neural network thus involves: (1) the forward propagation step where the cross-entropy loss is calculated; (2) the backpropagation step where the gradient of the parameters with respect to the loss is calculated; and finally, (3) the optimization algorithm, that changes the parameters of the

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628365106444

Tags
#deep-learning #keras #lstm #python #sequence
Question
Sequence-to-sequence prediction involves predicting an [...] given an input sequence. For example: Input Sequence: 1, 2, 3, 4, 5 Output Sequence: 6, 7, 8, 9, 1
Answer
output sequence

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
Sequence-to-sequence prediction involves predicting an output sequence given an input sequence. For example: Input Sequence: 1, 2, 3, 4, 5 Output Sequence: 6, 7, 8, 9, 1

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628366679308

Question
The algorithms generate predictive scores for each customer based on journey features. These scores allow the company to predict individual customer [...] and value outcomes such as revenue, loyalty, and cost to serve. More broadly, they allow CX leaders to assess the ROI for particular CX investments and directly tie CX initiatives to business outcomes
Answer
satisfaction

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The algorithms generate predictive scores for each customer based on journey features. These scores allow the company to predict individual customer satisfaction and value outcomes such as revenue, loyalty, and cost to serve. More broadly, they allow CX leaders to assess the ROI for particular CX investments and directly tie CX initiatives to bu

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628368252172

Tags
#feature-engineering #lstm #recurrent-neural-networks #rnn
Question
The RNN processes the entire sequence of available data without having to [...] it into features.
Answer
summarize

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
The RNN processes the entire sequence of available data without having to summarize it into features.

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628370087180

Tags
#recurrent-neural-networks #rnn
Question
Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, to both [...] and limit the number of model parameters required
Answer
reduce noise

status measured difficulty not learned 37% [default] 0
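The dimensionality-reduction idea behind this card amounts to a table lookup: instead of feeding the model a huge one-hot vector per token, each id indexes a row of a much smaller (trainable) matrix. A minimal NumPy sketch, with invented sizes:

```python
import numpy as np

vocab_size, embed_dim = 1000, 8  # compress 1000-dim one-hot ids to 8 dimensions
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, embed_dim))  # the embedding matrix (trainable in practice)

token_ids = np.array([3, 17, 3, 999])
vectors = E[token_ids]  # embedding lookup: one dense row per token id
```

Equal ids map to identical vectors, and the model only needs `vocab_size * embed_dim` parameters for the layer rather than one weight per one-hot dimension per hidden unit.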

#### Parent (intermediate) annotation

Open it
Embedding layers are used to reduce data dimensionality, compressing large vectors of values into relatively smaller ones, to both reduce noise and limit the number of model parameters required

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628371922188

Tags
#deep-learning #keras #lstm #python #sequence
Question
For a multiclass classification problem, the results may be in the form of an array of probabilities (assuming a one hot encoded output variable) that may need to be converted to a single class output prediction using the [...]() NumPy function.
Answer
argmax

status measured difficulty not learned 37% [default] 0
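The conversion this card describes is a single call: `np.argmax` along the class axis picks the index of the highest probability per sample (toy probabilities invented for illustration):

```python
import numpy as np

# Softmax-style output for 3 samples over 4 classes
probs = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.80, 0.10],
    [0.60, 0.20, 0.10, 0.10],
])
classes = np.argmax(probs, axis=1)  # index of the largest probability per row
# classes -> array([1, 2, 0])
```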

#### Parent (intermediate) annotation

Open it
ion problem, the results may be in the form of an array of probabilities (assuming a one hot encoded output variable) that may need to be converted to a single class output prediction using the argmax() NumPy function.

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628373232908

Tags
#DAG #causal #edx
Question
So all these methods for confounding adjustment -- stratification, matching, inverse probability weighting, G-formula, G-estimation -- have two things in common. First, they require data on the [...] that block the backdoor path. If those data are available, then the choice of one of these methods over the others is often a matter of personal taste. Unless the treatment is time-varying -- then we have to go to G-methods
Answer
confounders

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
So all these methods for confounding adjustment -- stratification, matching, inverse probability weighting, G-formula, G-estimation -- have two things in common. First, they require data on the confounders that block the backdoor path. If those data are available, then the choice of one of these methods over the others is often a matter of personal taste. Unless the treatment is time-vary

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628374543628

Tags
#recurrent-neural-networks #rnn
Question
In this paper, we offer marketing analysts an alternative to these models by developing a deep learning based approach that does not rely on any ex-ante data [...] or feature engineering, but instead automatically detects behavioral dynamics like seasonality or changes in inter-event timing patterns by learning directly from the prior transaction history
Answer
labelling

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In this paper, we offer marketing analysts an alternative to these models by developing a deep learning based approach that does not rely on any ex-ante data labelling or feature engineering, but instead automatically detects behavioral dynamics like seasonality or changes in inter-event timing patterns by learning directly from the prior transaction

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628375854348

 #recurrent-neural-networks #rnn Sarkar and De Bruyn (2021) demonstrate that a special RNN type can help marketing response modelers to benefit from the multitude of inter-temporal customer-firm interactions accompanying observed transaction flows for predicting the most likely next customer action. However, their approach is limited to single-point, next-step predictions, and to continue with such forecasts into the long run, one must estimate the model repeatedly with each additional future time step.
status not read

#### Parent (intermediate) annotation

Open it
ing purchasing intent. In a similar context, Toth, Tan, Di Fabbrizio, and Datta (2017) have shown that a mixture of RNNs can approximate several complex functions simultaneously. More recently, Sarkar and De Bruyn (2021) demonstrate that a special RNN type can help marketing response modelers to benefit from the multitude of inter-temporal customer-firm interactions accompanying observed transaction flows for predicting the most likely next customer action. However, their approach is limited to single point, next-step predictions and to continue with such forecasts into the long-run one must estimate the new model repeatedly with each additional future time step

#### Original toplevel document (pdf)

cannot see any pdfs

#### Flashcard 7628378213644

Tags
#DAG #causal #edx #has-images
[unknown IMAGE 7093205732620]
Question
In those cases, it is generally better [...] L, because even though adjusting for L will not eliminate all confounding by U, it will typically eliminate some of the confounding by U
Answer
to adjust for

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

Open it
In those cases, it is generally better to adjust for L, because even though adjusting for L will not eliminate all confounding by U, it will typically eliminate some of the confounding by U

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628381359372

 #recurrent-neural-networks #rnn The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs.
status not read

#### Parent (intermediate) annotation

Open it
The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is mi

#### Original toplevel document (pdf)

cannot see any pdfs

#### Annotation 7628382932236

 #recurrent-neural-networks #rnn Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to “fill in the blanks”.
status not read
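The "blank only the last element" special case mentioned in the parent annotation, which turns fill-in-the-blanks training into next-step prediction, can be sketched as a masking step (the sequences and the sentinel id 0 are made up for the example):

```python
import numpy as np

# Two toy transaction-event histories, encoded as integer ids
sequences = np.array([
    [3, 1, 4, 1, 5],
    [2, 7, 1, 8, 2],
])
MASK = 0  # hypothetical sentinel id reserved for a concealed position

# Conceal the last element of each history; the model is trained to fill it in,
# i.e., to predict the most likely future conditioned on the observed past
inputs = sequences.copy()
inputs[:, -1] = MASK
targets = sequences[:, -1]  # the values the model should reconstruct
```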

#### Parent (intermediate) annotation

Open it
The name, often shortened to seq2seq, comes from the fact that these models can translate a sequence of input elements into a sequence of outputs. Different seq2seq models can be created depending on how we manipulate the input data; i.e., we can conceal certain parts of the input sequence and train the model to predict what is missing, to “fill in the blanks”. If we always blank only the last element in a historical sequence, the model effectively learns to predict the most likely future, conditioned on the observed past. Applying this idea t

#### Original toplevel document (pdf)

cannot see any pdfs