Edited, memorised or added to reading queue

on 24-Apr-2024 (Wed)


#data-science #infrastructure
A typical bottleneck is that humans can’t deliver software (or hardware, if operating outside the cloud) fast enough. Even if they could write code fast enough, they may be busy maintaining existing systems, which is another inherently human activity. This observation helps us realize that although “infrastructure” sounds very technical, we are not building infrastructure for the machines: we are building it to make humans more productive. This realization has fundamental ramifications for how we should think about and design infrastructure for data scientists, that is, for fellow human beings rather than for machines. For instance, if we assume that human time is more expensive than computer time, which is certainly true for most data scientists, it makes sense to use a highly expressive, productivity-boosting language like Python instead of a low-level language like C++, even if it makes workloads less efficient to process. We will dig deeper into this question in chapter 5.




Flashcard 7625116880140

Tags
#tensorflow #tensorflow-certificate
Question

import numpy as np
import tensorflow as tf

# Create 4-[...] tensor (the same as 4 dimensions)

A = tf.constant(np.arange(0, 120), shape=(2, 3, 4, 5))

A

Answer
rank
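
To make the answer concrete without requiring TensorFlow, here is a dependency-free sketch in which a nested Python list stands in for the tensor. The helper names `reshape` and `shape_of` are illustrative and are not TensorFlow APIs; in TensorFlow itself, `tf.rank(A)` or `len(A.shape)` should report the same number.

```python
from math import prod


def reshape(flat, shape):
    """Nest a flat list into the given shape (row-major order)."""
    if prod(shape) != len(flat):
        raise ValueError("shape does not match number of elements")
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def shape_of(nested):
    """Read the shape back by descending into the first element of each level."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


A = reshape(list(range(120)), (2, 3, 4, 5))
shape = shape_of(A)
rank = len(shape)  # rank = number of dimensions (axes)
print(shape, rank)
```

The rank is simply the length of the shape tuple, which is why a tensor of shape (2, 3, 4, 5) is described as having 4 dimensions.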








#RNN #ariadne #behaviour #consumer #deep-learning #patterns #priority #recurrent-neural-networks #retail #simulation #synthetic-data
The model utilizes an auto-encoder to represent features of input parameters (i.e. customer loyalty number, R, F, and M). The proposed model is the first of its kind in the literature and has many opportunities for further improvement. The model can be improved by using more training data. It would be interesting to explore deeper structures of the model at the auto-encoder and recursion levels. Clumpiness is another variable which can be studied as an addition to the R, F, and M variables (i.e. RFMC).


Parent (intermediate) annotation

This paper proposes a new model for RFM prediction of customers based on recurrent neural networks (RNNs) with a rectified linear unit activation function. The model utilizes an auto-encoder to represent features of input parameters (i.e. customer loyalty number, R, F, and M). The proposed model is the first of its kind in the literature and has many opportunities for further improvement. The model can be improved by using more training data. It would be interesting to explore deeper structures of the model at the auto-encoder and recursion levels. Clumpiness is another variable which can be studied as an addition to the R, F, and M variables (i.e. RFMC). Another pathway is considering other user parameters (e.g. location and age) for automatic feature extraction and further development of recommender systems.





#has-images #recurrent-neural-networks #rnn
What would we expect from customers like the first ten individuals 1001–1010, who started out as occasional benefactors, but through an evolving relationship with the firm have developed a more regular transaction behavior? Will they continue this trend; will they eventually turn into the firm’s premium customers?


Parent (intermediate) annotation

What would we expect from customers like the first ten individuals 1001–1010, who started out as occasional benefactors, but through an evolving relationship with the firm have developed a more regular transaction behavior? Will they continue this trend; will they eventually turn into the firm’s premium customers? Conversely, how about the next ten individuals 1011–1020, who have all made a number of transactions historically, but recently have been on an unusually long hiatus? Is the customer-fi





#ML-engineering #ML_in_Action #learning #machine #software-engineering
Project scoping for ML is incredibly challenging. Even for the most seasoned ML veterans, conjecturing how long a project will take, which approach is going to be most successful, and the amount of resources required is a futile and frustrating exercise


Parent (intermediate) annotation

Project scoping for ML is incredibly challenging. Even for the most seasoned ML veterans, conjecturing how long a project will take, which approach is going to be most successful, and the amount of resources required is a futile and frustrating exercise. The risk associated with making erroneous claims is fairly high, but structuring proper scoping and solution research can help minimize the chances of being wildly off on estimation.





Flashcard 7625192639756

Tags
#feature-engineering #lstm #recurrent-neural-networks #rnn
Question
The LSTM neural network would be well-suited for modeling online customer behavior across multiple websites since it can naturally capture inter-sequence and inter-temporal interactions from multiple streams of clickstream data without growing [...] in complexity.
Answer
exponentially
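
One way to see why complexity stays manageable is to count trainable parameters. The sketch below assumes a standard single-layer LSTM (four gates, no peephole connections) and assumes that each clickstream contributes a fixed number of input features that are concatenated per time step. The function name, the feature count and the hidden size are illustrative and not taken from the source.

```python
def lstm_param_count(input_dim: int, hidden_units: int) -> int:
    """Trainable parameters of one standard LSTM layer.

    Each of the 4 gates has an input weight matrix (hidden x input),
    a recurrent weight matrix (hidden x hidden) and a bias (hidden).
    """
    return 4 * (hidden_units * input_dim
                + hidden_units * hidden_units
                + hidden_units)


FEATURES_PER_STREAM = 8   # hypothetical
HIDDEN_UNITS = 32         # hypothetical

# Parameter count when 1, 2, ..., 5 clickstreams are fed in together.
counts = [lstm_param_count(k * FEATURES_PER_STREAM, HIDDEN_UNITS)
          for k in range(1, 6)]
increments = [b - a for a, b in zip(counts, counts[1:])]
print(counts)
print(increments)  # constant step: growth is linear in the number of streams
```

Under these assumptions each additional stream adds the same number of weights, and the sequence length does not affect the count at all because the weights are shared across time steps.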








#recurrent-neural-networks #rnn
The simple behavioral story which sits at the core of BTYD models – while “alive”, customers make purchases until they drop out


Parent (intermediate) annotation

The simple behavioral story which sits at the core of BTYD models – while “alive”, customers make purchases until they drop out – gives these models robust predictive power, especially on the aggregate cohort level, and over a long time horizon.
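
The behavioral story can be sketched as a small simulation. This is a discrete-time simplification with made-up parameters (`purchase_prob`, `dropout_prob`), not the Pareto/NBD or any other specific published BTYD model: in each period an alive customer buys with a fixed probability and may then drop out for good.

```python
import random


def simulate_customer(periods, purchase_prob, dropout_prob, rng):
    """Return (purchases, dropout_period) for one customer.

    purchases[t] is 1 if the customer bought in period t, else 0.
    dropout_period is the index of the first period in which the customer
    is no longer alive (it may equal `periods`), or None if still alive.
    """
    purchases = []
    alive = True
    dropout_period = None
    for t in range(periods):
        if alive:
            purchases.append(1 if rng.random() < purchase_prob else 0)
            if rng.random() < dropout_prob:
                alive = False
                dropout_period = t + 1
        else:
            purchases.append(0)  # once "dead", a customer never buys again
    return purchases, dropout_period


rng = random.Random(42)
cohort = [simulate_customer(52, purchase_prob=0.3, dropout_prob=0.05, rng=rng)
          for _ in range(1000)]

# Cohort-level purchases per period shrink as more customers drop out.
weekly_totals = [sum(p[t] for p, _ in cohort) for t in range(52)]
print(weekly_totals[:5], weekly_totals[-5:])
```

Individual histories are noisy, but the cohort totals decay smoothly, which is in line with the remark that these models are strongest at the aggregate level.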





Flashcard 7625197620492

Tags
#data-science #infrastructure
Question
to conduct data science projects, a common infrastructure can help to increase the number of projects that can be executed simultaneously (volume), speed up the time to market ([...]), ensure that the results are robust (validity), and make it possible to support a larger variety of projects
Answer
velocity








#data #synthetic
Traditional methods of synthetic data generation use techniques that do not intend to replicate important statistical properties of the original data. Properties such as the distribution, the patterns or the correlation between variables, are often omitted. Moreover, most of the existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques like Machine Learning or Deep Learning
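
A small experiment illustrates the point about omitted correlations. It is a sketch under assumptions: the "original" data are two made-up correlated columns (`visits`, `spend`), and the "traditional" generator is a deliberately naive one that resamples each column independently. The marginal distributions survive, but the correlation does not.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)

# Hypothetical "original" data: spend depends strongly on visits.
visits = [rng.gauss(10, 3) for _ in range(2000)]
spend = [5 * v + rng.gauss(0, 5) for v in visits]

# Naive synthetic data: resample every column on its own.
synthetic_visits = rng.choices(visits, k=2000)
synthetic_spend = rng.choices(spend, k=2000)

real_corr = pearson(visits, spend)
synthetic_corr = pearson(synthetic_visits, synthetic_spend)
print(round(real_corr, 2), round(synthetic_corr, 2))
```

The original columns are strongly correlated while the independently resampled ones are close to uncorrelated, even though every synthetic value is a real observed value.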


Parent (intermediate) annotation

Traditional methods of synthetic data generation use techniques that do not intend to replicate important statistical properties of the original data. Properties such as the distribution, the patterns or the correlation between variables, are often omitted. Moreover, most of the existing tools and approaches require a great deal of user-defined rules and do not make use of advanced techniques like Machine Learning or Deep Learning. While Machine Learning is an innovative area of Artificial Intelligence and Computer Science that uses statistical techniques to give computers the ability to learn from data, Deep Lea





#recurrent-neural-networks #rnn
We propose and implement a flexible methodological framework that provides marketing managers with highly accurate forecasts of fine granularity both in the short and in the long run. Our method also captures seasonal peaks and customer-level dynamics and makes it possible to differentiate between customer groups


Parent (intermediate) annotation

Army knife-like) general-purpose problem solver that generalizes across the described decision tasks of managing customer relationships. This article makes a first step in this direction. We propose and implement a flexible methodological framework that provides marketing managers with highly accurate forecasts of fine granularity both in the short and in the long run. Our method also captures seasonal peaks and customer-level dynamics and makes it possible to differentiate between customer groups.





Flashcard 7625203387660

Tags
#recurrent-neural-networks #rnn
Question
In this specific domain of customer base analysis, probabilistic approaches from the [...] model family represent the gold standard, leveraging easily observable Recency and Frequency (RF, or RFM when including also the monetary value) metrics together with a latent attrition process to deliver accurate predictions (Schmittlein, Morrison, & Colombo, 1987; Fader, Hardie, & Lee, 2005; Fader & Hardie, 2009)
Answer
“Buy ’Till You Die” (BTYD)
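
For readers unfamiliar with the metrics, here is a minimal sketch of deriving R, F and M from a transaction log. The transactions, the analysis date and the exact definitions are assumptions for illustration: recency is taken as days since the last purchase, frequency as the number of purchases, and monetary value as the mean spend per purchase. BTYD papers often define these slightly differently (for example, frequency as repeat purchases only).

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    (1001, date(2024, 1, 5), 20.0),
    (1001, date(2024, 3, 1), 35.0),
    (1001, date(2024, 4, 10), 25.0),
    (1011, date(2023, 11, 2), 60.0),
    (1011, date(2023, 12, 15), 40.0),
]
as_of = date(2024, 4, 24)  # analysis date, also an assumption


def rfm(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, mean_monetary_value)}."""
    by_customer = defaultdict(list)
    for customer_id, day, amount in transactions:
        by_customer[customer_id].append((day, amount))
    result = {}
    for customer_id, rows in by_customer.items():
        last_purchase = max(day for day, _ in rows)
        amounts = [amount for _, amount in rows]
        result[customer_id] = (
            (as_of - last_purchase).days,   # recency
            len(rows),                      # frequency
            sum(amounts) / len(amounts),    # monetary value
        )
    return result


scores = rfm(transactions, as_of)
print(scores)
```

A BTYD model would combine such observed summaries with a latent attrition process; the sketch covers only the observable part.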








#RNN #ariadne #behaviour #consumer #deep-learning #priority #recurrent-neural-networks #retail #simulation #synthetic-data
Applying RNNs directly to sequences of consumer actions yields the same or higher prediction accuracy than vector-based methods like logistic regression.
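
The difference between sequence-based and vector-based inputs can be illustrated with a toy example. This is a sketch under assumptions, not the paper's model: the action names are invented, the "vector-based" features are simple action counts, and the recurrent cell is a one-unit Elman-style update with hand-picked, untrained weights.

```python
import math
from collections import Counter

ACTIONS = ["view", "cart", "order"]  # hypothetical action vocabulary

session_a = ["view", "cart", "order"]
session_b = ["order", "cart", "view"]  # same actions, reversed order


def count_features(session):
    """Vector-based input: one count per action type; order is discarded."""
    counts = Counter(session)
    return [counts[action] for action in ACTIONS]


def recurrent_state(session, w_in=(0.2, 0.5, 1.0), w_rec=0.8):
    """Sequence-based input: fold actions one by one into a hidden state."""
    h = 0.0
    for action in session:
        x = w_in[ACTIONS.index(action)]
        h = math.tanh(x + w_rec * h)
    return h


print(count_features(session_a), count_features(session_b))
print(recurrent_state(session_a), recurrent_state(session_b))
```

The count vectors of the two sessions are identical, so a vector-based model would need additional hand-built features to tell them apart, whereas the recurrent state differs because it depends on the order of actions.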


Parent (intermediate) annotation

In multiple aspects, RNNs offer advantages over existing methods that are relevant for real-world production systems. Applying RNNs directly to sequences of consumer actions yields the same or higher prediction accuracy than vector-based methods like logistic regression. Unlike the latter, the application of RNNs comes without the need for extensive feature engineering. In addition, we show that RNNs help us link individual actions directly to predictio





Flashcard 7625206795532

Tags
#bayesian #stan
Question
The Stan development crew has made it easy to interactively explore diagnostics via the shinystan package, and one should do so with each model. In addition, there are other diagnostics available in other packages like [...] and posterior.
Answer
loo


Stan - diagnostic packages
has made it easy to interactively explore diagnostics via the shinystan package, and one should do so with each model. In addition, there are other diagnostics available in other packages like loo and posterior.







Flashcard 7625208368396

Tags
#RNN #ariadne #behaviour #consumer #deep-learning #priority #recurrent-neural-networks #retail #simulation #synthetic-data
Question
As [...] are required directly in many practical applications, we use NLL also for evaluation. In some applications, the resulting ranking of consumers is more important than the probabilities themselves. For this reason, we also report the area under the ROC curve (AUC)
Answer
probability estimates
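
Both metrics are easy to compute by hand, which helps to see what each one rewards. The labels and predicted probabilities below are made up, and the AUC uses the pairwise-ranking definition (the probability that a random positive is scored above a random negative, with ties counting one half).

```python
import math


def negative_log_likelihood(labels, probs, eps=1e-12):
    """Mean binary NLL (log loss); sensitive to the probability values."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


def auc(labels, probs):
    """Area under the ROC curve via pairwise positive/negative comparison."""
    positives = [p for y, p in zip(labels, probs) if y == 1]
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if pos > neg else 0.5 if pos == neg else 0.0
               for pos in positives for neg in negatives)
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 1]
probs = [0.9, 0.2, 0.6, 0.7, 0.8]
squashed = [0.5 + 0.1 * p for p in probs]  # same ranking, less informative

print(negative_log_likelihood(labels, probs), auc(labels, probs))
print(negative_log_likelihood(labels, squashed), auc(labels, squashed))
```

Squashing the scores leaves the ranking, and therefore the AUC, unchanged, while the NLL gets worse. This is why the two metrics are reported side by side: one judges the probabilities, the other only the ordering.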


Parent (intermediate) annotation

As probability estimates are required directly in many practical applications, we use NLL also for evaluation. In some applications, the resulting ranking of consumers is more important than the probabilities t
