Edited, memorised or added to reading queue

on 23-Jan-2026 (Fri)


#Inference #causal #reading
the artificial intelligence (AI) literature has developed a wide array of techniques for causal learning that allow leveraging information from various imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016). See also https://causalai-book.net/


Parent (intermediate) annotation

Building on the structural approach to causality introduced by Haavelmo (1943) and the graph-theoretic framework proposed by Pearl (1995), the artificial intelligence (AI) literature has developed a wide array of techniques for causal learning that allow leveraging information from various imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016)

Original toplevel document (pdf)





#RNN #ariadne #behaviour #consumer #deep-learning #priority #recurrent-neural-networks #retail #simulation #synthetic-data
In principle, one could evaluate the logistic regression model at every single time-step in the consumer history to determine the influence of individual events. However, this would involve the inefficient process of re-calculating features for every time-step.


Parent (intermediate) annotation

In principle, one could evaluate the logistic regression model at every single time-step in the consumer history to determine the influence of individual events. However, this would involve the inefficient process of re-calculating features for every time-step. Calculations at timesteps t and t − 1 would be highly redundant: features at t represent the complete history until t and not only what happened in between t − 1 and t.
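The redundancy argument above can be sketched in a few lines of plain Python (toy data and a hypothetical two-number feature summary, not the paper's actual features): recomputing the history summary at every time-step repeats work, while carrying a running summary forward, as an RNN's hidden state does, visits each event only once.

```python
# Toy illustration: history-summary features computed two ways.
# Hypothetical feature: (event count so far, total spend so far).

events = [12.0, 3.5, 7.0, 1.25, 9.0]  # spend per event (made-up data)

# Naive: re-scan the full history at every time-step -> O(T^2) work.
naive = [(len(events[:t + 1]), sum(events[:t + 1]))
         for t in range(len(events))]

# Incremental: update the previous summary with one new event -> O(T) work,
# analogous to how an RNN carries its hidden state forward.
incremental = []
count, total = 0, 0.0
for spend in events:
    count += 1
    total += spend
    incremental.append((count, total))

assert naive == incremental
```

The two computations agree at every time-step; only the amount of repeated work differs.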





Flashcard 7789511838988

Tags
#Inference #causal #reading
Question
the artificial intelligence (AI) literature has developed a wide array of techniques for causal learning that allow leveraging information from various imperfect, heterogeneous, and biased data sources ([...] and Pearl, 2016)
Answer
Bareinboim


Parent (intermediate) annotation

…the artificial intelligence (AI) literature has developed a wide array of techniques for causal learning that allow leveraging information from various imperfect, heterogeneous, and biased data sources (Bareinboim and Pearl, 2016)








#R #ggplot2
Marginal distributions can now be made in R using ggside, a new ggplot2 extension

Side-Plot Tutorial with ggside
Marginal distributions can now be made in R using ggside, a new ggplot2 extension. You can make linear regression with marginal distributions using histograms, densities, box plots, and more. Bonus - The side panels are super customizable for uncovering complex relationships.




Flashcard 7789515246860

Tags
#R #ggplot2
Question
[...] can now be made in R using ggside, a new ggplot2 extension
Answer
Marginal distributions


Parent (intermediate) annotation

Marginal distributions can now be made in R using ggside, a new ggplot2 extension

Original toplevel document

Side-Plot Tutorial with ggside
Marginal distributions can now be made in R using ggside, a new ggplot2 extension. You can make linear regression with marginal distributions using histograms, densities, box plots, and more. Bonus - The side panels are super customizable for uncovering complex relationships.







Flashcard 7789517081868

Tags
#recurrent-neural-networks #rnn
Question
We show that incorporating contextual information in the model is straightforward and brings an additional boost in predictive accuracy. However, the model performance is already extremely strong when no context is available beyond the [...] of the customer’s transactions. This is welcome news for firms that do not wish to collect personal information on principle, to avoid the questionable ethics of harvesting the ‘‘behavioral surplus”
Answer
timing


Parent (intermediate) annotation

…incorporating contextual information in the model is straightforward and brings an additional boost in predictive accuracy. However, the model performance is already extremely strong when no context is available beyond the timing of the customer’s transactions. This is welcome news for firms that do not wish to collect personal information on principle, to avoid the questionable ethics of harvesting the “behavioral surplus”…








Flashcard 7789519441164

Tags
#feature-engineering #lstm #recurrent-neural-networks #rnn
Question
models with [...] capacity may overfit the training set and exhibit high variance
Answer
high


Parent (intermediate) annotation

models with high capacity may overfit the training set and exhibit high variance








Flashcard 7789521014028

Tags
#feature-engineering #lstm #recurrent-neural-networks #rnn
Question
models with high capacity may overfit the training set and exhibit [...] variance
Answer
high


Parent (intermediate) annotation

models with high capacity may overfit the training set and exhibit high variance








Flashcard 7789522586892

Tags
#tensorflow #tensorflow-certificate
Question

Bag of tricks to improve model

3. Fit the model - more epochs, more [...]

Answer
data examples


Parent (intermediate) annotation

Bag of tricks to improve model 3. Fit the model - more epochs, more data examples

Original toplevel document

TfC_02_classification-PART_1
Bag of tricks to improve model: Create model (more layers, more neurons, different activation); Compile model (other loss, other optimizer, change optimizer parameters); Fit the model (more epochs, more data examples).







#RNN #ariadne #behaviour #consumer #deep-learning #priority #retail #simulation #synthetic-data
Past study [5] has shown that retailers use conventional techniques with available data to model consumer purchase. While these help in estimating purchase pattern for loyal consumers and high selling items with reasonable accuracy, they do not perform well for the long tail.


Parent (intermediate) annotation

Past study [5] has shown that retailers use conventional techniques with available data to model consumer purchase. While these help in estimating purchase patterns for loyal consumers and high-selling items with reasonable accuracy, they do not perform well for the long tail. Since multiple parameters interact non-linearly to define consumer purchase patterns, traditional models are not sufficient to achieve high accuracy across thousands to millions of consumers.





Flashcard 7789526781196

Tags
#tensorflow #tensorflow-certificate
Question

Three types of classification problems:

  • binary classification
  • multiclass
  • [...]
Answer
multilabel


Parent (intermediate) annotation

Three types of classification problems: binary classification multiclass multilabel

Original toplevel document

TfC_02_classification-PART_1
Types of classification problems. Three types of classification problems: binary classification, multiclass, multilabel. Multilabel classification: a sample can be assigned to more than one label from more than 2 label options. Multiclass classification: a sample can be assigned to one label, but from more than 2 label options.
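A toy sketch of how the label for a single sample differs across these three problem types (made-up values; k = 4 classes assumed):

```python
# Toy labels for one sample under each classification type.

binary_label = 1                  # one of exactly two classes (0 or 1)
multiclass_label = 2              # exactly one class index out of k = 4
multilabel_label = [1, 0, 1, 0]   # any subset of the k labels may be active

assert binary_label in (0, 1)
assert 0 <= multiclass_label < 4
assert multilabel_label.count(1) == 2  # this sample carries two labels at once
```

The multilabel case is the only one where several entries can be 1 simultaneously, which is what distinguishes it from multiclass.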







Flashcard 7789528616204

Tags
#deep-learning #keras #lstm #python #sequence
Question
When a network is fit on unscaled data that has a range of values (e.g. quantities in the 10s to 100s) it is possible for large inputs to [...] the learning and convergence of your network, and in some cases prevent the network from effectively learning your problem.
Answer
slow down


Parent (intermediate) annotation

When a network is fit on unscaled data that has a range of values (e.g. quantities in the 10s to 100s) it is possible for large inputs to slow down the learning and convergence of your network, and in some cases prevent the network from effectively learning your problem.
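As a minimal illustration (plain Python, made-up quantities; in practice Keras users would typically reach for a `Normalization` layer or scikit-learn's `MinMaxScaler`), min-max scaling maps such inputs into [0, 1] before training:

```python
# Min-max scaling: map raw quantities (10s to 100s) into [0, 1]
# so that no single input dimension dominates gradient updates.

raw = [15.0, 40.0, 250.0, 90.0, 10.0]  # made-up unscaled quantities

lo, hi = min(raw), max(raw)
scaled = [(x - lo) / (hi - lo) for x in raw]

assert min(scaled) == 0.0 and max(scaled) == 1.0
# At prediction time, reuse the SAME lo/hi fitted on the training set
# rather than re-fitting them on new data.
```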








Flashcard 7789530451212

Tags
#deep-learning #keras #lstm #python #sequence
Question
Truncated Backpropagation Through Time, or [...] (acronym?)
Answer
TBPTT


Parent (intermediate) annotation

Truncated Backpropagation Through Time, or TBPTT, is a modified version of the BPTT training algorithm
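A minimal sketch of the truncation itself (plain Python; the window length k = 4 is a made-up choice, and real TBPTT also manages how hidden state is carried between windows):

```python
# TBPTT sketch: instead of backpropagating through all T time-steps,
# split the sequence into windows of length k and backpropagate
# only within each window.

def tbptt_chunks(sequence, k):
    """Yield consecutive windows of at most k time-steps."""
    return [sequence[i:i + k] for i in range(0, len(sequence), k)]

sequence = list(range(10))          # a length-10 toy sequence
chunks = tbptt_chunks(sequence, 4)  # k = 4 truncation length

assert chunks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
# Full BPTT is the special case k >= len(sequence):
assert tbptt_chunks(sequence, 10) == [sequence]
```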








Flashcard 7789532810508

Tags
#deep-learning #keras #lstm #python #sequence
Question
Truncated Backpropagation Through Time, or TBPTT, is a modified version of the [...] training algorithm
Answer
BPTT


Parent (intermediate) annotation

Truncated Backpropagation Through Time, or TBPTT, is a modified version of the BPTT training algorithm








Flashcard 7789534645516

Tags
#tensorflow #tensorflow-certificate
Question

Bag of tricks to improve model

[...] model - other loss, other optimizer, change optimizer parameters

Answer
Compile


Parent (intermediate) annotation

Bag of tricks to improve model Compile model - other loss, other optimizer, change optimizer parameters

Original toplevel document

TfC_02_classification-PART_1
Bag of tricks to improve model: Create model (more layers, more neurons, different activation); Compile model (other loss, other optimizer, change optimizer parameters); Fit the model (more epochs, more data examples).







Flashcard 7789536480524

Tags
#tensorflow #tensorflow-certificate
Question
In case of labels as [...] use SparseCategoricalCrossentropy
Answer
integers


Parent (intermediate) annotation

In case of labels as integers use SparseCategoricalCrossentropy

Original toplevel document

TfC_02_classification-PART_2
important: This time there is a problem with the loss function. In case of categorical_crossentropy the labels have to be one-hot encoded; in case of labels as integers, use SparseCategoricalCrossentropy.
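The difference is only in the label format, which a few lines of hand-rolled cross-entropy (plain Python, not the Keras implementation) can confirm: the sparse loss on an integer label equals the categorical loss on the corresponding one-hot vector.

```python
import math

# Predicted class probabilities for one sample (3 classes, made-up).
probs = [0.1, 0.7, 0.2]

# categorical_crossentropy expects a one-hot label...
one_hot = [0, 1, 0]
cat_ce = -sum(y * math.log(p) for y, p in zip(one_hot, probs))

# ...SparseCategoricalCrossentropy expects the integer class index.
label = 1
sparse_ce = -math.log(probs[label])

assert abs(cat_ce - sparse_ce) < 1e-12  # same loss, different label format
```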







#RNN #ariadne #behaviour #consumer #deep-learning #patterns #priority #recurrent-neural-networks #retail #simulation #synthetic-data
recency (R), frequency (F), and monetary value (M) variables, called RFM [3], [4], [5]. These variables present some understanding of customer’s behaviour and try to answer the following questions: “How recently did the customer purchase?”, “How often do they purchase?”, and “How much do they spend?” [2].


Parent (intermediate) annotation

The CLV models use different strategies for customer behaviour modelling. One of the most reliable ones is using the recency (R), frequency (F), and monetary value (M) variables, called RFM [3], [4], [5]. These variables present some understanding of customer’s behaviour and try to answer the following questions: “How recently did the customer purchase?”, “How often do they purchase?”, and “How much do they spend?” [2]. RFM variables are sufficient statistics for customer behaviour modelling and are a mainstay of the industry because of their ease of implementation in practice [6], [3].
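A minimal sketch of the three RFM variables (plain Python, made-up transactions; a simple day index stands in for calendar dates, and "monetary value" is taken here as total spend):

```python
# RFM sketch: compute recency, frequency, monetary value
# from one customer's transaction history (made-up data).

today = 100  # current day index
transactions = [(10, 25.0), (55, 40.0), (90, 35.0)]  # (day, amount) pairs

recency = today - max(day for day, _ in transactions)  # days since last purchase
frequency = len(transactions)                          # how often they purchase
monetary = sum(amount for _, amount in transactions)   # how much they spend

assert (recency, frequency, monetary) == (10, 3, 100.0)
```

Each variable answers one of the three questions quoted above: recency ("how recently?"), frequency ("how often?"), monetary value ("how much?").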





#data-science #infrastructure
if we assume that human-time is more expensive than computer-time, which is certainly true for most data scientists, it makes sense to use a highly expressive, productivity-boosting language like Python instead of a low-level language like C++, even if it makes workloads more inefficient to process


Parent (intermediate) annotation

This realization has fundamental ramifications to how we should think about and design infrastructure for data scientists: for fellow human beings, instead of for machines. For instance, if we assume that human-time is more expensive than computer-time, which is certainly true for most data scientists, it makes sense to use a highly expressive, productivity-boosting language like Python instead of a low-level language like C++, even if it makes workloads more inefficient to process.
