# on 11-Jul-2022 (Mon)

#### Flashcard 7103446650124

Tags
#DAG #causal #edx
Question
the most important take-home message: we need expert knowledge to determine if we should adjust for a variable. The [...] criteria are insufficient to characterize confounding and confounders.
statistical

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

the most important take-home message: we need expert knowledge to determine if we should adjust for a variable. The statistical criteria are insufficient to characterize confounding and confounders.

#### Original toplevel document (pdf)


#### Annotation 7103448485132

 #DAG #causal #edx Of course, in many cases we don't have enough expert knowledge to draw the true causal DAG that represents the causal structure of treatment A, outcome Y, and potential confounder L. In those cases we may propose several possible causal DAGs without being able to choose a particular one. And that's fine, because the causal DAGs that we propose allow us to identify inconsistencies between our beliefs and our actions.

#### Parent (intermediate) annotation

the most important take-home message: we need expert knowledge to determine if we should adjust for a variable. The statistical criteria are insufficient to characterize confounding and confounders. Of course, in many cases we don't have enough expert knowledge to draw the true causal DAG that represents the causal structure of treatment A, outcome Y, and potential confounder L. In those cases we may propose several possible causal DAGs without being able to choose a particular one. And that's fine, because the causal DAGs that we propose allow us to identify inconsistencies between our beliefs and our actions. For example, suppose L is fetal death. We don't know the true causal DAG, so we propose seven causal DAGs. Suppose that L does not help block a backdoor path in any of the seven DAGs, then

#### Annotation 7103451893004

 [unknown IMAGE 7096340712716] #abm #agent-based #has-images #machine-learning #model #priority During the Experience phase agents make random decisions and add new entries to the database consisting of a vector with all their sensory input, the randomly chosen action and the result, i.e. if the score increased, decreased or stayed the same due to this decision.
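
The Experience phase described above can be sketched as a loop that records one database entry per random decision. This is a minimal illustration, not the paper's implementation: the action set, the `sense` function, and the toy score rule are all assumptions made up for the example.

```python
import random

ACTIONS = ["north", "south", "east", "west"]  # hypothetical action set


def sense(rng):
    """Stand-in for sensory input: sugar level seen in each direction (assumption)."""
    return [rng.randint(0, 4) for _ in ACTIONS]


def experience_phase(n_steps, seed=0):
    """Collect (sensory input, random action, result) entries."""
    rng = random.Random(seed)
    database = []
    for _ in range(n_steps):
        observation = sense(rng)
        action_index = rng.randrange(len(ACTIONS))  # the decision is random
        # Toy score rule (assumption): gain the sugar seen, pay 1 for moving.
        score_change = observation[action_index] - 1
        # Result is +1, 0 or -1: score increased, stayed the same, or decreased.
        result = (score_change > 0) - (score_change < 0)
        database.append((observation, ACTIONS[action_index], result))
    return database


database = experience_phase(5000)
print(len(database), database[0])
```

Each entry pairs the full sensory vector with the action taken and the sign of the score change, which is the shape of data a later Training phase would need.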

#### Parent (intermediate) annotation

Sugarscape model: During the Experience phase agents make random decisions and add new entries to the database consisting of a vector with all their sensory input, the randomly chosen action and the result, i.e. if the score increased, decreased or stayed the same due to this decision. To gather a sufficient amount of data, the Experience phase lasted for 5000 time steps, which is computationally relatively cheap and can be calculated in a few seconds using a single c

#### Flashcard 7103453728012

Tags
#abm #agent-based #machine-learning #model #priority
Question

The goal of the presented framework is to provide a universal technique for agent-based models, in which the decision making process of the agents is not determined by theory-driven or empirically found rules, but rather by an Artificial Neural Network. The process itself can be separated into four phases:

(1) [...]

(2) Experience

(3) Training

(4) Application

Initialization

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

making process of the agents is not determined by theory-driven or empirically found rules, but rather by an Artificial Neural Network. The process itself can be separated into four phases: (1) Initialization (2) Experience (3) Training (4) Application

#### Annotation 7103455563020

 #abm #agent-based #machine-learning #model #priority Expanding the framework In the previous section, we found an example in which the presented framework fails, because the states encountered during Experience phase and Application phase differ too much. We will now adapt the framework, in order to increase its scope to such systems.

#### Parent (intermediate) annotation

3.3. Expanding the framework In the previous section, we found an example in which the presented framework fails, because the states encountered during Experience phase and Application phase differ too much. We will now adapt the framework, in order to increase its scope to such systems. Instead of a sequential approach of training followed by application, we switch to an iterative approach: (1) Initial Experience (Random decisions) (2) Training of Neural Network (NN) (

#### Flashcard 7103458970892

Tags
#has-images
[unknown IMAGE 7103461330188]
[unknown IMAGE 7103459495180]
Done: batch_size,

status measured difficulty not learned 37% [default] 0

#### Flashcard 7103464738060

Tags
#has-images
[unknown IMAGE 7103467097356]
[unknown IMAGE 7103465262348]

status measured difficulty not learned 37% [default] 0

#### Annotation 7103471029516

 #advanced #deep-learning #keras #python

# Import the sigmoid function from scipy

from scipy.special import expit as sigmoid

# Weight from the model

weight = 0.14

# Print the approximate win probability predicted close game

print(sigmoid(1 * 0.14))

# Print the approximate win probability predicted blowout game

print(sigmoid(10 * 0.14))
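
For reference, `expit` is the logistic function 1 / (1 + e^(-x)). The sketch below re-implements it with the standard library only, so the two probabilities can be checked without scipy; the variable names `close_game` and `blowout` are introduced here for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, the same formula scipy.special.expit computes."""
    return 1.0 / (1.0 + math.exp(-x))


weight = 0.14  # weight from the model, as in the snippet above

close_game = sigmoid(1 * weight)   # 1-point score difference
blowout = sigmoid(10 * weight)     # 10-point score difference

print(round(close_game, 3))  # 0.535
print(round(blowout, 3))     # 0.802
```

A one-point margin barely moves the prediction away from 50%, while a ten-point margin pushes it to roughly 80%.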

#### Flashcard 7103472864524

Tags
Question

# Import the sigmoid function from scipy

from scipy.special import [...] as sigmoid

# Weight from the model

weight = 0.14

# Print the approximate win probability predicted close game

print(sigmoid(1 * 0.14))

# Print the approximate win probability predicted blowout game

print(sigmoid(10 * 0.14))

expit

status measured difficulty not learned 37% [default] 0


#### Annotation 7103796088076

 #deep-learning #keras #lstm #python #sequence the 4 different types of sequence prediction problems: 1. Sequence Prediction. 2. Sequence Classification. 3. Sequence Generation. 4. Sequence-to-Sequence Prediction
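
The four problem types differ mainly in what goes in and what comes out. The sketch below uses hand-written arithmetic rules as stand-ins for learned models, purely to show the input and output shapes; every function name and rule here is an assumption for illustration.

```python
def predict_next(seq):
    """1. Sequence prediction: sequence in, single next value out."""
    return seq[-1] + (seq[-1] - seq[-2])  # assumes a constant step


def classify(seq):
    """2. Sequence classification: sequence in, one class label out."""
    rising = all(b > a for a, b in zip(seq, seq[1:]))
    return "increasing" if rising else "other"


def generate(corpus, length):
    """3. Sequence generation: corpus in, new sequence with similar traits out."""
    step = corpus[0][1] - corpus[0][0]
    start = corpus[0][1]
    return [start + i * step for i in range(length)]


def seq2seq(seq, n_out):
    """4. Sequence-to-sequence prediction: sequence in, sequence out."""
    step = seq[-1] - seq[-2]
    return [seq[-1] + step * (i + 1) for i in range(n_out)]


print(predict_next([1, 2, 3, 4, 5]))         # 6
print(classify([1, 2, 3, 4, 5]))             # increasing
print(generate([[1, 3, 5], [7, 9, 11]], 3))  # [3, 5, 7]
print(seq2seq([1, 2, 3], 2))                 # [4, 5]
```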

#### Annotation 7103797660940

 #deep-learning #keras #lstm #python #sequence Sequence prediction may also generally be referred to as sequence learning. Technically, we could refer to all of the following problems as a type of sequence prediction problem. This can make things confusing for beginner

#### Annotation 7103799233804

 #deep-learning #keras #lstm #python #sequence Some examples of sequence prediction problems include: Weather Forecasting. Given a sequence of observations about the weather over time, predict the expected weather tomorrow. Stock Market Prediction. Given a sequence of movements of a security over time, predict the next movement of the security. Product Recommendation. Given a sequence of past purchases for a customer, predict the next purchase for a customer

#### Annotation 7103801330956

 #deep-learning #keras #lstm #python #sequence Sequence classification involves predicting a class label for a given input sequence.

#### Annotation 7103802903820

 #deep-learning #keras #lstm #python #sequence Some examples of sequence classification problems include: DNA Sequence Classification. Given a DNA sequence of A, C, G, and T values, predict whether the sequence is for a coding or non-coding region. Anomaly Detection. Given a sequence of observations, predict whether the sequence is anomalous or not. Sentiment Analysis. Given a sequence of text such as a review or a tweet, predict whether the sentiment of the text is positive or negative

#### Annotation 7103804476684

 #deep-learning #keras #lstm #python #sequence Sequence generation involves generating a new output sequence that has the same general characteristics as other sequences in the corpus. For example: Input Sequence: [1, 3, 5], [7, 9, 11] Output Sequence: [3, 5, 7]

#### Annotation 7103806049548

 #deep-learning #keras #lstm #python #sequence Sequence-to-sequence prediction involves predicting an output sequence given an input sequence. For example: Input Sequence: 1, 2, 3, 4, 5 Output Sequence: 6, 7, 8, 9, 1

#### Annotation 7103807622412

 #deep-learning #keras #lstm #python #sequence Limitations of using MLPs for sequence prediction

Using an MLP for sequence prediction can work well on some problems, but it has 5 critical limitations.

Stateless. MLPs learn a fixed function approximation. Any outputs that are conditional on the context of the input sequence must be generalized and frozen into the network weights.

Unaware of Temporal Structure. Time steps are modelled as input features, meaning that the network has no explicit handling or understanding of the temporal structure or order between observations.

Messy Scaling. For problems that require modeling multiple parallel input sequences, the number of input features increases as a factor of the size of the sliding window without any explicit separation of time steps of series.

Fixed Sized Inputs. The size of the sliding window is fixed and must be imposed on all inputs to the network.

Fixed Sized Outputs. The size of the output is also fixed and any outputs that do not conform must be forced.

MLPs do offer great capability for sequence prediction but still suffer from the key limitation of having to specify the scope of temporal dependence between observations explicitly upfront in the design of the model.

Sequences pose a challenge for [deep neural networks] because they require that the dimensionality of the inputs and outputs is known and fixed. — Sequence to Sequence Learning with Neural Networks, 2014

MLPs are a good starting point for modeling sequence prediction problems, but we now have better options.
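
The sliding-window framing mentioned above can be sketched as follows. It turns a series into fixed-size input rows and single-value targets, which is where the "fixed sized inputs" limitation comes from. The helper name `sliding_window` and the sample series are assumptions for illustration.

```python
def sliding_window(series, window):
    """Frame a series as (fixed-size input window, next value) pairs."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])  # time steps become features
        targets.append(series[start + window])       # value right after the window
    return inputs, targets


X, y = sliding_window([10, 20, 30, 40, 50, 60], window=3)
print(X)  # [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
print(y)  # [40, 50, 60]
```

Every row has exactly `window` columns, so the model never sees anything older than that, and the order of the columns carries no special meaning to an MLP.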

#### Annotation 7103809981708

 #deep-learning #keras #lstm #python #sequence The Long Short-Term Memory, or LSTM, network is a type of Recurrent Neural Network.

#### Annotation 7103811554572

 #deep-learning #keras #lstm #python #sequence Given a standard feedforward MLP network, an RNN can be thought of as the addition of loops to the architecture. For example, in a given layer, each neuron may pass its signal laterally (sideways) in addition to forward to the next layer. The output of the network may feedback as an input to the network with the next input vector. And so on.
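
The "loop" can be made concrete with a single recurrent unit whose previous output is fed back in with the next input. This is a minimal scalar sketch with arbitrary, untrained weights (`w_x`, `w_h`, and `bias` are assumptions), not a full RNN layer.

```python
import math


def rnn_forward(inputs, w_x=0.5, w_h=0.8, bias=0.0):
    """Run one recurrent unit over a sequence and return every hidden state."""
    hidden = 0.0
    states = []
    for x in inputs:
        # The previous hidden state loops back in alongside the new input.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
        states.append(hidden)
    return states


early = rnn_forward([1.0, 0.0, 0.0])  # the signal arrives first
late = rnn_forward([0.0, 0.0, 1.0])   # the same signal arrives last
print(early[-1], late[-1])
```

The two sequences contain the same values, yet the final states differ, because the recurrent connection makes the output depend on order and not only on content.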

#### Annotation 7103813127436

 #deep-learning #keras #lstm #python #sequence ... recurrent neural networks contain cycles that feed the network activations from a previous time step as inputs to the network to influence predictions at the current time step. These activations are stored in the internal states of the network which can in principle hold long-term temporal contextual information. This mechanism allows RNNs to exploit a dynamically changing contextual window over the input sequence history — Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling, 2014

#### Annotation 7103814700300

 #deep-learning #keras #lstm #python #sequence The promise of recurrent neural networks is that the temporal dependence and contextual information in the input data can be learned. A recurrent network whose inputs are not fixed but rather constitute an input sequence can be used to transform an input sequence into an output sequence while taking into account contextual information in a flexible way

#### Annotation 7103816273164

 #deep-learning #keras #lstm #python #sequence The LSTM network is different to a classical MLP. Like an MLP, the network is comprised of layers of neurons. Input data is propagated through the network in order to make a prediction. Like RNNs, the LSTMs have recurrent connections so that the state from previous activations of the neuron from the previous time step is used as context for formulating an output. But unlike other RNNs, the LSTM has a unique formulation that allows it to avoid the problems that prevent the training and scaling of other RNNs. This, and the impressive results that can be achieved, are the reason for the popularity of the technique

#### Annotation 7103817846028

 #deep-learning #keras #lstm #python #sequence Unfortunately, the range of contextual information that standard RNNs can access is in practice quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network’s recurrent connections. This shortcoming ... referred to in the literature as the vanishing gradient problem ... Long Short-Term Memory (LSTM) is an RNN architecture specifically designed to address the vanishing gradient problem. — A Novel Connectionist System for Unconstrained Handwriting Recognition, 2009
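
The exponential decay or blow-up is easy to see numerically. As a simplifying assumption, the sketch below treats each pass around the recurrent connection as multiplication by one constant factor; a real network multiplies by weight matrices and activation derivatives, but the compounding behaves the same way.

```python
def influence_after(steps, recurrent_factor):
    """Influence of an input after cycling `steps` times around the loop."""
    influence = 1.0
    for _ in range(steps):
        influence *= recurrent_factor  # one pass around the recurrent connection
    return influence


print(influence_after(50, 0.9))  # roughly 0.005: the signal has all but vanished
print(influence_after(50, 1.1))  # roughly 117: the signal has blown up
```

Only a factor of exactly 1 leaves the signal unchanged over many steps.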

#### Annotation 7103819418892

 #deep-learning #keras #lstm #python #sequence The computational unit of the LSTM network is called the memory cell, memory block, or just cell for short.

#### Annotation 7103820991756

 #deep-learning #keras #lstm #python #sequence LSTM cells are comprised of weights and gates

#### Annotation 7103822564620

 #deep-learning #keras #lstm #python #sequence 1.4.1 LSTM Weights A memory cell has weight parameters for the input, output, as well as an internal state that is built up through exposure to input time steps. Input Weights. Used to weight input for the current time step. Output Weights. Used to weight the output from the last time step. Internal State. Internal state used in the calculation of the output for this time step

#### Annotation 7103824137484

 #deep-learning #keras #lstm #python #sequence 1.4.2 LSTM Gates

The gates are the key to the memory cell. These too are weighted functions that further govern the information flow in the cell. There are three gates:

Forget Gate: Decides what information to discard from the cell.

Input Gate: Decides which values from the input to update the memory state.

Output Gate: Decides what to output based on input and the memory of the cell.

The forget gate and input gate are used in the updating of the internal state. The output gate is a final limiter on what the cell actually outputs. It is these gates and the consistent data flow called the constant error carrousel or CEC that keep each cell stable (neither exploding nor vanishing).

Each memory cell’s internal architecture guarantees constant error flow within its constant error carrousel CEC... This represents the basis for bridging very long time lags. Two gate units learn to open and close access to error flow within each memory cell’s CEC. The multiplicative input gate affords protection of the CEC from perturbation by irrelevant inputs. Likewise, the multiplicative output gate protects other units from perturbation by currently irrelevant memory contents
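
How the three gates interact can be sketched with a single-unit cell. The equations follow the commonly used LSTM formulation with a forget gate; the parameter names and values are assumptions chosen by hand, not trained weights and not taken from the annotated text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar weights."""
    forget = sigmoid(p["wf_x"] * x + p["wf_h"] * h_prev + p["bf"])       # what to discard
    write = sigmoid(p["wi_x"] * x + p["wi_h"] * h_prev + p["bi"])        # input gate
    candidate = math.tanh(p["wg_x"] * x + p["wg_h"] * h_prev + p["bg"])  # proposed update
    out = sigmoid(p["wo_x"] * x + p["wo_h"] * h_prev + p["bo"])          # output gate
    c = forget * c_prev + write * candidate  # internal state update
    h = out * math.tanh(c)                   # what the cell actually outputs
    return h, c


# Hypothetical parameters: forget gate held open (bf = 10), input gate held shut (bi = -10).
params = {name: 0.5 for name in
          ("wf_x", "wf_h", "wi_x", "wi_h", "wg_x", "wg_h", "wo_x", "wo_h")}
params.update({"bf": 10.0, "bi": -10.0, "bg": 0.0, "bo": 0.0})

h, c = 0.0, 1.0  # start with a stored value of 1.0 in the cell state
for _ in range(100):
    h, c = lstm_step(0.3, h, c, params)
print(c)  # still close to 1.0 after 100 steps
```

With the forget gate near 1 and the input gate near 0, the stored value survives 100 steps almost unchanged, which is the stable data flow the text attributes to the gates and the CEC.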

#### Annotation 7103825710348

 #deep-learning #keras #lstm #python #sequence We can summarize the 3 key benefits of LSTMs as: 1. Overcomes the technical problems of training an RNN, namely vanishing and exploding gradients. 2. Possesses memory to overcome the issues of long-term temporal dependency with input sequences. 3. Processes input sequences and output sequences time step by time step, allowing variable length inputs and outputs

#### Annotation 7103828856076

 #deep-learning #keras #lstm #python #sequence LSTMs are very impressive. The design of the network overcomes the technical challenges of RNNs to deliver on the promise of sequence prediction with neural networks. The applications of LSTMs achieve impressive results on a range of complex sequence prediction problems. But LSTMs may not be ideal for all sequence prediction problems. For example, in time series forecasting, often the information relevant for making a forecast is within a small window of past observations. Often an MLP with a window or a linear model may be a less complex and more suitable model

#### Annotation 7103830428940

 #deep-learning #keras #lstm #python #sequence Time series benchmark problems found in the literature ... are often conceptually simpler than many tasks already solved by LSTM. They often do not require RNNs at all, because all relevant information about the next event is conveyed by a few recent events contained within a small time window. — Applying LSTM to Time Series Predictable through Time-Window Approaches, 2001

#### Annotation 7103832001804

 #deep-learning #keras #lstm #python #sequence A time window based MLP outperformed the LSTM pure-[autoregression] approach on certain time series prediction benchmarks solvable by looking at a few recent inputs only. Thus LSTM’s special strength, namely, to learn to remember single events for very long, unknown time periods, was not necessary

#### Annotation 7103833574668

 #deep-learning #keras #lstm #python #sequence The caution is that LSTMs are not a silver bullet, so carefully consider the framing of your problem. Think of the internal state of LSTMs as a handy internal variable to capture and provide context for making predictions. If your problem looks like a traditional autoregression type problem with the most relevant lag observations within a small window, then perhaps develop a baseline of performance with an MLP and sliding window before considering an LSTM.

#### Annotation 7103835409676

 #deep-learning #keras #lstm #python #sequence The goal of this lesson is for you to understand the Backpropagation Through Time algorithm used to train LSTMs. After completing this lesson, you will know: What Backpropagation Through Time is and how it relates to the Backpropagation training algorithm used by Multilayer Perceptron networks. The motivations that lead to the need for Truncated Backpropagation Through Time, the most widely used variant in deep learning for training LSTMs. A notation for thinking about how to configure Truncated Backpropagation Through Time and the canonical configurations used in research and by deep learning libraries.

#### Annotation 7103836982540

 #deep-learning #keras #lstm #python #sequence The goal of the backpropagation training algorithm is to modify the weights of a neural network in order to minimize the error of the network outputs compared to some expected output in response to corresponding inputs. It is a supervised learning algorithm that allows the network to be corrected with regard to the specific errors made. The general algorithm is as follows: 1. Present a training input pattern and propagate it through the network to get an output. 2. Compare the predicted outputs to the expected outputs and calculate the error. 3. Calculate the derivatives of the error with respect to the network weights. 4. Adjust the weights to minimize the error. 5. Repeat
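
The five steps above map directly onto a training loop. The sketch below applies them to the simplest possible case, one linear neuron with a squared-error loss, so the derivative step has no hidden layers to propagate through. The data, learning rate, and epoch count are assumptions picked for illustration.

```python
def train(samples, learning_rate=0.05, epochs=200):
    """Fit prediction = w * x + b by repeating the five steps on each sample."""
    w, b = 0.0, 0.0
    for _ in range(epochs):                  # 5. repeat
        for x, expected in samples:
            prediction = w * x + b           # 1. propagate the input forward
            error = prediction - expected    # 2. compare with the expected output
            grad_w = 2 * error * x           # 3. derivatives of error**2 ...
            grad_b = 2 * error               #    ... with respect to w and b
            w -= learning_rate * grad_w      # 4. adjust weights to reduce the error
            b -= learning_rate * grad_b
    return w, b


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # generated by y = 2x + 1
w, b = train(samples)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

In a multilayer network, step 3 is where backpropagation proper comes in: the same derivatives are obtained layer by layer with the chain rule.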

#### Annotation 7103849041164

 #deep-learning #embeddings Researchers have found that embeddings can also be used in other domains such as search and recommendations, where latent meanings can be assigned to products and used to train machine learning tasks with neural networks.

#### Annotation 7103850614028

 #deep-learning #embeddings By analogy with how we get word embeddings: a word is like a product; a sentence is like ONE customer’s shopping sequence; an article is like ALL customers’ shopping sequences. This embedding technique allows us to represent a product or user as a low-dimensional continuous vector, while one-hot encoding leads to the curse of dimensionality for machine learning models.
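
The contrast between one-hot vectors and embeddings can be sketched with a toy catalogue. The product names and the 2-dimensional embedding values below are hand-picked assumptions, not trained vectors; in practice they would be learned from customers' shopping sequences.

```python
import math

products = ["milk", "bread", "butter", "laptop"]  # hypothetical catalogue


def one_hot(product):
    """One dimension per product, so vector length grows with the catalogue."""
    return [1.0 if name == product else 0.0 for name in products]


# Hypothetical low-dimensional embeddings (hand-picked for illustration).
embedding = {
    "milk": [0.9, 0.1],
    "bread": [0.8, 0.2],
    "butter": [0.85, 0.15],
    "laptop": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine(one_hot("milk"), one_hot("bread")))       # 0.0
print(cosine(embedding["milk"], embedding["bread"]))   # close to 1
print(cosine(embedding["milk"], embedding["laptop"]))  # much lower
```

One-hot vectors treat every pair of products as equally unrelated and need one dimension per product, whereas the dense vectors stay short and can place related products close together.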