# on 13-Apr-2022 (Wed)

#### Annotation 7070599744780

 #causality #statistics We denote by Y(1) the potential outcome of happiness you would observe if you were to get a dog (T = 1). Similarly, we denote by Y(0) the potential outcome of happiness you would observe if you were to not get a dog (T = 0). In scenario 1, Y(1) = 1 and Y(0) = 1. In contrast, in scenario 2, Y(1) = 1 and Y(0) = 0.
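
This can be sketched in code; the scenario values are the annotation's own, while the names and structure are purely illustrative:

```python
# Potential outcomes for the dog example (hypothetical illustration).
# Y(1): happiness if you get a dog (T = 1); Y(0): happiness if you do not (T = 0).
scenarios = {
    "scenario 1": {"Y(1)": 1, "Y(0)": 1},  # happy either way: no causal effect
    "scenario 2": {"Y(1)": 1, "Y(0)": 0},  # the dog is what makes you happy
}

for name, po in scenarios.items():
    effect = po["Y(1)"] - po["Y(0)"]  # individual causal effect Y(1) - Y(0)
    print(name, "-> causal effect:", effect)
```

In scenario 1 the effect is 0 (you would have been happy anyway); in scenario 2 it is 1.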

#### pdf

cannot see any pdfs

#### Annotation 7070601317644

 #causality #statistics More generally, the potential outcome Y(t) denotes what your outcome would be, if you were to take treatment t. A potential outcome Y(t) is distinct from the observed outcome Y in that not all potential outcomes are observed. Rather, all potential outcomes can potentially be observed. The one that is actually observed depends on the value that the treatment T takes on.
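
For a binary treatment, the link between observed and potential outcomes is the standard switching identity Y = T·Y(1) + (1 − T)·Y(0); the identity is not quoted in the excerpt, but it is consistent with it. A minimal sketch:

```python
def observed_outcome(t: int, y1: float, y0: float) -> float:
    """Switching identity for binary treatment: the observed outcome is the
    potential outcome under the treatment actually taken,
    Y = T * Y(1) + (1 - T) * Y(0)."""
    return t * y1 + (1 - t) * y0

# If T = 1 we observe Y(1); Y(0) stays unobserved (and vice versa).
print(observed_outcome(1, y1=1, y0=0))  # 1
print(observed_outcome(0, y1=1, y0=0))  # 0
```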

#### Annotation 7070608133388

 #causality #statistics The Fundamental Problem of Causal Inference: It is impossible to observe all potential outcomes for a given individual

#### Flashcard 7070610492684

Tags
#causality #statistics
Question

The Fundamental Problem of Causal Inference:

It is [...] to observe all potential outcomes for a given individual

impossible

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

The Fundamental Problem of Causal Inference: It is impossible to observe all potential outcomes for a given individual

#### Flashcard 7070612065548

Tags
#causality #statistics
Question

The Fundamental Problem of Causal Inference:

It is impossible to observe [...] potential outcomes for a given individual

all

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

The Fundamental Problem of Causal Inference: It is impossible to observe all potential outcomes for a given individual

#### Annotation 7070633037068

 #causality #statistics the fundamental problem of causal inference It is fundamental because if we cannot observe both Y_i(1) and Y_i(0), then we cannot observe the causal effect Y_i(1) − Y_i(0). This problem is unique to causal inference because, in causal inference, we care about making causal claims, which are defined in terms of potential outcomes. For contrast, consider machine learning. In machine learning, we often only care about predicting the observed outcome Y, so there is no need for potential outcomes, which means machine learning does not have to deal with this fundamental problem that we must deal with in causal inference.

#### Annotation 7070634609932

 #causality #statistics The potential outcomes that you do not (and cannot) observe are known as counterfactuals because they are counter to fact (reality). "Potential outcomes" are sometimes referred to as "counterfactual outcomes," but we will never do that in this book because a potential outcome Y(t) does not become counter to fact until another potential outcome Y(t′) is observed. The potential outcome that is observed is sometimes referred to as a factual. Note that there are no counterfactuals or factuals until the outcome is observed. Before that, there are only potential outcomes.

#### Annotation 7070636182796

 #causality #statistics We get the average treatment effect (ATE) by taking an average over the ITEs: τ ≜ E[Y_i(1) − Y_i(0)] = E[Y(1) − Y(0)], where the average is over the individuals i if Y_i(t) is deterministic. If Y_i(t) is random, the average is also over any other randomness.
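
A minimal numeric sketch of this averaging, with made-up potential outcomes for four individuals:

```python
import numpy as np

# Made-up potential outcomes for four individuals (deterministic Y_i(t)).
y1 = np.array([1, 1, 0, 1])   # Y_i(1)
y0 = np.array([1, 0, 0, 0])   # Y_i(0)

ite = y1 - y0                 # individual treatment effects Y_i(1) - Y_i(0)
ate = ite.mean()              # ATE = E[Y(1) - Y(0)], averaging over individuals i
print(ate)  # 0.5
```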

#### Annotation 7070640114956

 #causality #has-images #statistics We will take this table as the whole population of interest. Because of the fundamental problem of causal inference, this is fundamentally a missing data problem. All of the question marks in the table indicate that we do not observe that cell.

#### Annotation 7070644309260

 #causality #has-images #statistics Well, what assumption(s) would make it so that the ATE is simply the associational difference? This is equivalent to saying "what makes it valid to calculate the ATE by taking the average of the Y(0) column, ignoring the question marks, and subtracting that from the average of the Y(1) column, ignoring the question marks?" This ignoring of the question marks (missing data) is known as ignorability.
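
The "ignore the question marks" computation can be sketched with NaNs standing in for the unobserved cells (the data values here are made up):

```python
import numpy as np

# The missing-data table, with np.nan playing the role of the question marks:
# for each individual we observe only the potential outcome under the received treatment.
t  = np.array([1, 1, 0, 0, 0])
y1 = np.array([1, 0, np.nan, np.nan, np.nan])   # Y(1) column
y0 = np.array([np.nan, np.nan, 0, 0, 1])        # Y(0) column

# "Ignoring the question marks": column means over the observed cells only.
assoc_diff = np.nanmean(y1) - np.nanmean(y0)
print(assoc_diff)
```

Note that this is exactly the associational difference E[Y | T = 1] − E[Y | T = 0]; under ignorability it equals the ATE.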

#### Annotation 7070652173580

 #causality #statistics The ignorability assumption is used in Equation 2.3. We will talk more about Equation 2.4 when we get to Section 2.3.5. Another perspective on this assumption is that of exchangeability.

#### Flashcard 7070653746444

Tags
#causality #statistics
Question
The ignorability assumption is used in Equation 2.3. We will talk more about Equation 2.4 when we get to Section 2.3.5. Another perspective on this assumption is that of [...another assumption?].
exchangeability

status measured difficulty not learned 37% [default] 0

#### Parent (intermediate) annotation

The ignorability assumption is used in Equation 2.3. We will talk more about Equation 2.4 when we get to Section 2.3.5. Another perspective on this assumption is that of exchangeability.

#### Annotation 7070656105740

 #causality #statistics Exchangeability means that the treatment groups are exchangeable in the sense that if they were swapped, the new treatment group would observe the same outcomes as the old treatment group, and the new control group would observe the same outcomes as the old control group.

#### Annotation 7070657678604

 #causality #statistics To identify a causal effect is to reduce a causal expression to a purely statistical expression. In this chapter, that means to reduce an expression from one that uses potential outcome notation to one that uses only statistical notation such as T, X, Y, expectations, and conditioning. This means that we can calculate the causal effect from just the observational distribution P(X, T, Y).

#### Annotation 7070660824332

 #causality #has-images #statistics Identifiability

#### Annotation 7070662659340

 #causality #statistics We have seen that ignorability is extremely important (Equation 2.3), but how realistic of an assumption is it? In general, it is completely unrealistic because there is likely to be confounding in most data we observe (causal structure shown in Figure 2.1). However, we can make this assumption realistic by running randomized experiments, which force the treatment to not be caused by anything but a coin toss, so then we have the causal structure shown in Figure 2.2. We cover randomized experiments in greater depth in Chapter 5.
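
A small simulation (made-up Bernoulli potential outcomes, hypothetical variable names) illustrates why randomization helps: with a coin-toss treatment, the associational difference lands close to the true ATE:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Made-up potential outcomes; the true ATE is 0.6 - 0.3 = 0.3.
y1 = rng.binomial(1, 0.6, size=n)   # Y_i(1)
y0 = rng.binomial(1, 0.3, size=n)   # Y_i(0)

# Randomized experiment: T is a fair coin toss, independent of (Y(0), Y(1)).
t = rng.binomial(1, 0.5, size=n)
y = np.where(t == 1, y1, y0)        # observed outcomes

assoc_diff = y[t == 1].mean() - y[t == 0].mean()
print(round(assoc_diff, 2))         # close to the true ATE of 0.3
```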

#### Annotation 7070664232204

 #causality #statistics In observational data, it is unrealistic to assume that the treatment groups are exchangeable. In other words, there is no reason to expect that the groups are the same in all relevant variables other than the treatment.

#### Annotation 7070665805068

 #causality #statistics There is no reason to expect that the groups are the same in all relevant variables other than the treatment. However, if we control for relevant variables by conditioning, then maybe the subgroups will be exchangeable. We will clarify what the "relevant variables" are in Chapter 3.

#### Annotation 7070667377932

 #causality #statistics The idea is that although the treatment and potential outcomes may be unconditionally associated (due to confounding), within levels of X, they are not associated. In other words, there is no confounding within levels of X because controlling for X has made the treatment groups comparable.

#### Annotation 7070674455820

 #causality #has-images #statistics We do not have exchangeability in the data because X is a common cause of T and Y. We illustrate this in Figure 2.3. Because X is a common cause of T and Y, there is non-causal association between T and Y. This non-causal association flows along the T ← X → Y path; we depict this with a red dashed arc.

#### Annotation 7070678650124

 #causality #has-images #statistics However, we do have conditional exchangeability in the data. This is because, when we condition on X, there is no longer any non-causal association between T and Y. The non-causal association is now "blocked" at X by conditioning on X. We illustrate this blocking in Figure 2.4 by shading X to indicate it is conditioned on and by showing the red dashed arc being blocked there.
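
A simulation of the T ← X → Y structure (all numbers made up) shows this blocking: T has no causal effect on Y at all, yet the marginal T–Y association is large while the association within each level of X vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural model with confounding T <- X -> Y and
# no causal effect of T on Y whatsoever.
x = rng.binomial(1, 0.5, size=n)
t = rng.binomial(1, 0.2 + 0.6 * x)   # X causes T
y = rng.binomial(1, 0.1 + 0.7 * x)   # X causes Y; T does not appear

marginal  = y[t == 1].mean() - y[t == 0].mean()                      # non-causal association
within_x0 = y[(t == 1) & (x == 0)].mean() - y[(t == 0) & (x == 0)].mean()
within_x1 = y[(t == 1) & (x == 1)].mean() - y[(t == 0) & (x == 1)].mean()
print(round(marginal, 3), round(within_x0, 3), round(within_x1, 3))
```

The marginal difference is far from zero (association flowing along T ← X → Y), while both within-X differences are approximately zero (the path is blocked).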

#### Annotation 7070682844428

 #causality #statistics Conditional exchangeability is the main assumption necessary for causal inference. Armed with this assumption, we can identify the causal effect within levels of X.
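
Under conditional exchangeability (plus positivity), identifying within levels of X corresponds to the adjustment formula E_X[E[Y | T = 1, X] − E[Y | T = 0, X]]; a sketch on a made-up table where X confounds T and Y:

```python
import pandas as pd

# Made-up observational table; X is a binary confounder of T and Y.
df = pd.DataFrame({
    "X": [0, 0, 0, 0, 1, 1, 1, 1],
    "T": [0, 0, 1, 1, 0, 1, 1, 1],
    "Y": [0, 0, 1, 1, 1, 1, 1, 1],
})

# Naive associational difference E[Y|T=1] - E[Y|T=0] (confounded).
naive = df.loc[df["T"] == 1, "Y"].mean() - df.loc[df["T"] == 0, "Y"].mean()

# Adjustment formula: average the within-X differences, weighted by P(X = x).
ate = 0.0
for x, grp in df.groupby("X"):
    e1 = grp.loc[grp["T"] == 1, "Y"].mean()
    e0 = grp.loc[grp["T"] == 0, "Y"].mean()
    ate += (len(grp) / len(df)) * (e1 - e0)

print(round(naive, 3), ate)  # the two disagree because of confounding
```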

#### Annotation 7070687825164

 #causality #statistics The main reason for moving from exchangeability (Assumption 2.1) to conditional exchangeability (Assumption 2.2) was that it seemed like a more realistic assumption. However, we often cannot know for certain if conditional exchangeability holds. There may be some unobserved confounders that are not part of X, meaning conditional exchangeability is violated. Fortunately, that is not a problem in randomized experiments.

#### Annotation 7070689398028

 #causality #statistics Positivity is the condition that all subgroups of the data with different covariates have some probability of receiving any value of treatment

#### Annotation 7070690970892

 #causality #statistics An "estimator" is a function that takes a dataset as input and outputs an estimate. We discuss this statistics terminology more in Section 2.4. That's the math for why we need the positivity assumption, but what's the intuition? Well, if we have a positivity violation, that means that within some subgroup of the data, everyone always receives treatment or everyone always receives the control. It wouldn't make sense to be able to estimate a causal effect of treatment vs. control in that subgroup, since we see only treatment or only control. We never see the alternative in that subgroup.
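
A sketch of how one might detect such a violation, estimating P(T = 1 | X = x) for each subgroup on made-up data:

```python
import pandas as pd

# Made-up dataset: within the X = 2 subgroup everyone is treated,
# which violates positivity (we need 0 < P(T=1 | X=x) < 1 for every x).
df = pd.DataFrame({
    "X": [0, 0, 1, 1, 2, 2],
    "T": [0, 1, 0, 1, 1, 1],
})

p_treat = df.groupby("X")["T"].mean()       # estimated P(T=1 | X=x) per subgroup
violations = p_treat[(p_treat == 0.0) | (p_treat == 1.0)]
print(violations)                           # flags the X = 2 subgroup
```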

#### Annotation 7070692543756

 #causality #statistics The Positivity-Unconfoundedness Tradeoff: Although conditioning on more covariates could lead to a higher chance of satisfying unconfoundedness, it can lead to a higher chance of violating positivity. As we increase the dimension of the covariates, we make the subgroups for any level x of the covariates smaller. This is related to the curse of dimensionality. As each subgroup gets smaller, there is a higher and higher chance that either the whole subgroup will have treatment or the whole subgroup will have control. For example, once the size of any subgroup has decreased to one, positivity is guaranteed to not hold.
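
This can be simulated: holding the sample size fixed, the number of covariate subgroups that contain only treated or only control units grows quickly with the number of binary covariates (all parameters here are arbitrary):

```python
import numpy as np

def positivity_violations(n: int, d: int, seed: int = 0) -> tuple:
    """Count covariate subgroups containing only treated or only control
    units, for n individuals, d binary covariates, randomized treatment."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, d))   # d binary covariates per individual
    t = rng.integers(0, 2, size=n)        # randomized (coin-toss) treatment
    groups = {}
    for row, ti in zip(X, t):
        groups.setdefault(tuple(row), set()).add(int(ti))
    violated = sum(1 for seen in groups.values() if len(seen) == 1)
    return violated, len(groups)

# Subgroups shrink as the covariate dimension grows, so violations pile up.
for d in (1, 4, 12):
    violated, total = positivity_violations(n=1000, d=d)
    print(f"{d:2d} covariates: {violated}/{total} subgroups violate positivity")
```

With 1 covariate the two subgroups each hold roughly 500 individuals and both treatments appear; with 12 covariates there are up to 4096 cells for 1000 individuals, so most occupied cells are singletons and positivity fails in them.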