Difference-in-differences design

class: center, middle, inverse, title-slide

.title[
# Difference-in-differences design
]
.subtitle[
## Tutorial 5
]
.date[
### Stanislav Avdeev
]

---

# Goal for today's tutorial

1. Discuss and estimate the basic DiD framework as a two-way fixed effects estimator
1. Discuss and estimate an event-study specification
1. Discuss how to make proper inferences about DiD estimates
1. Discuss the parallel trends assumption, pre-trend tests, and the connection between the two

---

# DiD as two-way fixed effects

- The basic idea is to take fixed effects and then compare the **within variation** across groups
  - a **treated** group: individuals who get treated
  - a **control** group: individuals who do not get treated
  - we need to observe them both **before** and **after** they (don't) get their treatment
- Eventually, we want to estimate **within variation** for groups
  - control for time effects
  - control for group effects
  - compare within variation across groups
  - sounds like a job for the fixed effects estimator
- The question DiD tries to answer is "what was the effect of some policy on the people who were affected by it?"
  - so DiD estimates the ATET (average treatment effect on the treated), here under a constant treatment effect assumption and with only two periods

---

# DiD as two-way fixed effects

- We can estimate a standard DiD using the following formula
`$$Y_{gt} = \alpha_t + \eta_g + \delta D_{gt} + U_{gt}$$`
where 
  - `$\alpha_t$` is the common time trend
  - `$\eta_g$` is the group specific effect
  - `$D_{gt}$` is an interaction term equal to `$1$` if you are in the treated group in the post-treatment period
  - `$\delta$` is the DiD estimate

---

# DiD as two-way fixed effects

- If you have only two groups and two time periods you can present this regression as follows
`$$Y_{gt} = \beta_0 + \beta_1Post_t + \beta_2Treated_g + \beta_3Post_t \times Treated_g + U_{gt}$$`
where 
  - `$Post_t$` is a binary variable equal to `$1$` if you are in the post-treatment period
  - `$Treated_g$` is a binary variable equal to `$1$` if you are in the treated group
  - `$Treated_g \times Post_t$` is an interaction term equal to `$1$` if you are in the treated group in the post-treatment period

---

# DiD as two-way fixed effects

- How can we interpret the estimated coefficients?

`$$Y_{gt} = \beta_0 + \beta_1Post_t + \beta_2Treated_g + \beta_3Post_t \times Treated_g + U_{gt}$$`
- `$\beta_0$` is the prediction when `$Post_t = 0$` and `$Treated_g = 0$`
  - `$\beta_0$` is the **mean of the control group before**
- `$\beta_1$` is the difference between the periods before and after for the control group
  - `$\beta_0 + \beta_1$` is the prediction when `$Post_t = 1$` and `$Treated_g = 0$`, i.e. the **mean of the control group after**
- `$\beta_2$` is the difference between the treated and control groups before the treatment
  - `$\beta_0 + \beta_2$` is the prediction when `$Post_t = 0$` and `$Treated_g = 1$`, i.e. the **mean of the treated group before**
- `$\beta_3$` is how much bigger the before-after difference is for the treated group than for the control group, i.e. the DiD
  - `$\beta_0 + \beta_1 + \beta_2 + \beta_3$` is the prediction when `$Post_t = 1$` and `$Treated_g = 1$`, i.e. the **mean of the treated group after**
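- To see why `$\beta_3$` is the DiD, take differences of the four group means listed here; a short derivation, where the shorthand `$\bar{Y}_{T,\text{after}}$`, `$\bar{Y}_{C,\text{before}}$`, etc. for the treated (T) and control (C) group means is introduced only for this illustration
`\begin{align*}
  (\bar{Y}_{T,\text{after}} - \bar{Y}_{T,\text{before}}) - (\bar{Y}_{C,\text{after}} - \bar{Y}_{C,\text{before}}) &= [(\beta_0 + \beta_1 + \beta_2 + \beta_3) - (\beta_0 + \beta_2)] - [(\beta_0 + \beta_1) - \beta_0] \\
  &= (\beta_1 + \beta_3) - \beta_1 \\
  &= \beta_3
\end{align*}`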

---

# 2 `$\times$` 2 DiD: simulation

- Let us simulate a dataset with `$2$` groups and `$2$` time periods

```r
library(tidyverse) # tibble() and the dplyr verbs used below
set.seed(7)
df <- tibble(year = rep(1:2, 5000),
             group = sort(rep(0:1, 5000)), 
             post = ifelse(year == 2, 1, 0),
             treated = ifelse(group == 1, 1, 0),
             D = post*treated,
             Y = 3*D + year + group + rnorm(10000))
```

| year| group| post| treated|  D|        Y|
|----:|-----:|----:|-------:|--:|--------:|
|    1|     0|    0|       0|  0| 2.208622|
|    2|     0|    1|       0|  0| 1.611525|
|    1|     1|    0|       1|  0| 1.560944|
|    2|     1|    1|       1|  1| 5.562782|

---

# 2 `$\times$` 2 DiD: simulation

- First, let us manually calculate DiD

```r
# The true effect is 3
means <- df %>% group_by(treated, post) %>% summarize(Y = mean(Y))
means
```

```
## # A tibble: 4 × 3
## # Groups:   treated [2]
##   treated  post     Y
##     <dbl> <dbl> <dbl>
## 1       0     0  1.01
## 2       0     1  2.00
## 3       1     0  2.00
## 4       1     1  6.00
```

```r
treated_dif <- means[means$treated == 1 & means$post == 1,]$Y - 
               means[means$treated == 1 & means$post == 0,]$Y
control_dif <- means[means$treated == 0 & means$post == 1,]$Y - 
               means[means$treated == 0 & means$post == 0,]$Y
did <- treated_dif - control_dif
# Print both before-after differences and the resulting DiD
c(treated_dif, control_dif, did)
```

```
## [1] 4.0096336 0.9845777 3.0250559
```

---

# 2 `$\times$` 2 DiD: simulation

- Now let us use `feols()` from the `fixest` package, since DiD is a fixed effects estimator

```r
# The true effect is 3
library(fixest)
m1 <- feols(Y ~ D | year + group, df,
            se = 'standard') # no need to cluster s.e. 
                             # as there are only 2 groups and 2 time periods
# We can also estimate the same DiD with a plain lm()
m2 <- lm(Y ~ D + factor(year) + factor(group), df)
```
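
- A minimal base-R sketch (no extra packages) suggesting that the interaction form, the two-way fixed effects form, and the manual difference of means give the same DiD estimate in the 2 `$\times$` 2 case; the object names `m_int`, `m_fe`, `cell`, and `did_means` are made up for this illustration

```r
# Re-simulate the 2x2 data from the slides using base R only
set.seed(7)
df <- data.frame(year = rep(1:2, 5000),
                 group = sort(rep(0:1, 5000)))
df$post <- as.integer(df$year == 2)
df$treated <- as.integer(df$group == 1)
df$D <- df$post * df$treated
df$Y <- 3 * df$D + df$year + df$group + rnorm(10000)

# Interaction form: the coefficient on post:treated is the DiD
m_int <- lm(Y ~ post * treated, data = df)

# Two-way fixed effects form: the coefficient on D is the DiD
m_fe <- lm(Y ~ D + factor(year) + factor(group), data = df)

# Difference in differences of the four cell means
cell <- tapply(df$Y, list(df$treated, df$post), mean)
did_means <- (cell["1", "1"] - cell["1", "0"]) -
             (cell["0", "1"] - cell["0", "0"])

# All three numbers should coincide up to floating point error
c(interaction = unname(coef(m_int)["post:treated"]),
  twfe = unname(coef(m_fe)["D"]),
  means = did_means)
```

- With two groups and two periods the model is saturated, so the regression can only reproduce the four cell means; that is why the three numbers agree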

---

# DiD: graphically

![](tutorial5_files/figure-html/unnamed-chunk-7-1.gif)

---

# DiD: more groups and time periods

- Let us simulate a dataset with `$20$` groups and `$10$` time periods 
  - with first treated period being period `$7$` 
  - and the treated groups being groups `$15$` through `$20$`

```r
set.seed(7)
df <- tibble(year = rep(1:10, 1000),
             group = sort(rep(1:20, 500)), 
             post = ifelse(year >= 7, 1, 0),
             treated = ifelse(group >= 15, 1, 0),
             D = post*treated,
             Y = 3*D + year + group + rnorm(10000))
```

---

# DiD: simulation

```r
# The true effect is 3
library(fixest)
m <- feols(Y ~ D | year + group, df)
```

---

# DiD: inference

- Always remember what the level of your treatment is
  - is your treatment assigned at the level of state?
  - is your treatment assigned at the level of university?
  - is your treatment assigned at the level of class?
- If you have not done so, read the paper by Abadie et al. (2017)
- It’s common to cluster s.e. at the level of the group fixed effects, since errors within a group are likely to be correlated over time
  - not accounting for clustering leads to incorrect s.e.
  - `feols()` has historically clustered by the first FE by default; since the default may depend on your `fixest` version, it is safer to set `cluster` explicitly

---

# DiD: inference

```r
# The true effect is 3
m1 <- feols(Y ~ D | year + group, df, se = 'standard')
m2 <- feols(Y ~ D | year + group, df, cluster = "year")
m3 <- feols(Y ~ D | year + group, df, cluster = "group")
m4 <- feols(Y ~ D | year + group, df, cluster = "year^group")
```

<table style="border-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto;" class="table">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Model 1 </th>
   <th style="text-align:center;"> Model 2 </th>
   <th style="text-align:center;"> Model 3 </th>
   <th style="text-align:center;"> Model 4 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> D </td>
   <td style="text-align:center;"> 2.987*** </td>
   <td style="text-align:center;"> 2.987*** </td>
   <td style="text-align:center;"> 2.987*** </td>
   <td style="text-align:center;"> 2.987*** </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1px">  </td>
   <td style="text-align:center;box-shadow: 0px 1px"> (0.045) </td>
   <td style="text-align:center;box-shadow: 0px 1px"> (0.033) </td>
   <td style="text-align:center;box-shadow: 0px 1px"> (0.040) </td>
   <td style="text-align:center;box-shadow: 0px 1px"> (0.046) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Num.Obs. </td>
   <td style="text-align:center;"> 10000 </td>
   <td style="text-align:center;"> 10000 </td>
   <td style="text-align:center;"> 10000 </td>
   <td style="text-align:center;"> 10000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Std.Errors </td>
   <td style="text-align:center;"> IID </td>
   <td style="text-align:center;"> by: year </td>
   <td style="text-align:center;"> by: group </td>
   <td style="text-align:center;"> by: year^group </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> + p &lt; 0.1, * p &lt; 0.05, ** p &lt; 0.01, *** p &lt; 0.001</td></tr></tfoot>
</table>

- Remember that how you calculate your s.e. **does not** affect point estimates

---

# DiD: event-study specification

- So far we have limited ourselves to "before" and "after", but we observe more than that
- A single "after" coefficient averages the treatment effect over the entire post-treatment period
  - what if an effect takes time to get going? or fades out?
- We can also estimate a **dynamic effect** where we allow the effect to be different at different time periods since the treatment
- To implement an event-study specification
  - interact a binary indicator for being in the treated group with binary indicators for time period 
  - impose the normalisation `$\delta_{-1} = 0$`, i.e. set the coefficient for the last period before the treatment to zero; otherwise you get perfect multicollinearity
`$$Y_{gt} = \alpha_t + \eta_g + \delta_{t - \tau_g}D_{gt} + U_{gt}$$`
where 
  - `$\tau_g$` is the period in which the treatment starts for group `$g$`
  - `$t$` is the time period
- `feols()` makes this easy with its `i()` interaction function
- Then, just plot these estimates
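- A minimal sketch of an event-study estimation with `i()`, assuming the simulated design used elsewhere in these slides (20 groups, 10 periods, groups 15 to 20 treated from period 7, true effect 3); the data are re-simulated with base R and the object name `es` is made up for this illustration

```r
library(fixest)

# Simulated data: groups 15 to 20 are treated from period 7 onwards
set.seed(7)
df <- data.frame(year = rep(1:10, 1000),
                 group = sort(rep(1:20, 500)))
df$treated <- as.integer(df$group >= 15)
df$Y <- 3 * df$treated * (df$year >= 7) + df$year + df$group + rnorm(10000)

# Interact period dummies with the treated indicator;
# ref = 6 drops the last pre-treatment period (delta_{-1} = 0)
es <- feols(Y ~ i(year, treated, ref = 6) | year + group,
            data = df, cluster = ~group)

coef(es)   # one coefficient per period, relative to period 6
iplot(es)  # plot the estimates with confidence intervals
```

- Since the simulated effect is constant, the coefficients for periods 7 to 10 should be close to 3 and those for periods 1 to 5 close to 0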

---

# DiD: event-study specification

- Let us make a more concrete example
- Suppose we have `$6$` time periods: `$3$` periods before and `$3$` periods after
  - so `$t = 1, \dots, 6$` and `$\tau_g = 4$`
- Then our model is as follows
`\begin{align*}
  Y_{gt} &= \alpha_t + \eta_g + \delta_{t - \tau_g}D_{gt} + U_{gt} \\
  &= \alpha_t + \eta_g + \delta_{1 - 4}D_{g1} + \delta_{2 - 4}D_{g2} + \delta_{3 - 4}D_{g3} \\
  &+ \delta_{4 - 4}D_{g4} + \delta_{5 - 4}D_{g5} + \delta_{6 - 4}D_{g6} + U_{gt} \\
  &= \alpha_t + \eta_g + \delta_{-3}D_{g1} + \delta_{-2}D_{g2} + \delta_{-1}D_{g3} + \delta_{0}D_{g4} + \delta_{1}D_{g5} + \delta_{2}D_{g6} + U_{gt}
\end{align*}`
where 
  - `$\delta_{-3}, \delta_{-2}, \delta_{-1}$` are coefficients of the "effect" before the treatment period
  - `$\delta_{-1}$` is the coefficient for the last period before the treatment, normalised to `$\delta_{-1} = 0$`
  - `$\delta_{0}, \delta_{1}, \delta_{2}$` are coefficients of the effect after the treatment period

---

# Parallel trends

- For the DiD to work we have to pick the control group carefully
  - we need a control group for which parallel trends holds
  - if there had been no treatment, both treated and control groups would have had the same time effect
- We can't check this directly, since it's counterfactual
  - we can only check whether it is plausible
- DiD gives us a causal effect as long as **the only reason the gap changed** was the treatment
  - this is called the **parallel trends assumption**
- The parallel trends assumption means that if the treatment had not happened, the gap between the two groups would have stayed the same
- There are two main ways we can test the plausibility of parallel trends
  - First, we can check for differences in **prior trends**
  - Second, we can do a **placebo test**
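- Written as a formula, the parallel trends assumption for the two-period case can be sketched as follows; the potential-outcome notation `$Y_{g,t}(0)$` (the outcome of group `$g$` in period `$t$` without treatment) is an assumption introduced for this illustration
`$$E[Y_{g,\text{post}}(0) - Y_{g,\text{pre}}(0) \mid Treated_g = 1] = E[Y_{g,\text{post}}(0) - Y_{g,\text{pre}}(0) \mid Treated_g = 0]$$`
  - the left-hand side is never observed for the treated group after the treatment, which is why the assumption cannot be tested directly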

---

# Parallel trends: prior trends

- You can check whether the assumption is plausible by seeing if **prior trends** are the same for treated and control groups
  - if we have multiple pre-treatment periods, was the gap changing a lot during that period?
- If the two groups were already trending towards each other, or away from each other, before treatment, it is hard to believe that parallel trends hold
- They **probably** would have continued trending together/apart, breaking parallel trends
  - in this case we would mix up the continuation of the trend with the effect of treatment
- Sometimes you can adjust for prior trends to fix parallel trends violations 
  - by including a group-specific time trend directly
  - or by using a synthetic control method
- Just because **prior trends** are equal does not mean that **parallel trends** holds
  - **parallel trends** is about what the before-after change **would have been** and we can't see that
  - but it can be **suggestive**

---

# Parallel trends: prior trends

- Recall the formula we used in an event-study framework
`\begin{align*}
  Y_{gt} &= \alpha_t + \eta_g + \delta_{t - \tau_g}D_{gt} + U_{gt} \\
  &= \alpha_t + \eta_g + \delta_{-3}D_{g1} + \delta_{-2}D_{g2} + \delta_{-1}D_{g3} + \delta_{0}D_{g4} + \delta_{1}D_{g5} + \delta_{2}D_{g6} + U_{gt}
\end{align*}`
- To check parallel pre-trends, test whether `$\delta_{-3}, \delta_{-2}$` are jointly significant
  - you can do so with `wald()` in the `fixest` package
- If they are jointly insignificant, there is no evidence of differences in prior trends
  - that doesn't **prove** parallel trends, but failing this test would make parallel trends **less plausible**
- You can also check more complex time trends by including polynomial terms or other nonlinearities
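- A minimal sketch of the joint pre-trend test with `wald()`, assuming the simulated design from these slides (treatment starts in period 7, period 6 is the reference); here the pre-treatment coefficients are those for periods 1 to 5 rather than `$\delta_{-3}, \delta_{-2}$`, and the object names `es` and `pre` are made up for this illustration

```r
library(fixest)

# Simulated data: groups 15 to 20 are treated from period 7 onwards
set.seed(7)
df <- data.frame(year = rep(1:10, 1000),
                 group = sort(rep(1:20, 500)))
df$treated <- as.integer(df$group >= 15)
df$Y <- 3 * df$treated * (df$year >= 7) + df$year + df$group + rnorm(10000)

# Event-study regression with period 6 as the omitted reference period
es <- feols(Y ~ i(year, treated, ref = 6) | year + group,
            data = df, cluster = ~group)

# Joint Wald test that the coefficients for periods 1 to 5 are all zero;
# keep takes a regular expression matched against the coefficient names
pre <- wald(es, keep = "year::[1-5]:treated")
pre["p"]  # a large p-value means no evidence of differing prior trends
```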

---

# Parallel trends: placebo

- Many causal inference designs can be tested using **placebo tests**
- To implement a placebo test
  - use only the data that came before the treatment went into effect
  - pick a fake treatment period
  - estimate the same DiD model you used
  - if you find an "effect", that is evidence that there is something wrong with your design, which may imply a violation of parallel trends

---

# Parallel trends: placebo

```r
# Remember the first treated period was period 7
df_fake <- df %>%
  filter(year < 7) %>%
  # pick a fake treatment period
  mutate(post1 = ifelse(year >= 4, 1, 0),
         post2 = ifelse(year >= 5, 1, 0),
         D1 = post1*treated, 
         D2 = post2*treated)
```

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> year </th>
   <th style="text-align:right;"> group </th>
   <th style="text-align:right;"> post </th>
   <th style="text-align:right;"> treated </th>
   <th style="text-align:right;"> D </th>
   <th style="text-align:right;"> Y </th>
   <th style="text-align:right;"> post1 </th>
   <th style="text-align:right;"> post2 </th>
   <th style="text-align:right;"> D1 </th>
   <th style="text-align:right;"> D2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 24.57066 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 23.15619 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 23.71590 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 20 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 25.20272 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>

---

# Parallel trends: placebo

```r
# Only pre-treatment data are used, so the true placebo effect is 0
library(fixest)
m1 <- feols(Y ~ D1 | year + group, df_fake)
m2 <- feols(Y ~ D2 | year + group, df_fake)
```

<table style="border-bottom: 0; width: auto !important; margin-left: auto; margin-right: auto;" class="table">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:center;"> Model 1 </th>
   <th style="text-align:center;"> Model 2 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> D1 </td>
   <td style="text-align:center;"> −0.022 </td>
   <td style="text-align:center;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;">  </td>
   <td style="text-align:center;"> (0.062) </td>
   <td style="text-align:center;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> D2 </td>
   <td style="text-align:center;">  </td>
   <td style="text-align:center;"> −0.081 </td>
  </tr>
  <tr>
   <td style="text-align:left;box-shadow: 0px 1px">  </td>
   <td style="text-align:center;box-shadow: 0px 1px">  </td>
   <td style="text-align:center;box-shadow: 0px 1px"> (0.050) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Num.Obs. </td>
   <td style="text-align:center;"> 6000 </td>
   <td style="text-align:center;"> 6000 </td>
  </tr>
</tbody>
<tfoot><tr><td style="padding: 0; " colspan="100%">
<sup></sup> + p &lt; 0.1, * p &lt; 0.05, ** p &lt; 0.01, *** p &lt; 0.001</td></tr></tfoot>
</table>

- There is no statistically significant "effect" of our fake treatment, which is a good sign

---

# Parallel trends: some remarks

- Sometimes you will find significant effects while testing parallel pre-trends or by using a placebo
- However, for both prior trends and placebo tests, we are a little less concerned with **significance** than with **meaningful size** of the violations
  - after all, with a large enough sample even a **tiny** violation is statistically significant
  - and if fake treatment effects are fairly tiny, you can argue these effects away

---

# DiD as two-way fixed effects: problems

- One common variant of difference-in-differences is the **rollout design**, in which there are multiple treated groups, each being treated at a different time
  - rollout designs are possibly the most common form of DiD you see
- As discovered recently, two-way fixed effects **does not** work reliably to estimate DiD when you have a rollout design
  - think about what fixed effects does: it leaves you only with within variation
  - two types of individuals have no within variation **in treatment status** between periods A and B: the never-treated and the already-treated
  - so the already-treated can end up getting used as controls in a rollout
- This becomes a big problem especially if the effect grows/shrinks over time
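- A toy sketch of the problem, under strong assumptions chosen only for illustration: two hypothetical groups (`early` treated from period 2, `late` treated from period 3), no noise, an untreated outcome of zero, and an effect that grows from 1 to 2 with exposure

```r
# Two groups, three periods, no noise, untreated outcome fixed at 0
panel <- expand.grid(period = 1:3, group = c("early", "late"))
start <- ifelse(panel$group == "early", 2, 3)   # first treated period
panel$D <- as.integer(panel$period >= start)     # treatment indicator
# Effect grows with exposure: 1 in the first treated period, 2 in the second
panel$effect <- panel$D * (panel$period - start + 1)
panel$Y <- panel$effect

# Two-way fixed effects regression
twfe <- lm(Y ~ D + factor(group) + factor(period), data = panel)

true_att <- mean(panel$effect[panel$D == 1])  # average effect over treated cells: 4/3
coef(twfe)["D"]                               # TWFE estimate: 0.5
```

- Between periods 2 and 3 the already-treated `early` group serves as the control for `late`, and its own effect grows by 1 over that step, which cancels the effect on `late` in that comparison and pulls the estimate down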

---

# DiD as two-way fixed effects: solutions

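- Recent work diagnoses this problem and proposes estimators that deal with rollout designs properly
  - Goodman-Bacon (2021) decomposes the two-way fixed effects estimate into its underlying 2 `$\times$` 2 comparisons, which shows when already-treated groups act as controls
  - Callaway and Sant'Anna (2021) estimate a separate effect for each treatment cohort and time period
- The new estimators take each period of treatment and consider the group treated **in that particular period**
- They explicitly use only untreated (never-treated or not-yet-treated) groups as controls
- And they can also use covariates, in the spirit of **matching**, to improve the selection of control groups for each period's treated group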
- We will not go into these methods, but it is good to know for your future research

---

# References

Books
- Huntington-Klein, N. The Effect: An Introduction to Research Design and Causality, [Chapter 18: Difference-in-Differences](https://theeffectbook.net/ch-DifferenceinDifference.html)
- Cunningham, S. Causal Inference: The Mixtape, [Chapter 9: Difference-in-Differences](https://mixtape.scunning.com/difference-in-differences.html)
- Cunningham, S. Causal Inference: The Mixtape, [Chapter 10: Synthetic Control](https://mixtape.scunning.com/synthetic-control.html)

Slides
- Huntington-Klein, N. Econometrics Course, [Week 07: Difference-in-Difference](https://github.com/NickCH-K/EconometricsSlides/blob/master/Week_07/Week_07_1_Difference_in_Difference.html)
- Huntington-Klein, N. Causality Inference Course, [Lecture 09: Difference-in-Differences](https://github.com/NickCH-K/CausalitySlides/blob/main/Lecture_09_Difference_in_Differences.html)
- Huntington-Klein, N. Causality Inference Course, [Lecture 10: Difference-in-Differences](https://github.com/NickCH-K/CausalitySlides/blob/main/Lecture_10_Difference_in_Differences_Estimation.html)

Articles
- Abadie, A., Athey, S., Imbens, G. W., & Wooldridge, J. (2017). [When Should You Adjust Standard Errors for Clustering?](https://www.nber.org/papers/w24003) (No. w24003). National Bureau of Economic Research
- Goodman-Bacon, A. (2021). [Difference-in-differences with variation in treatment timing](https://www.sciencedirect.com/science/article/pii/S0304407621001445)
- Callaway, B., & Sant’Anna, P. H. (2021). [Difference-in-differences with multiple time periods](https://www.sciencedirect.com/science/article/pii/S0304407620303948)