class: center, middle, inverse, title-slide

.title[
# Reading Results Tables
]
.subtitle[
## GHM (2010): Tables
]
.author[
### Hussain Hadah (he/him)
]
.date[
### 01 February 2024
]

---
layout: true

<div style="position: absolute;left:20px;bottom:5px;color:black;font-size: 12px;">Hussain Hadah (he/him) (Tulane) | GHM (2010): Tables | 01 February 2024</div>

<!--- GHM (2010): Tables | 01 February 2024-->

<style type="text/css">
/* Table width = 100% max-width */
.remark-slide table{
  width: auto !important; /* Adjusts table width */
}
/* Use a white background for all table rows, including the shaded (even) rows */
.remark-slide thead,
.remark-slide tr:nth-child(n) {
  background-color: white;
}
</style>

---
class: title-slide
background-image: url("assets/TulaneLogo-white.svg"), url("assets/title-image1.jpg")
background-position: 10% 90%, 100% 50%
background-size: 160px, 50% 100%
background-color: #0148A4

# .text-shadow[.white[Outline for Today]]

<ol>
  <li><h4 class="white">General layout of statistical results tables</h4></li>
  <li><h5 class="white">Coefficients and standard errors – what they mean</h5></li>
  <li><h5 class="white">Constructing confidence intervals: 90%, 95%, and 99%</h5></li>
  <li><h5 class="white">Hypothesis testing: 10%, 5%, and 1% levels</h5></li>
  <li><h5 class="white">What do the *s beside the estimates mean?</h5></li>
</ol>

---
## This Week

- Don't forget about the difference-in-differences quiz

---
## General layout of statistical results tables

.pull-left[
- The top numbers are the estimates.
- Usually these are coefficient estimates from a regression, but sometimes they are just means or differences in means.
- These tell you the estimated effect, difference, etc.
- They also tell you the magnitude of the effect or difference – was it small or large? Negative or positive?
]

.pull-right[
<img src="02-class2_files/figure-html/table1-1.png" width="200%" style="display: block; margin: auto;" />
]

---
## General layout of statistical results tables

.pull-left[
- Under each estimate, in (), is the standard error (SE) of the estimate.
- The SE tells us how precise the estimate is. How sure are we of this estimate?
- A larger SE means a less precise estimate: the estimate has a larger margin of error.
- A confidence interval for this estimate would be wider (as we shall see).
]

.pull-right[
<img src="02-class2_files/figure-html/table1-1-1.png" width="200%" style="display: block; margin: auto;" />
]

---
## More tables

.pull-left[
- I will explain more about all these tables later, but for now just notice the typical format: an estimate with its standard error underneath. Also notice the use of *s, which indicate how statistically significant an estimate is; I will explain these shortly.

<img src="02-class2_files/figure-html/table3-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="02-class2_files/figure-html/table2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Coefficients and standard errors – what they mean

- The estimate (top number) tells us the effect that was estimated and what its magnitude was.
- The standard error tells us how precise that estimate is (how much margin of error does it have?).
- Here are some examples of coefficients that may make this easier to understand for those of you who haven't taken econometrics or any statistics courses that use regression.
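- For those of you who have used R, here is a minimal sketch, with simulated (made-up) data, of where these two numbers come from in a regression:

```r
# Simulated (made-up) data: the true effect of x on y is 0.5
set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)

fit <- lm(y ~ x)
# "Estimate" and "Std. Error" are the two numbers a results
# table reports for each coefficient
summary(fit)$coefficients
```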
---
## Coefficients and standard errors – what they mean

- Suppose I estimated the mean (average) productivity of firms in county A and in county B. These are hypothetical numbers.
- County A = 100, with a standard error of 10.
- County B = 90, with a standard error of 12.
- The difference (A – B) is 10, and suppose it has a standard error of 15.
- Let's focus on this estimate of 10, with a standard error of 15.
- The estimate of 10 tells us that county A's productivity is estimated to be 10 higher than county B's productivity, on average.
- The standard error is 15, which is fairly high.
- How do we use this standard error to tell us how precise our estimate is?
- The best way is to use it to construct a confidence interval.

---
## Constructing confidence intervals

- There are usually three confidence intervals that (social) scientists create: 90%, 95%, and 99% confidence intervals.
- The intuitive<sup>*</sup> way to understand these is:
- The 90% (95%, 99%) confidence interval tells us that, under the assumption that our statistical model is correct, the true effect we are measuring lies within our confidence interval 90% (95%, 99%) of the time.
- For example, suppose the 95% confidence interval of an estimate was (-0.3, 0.1). Then we are 95% confident that the true effect, the thing we are estimating, lies between -0.3 and 0.1.
- Thus, these confidence intervals tell us how sure we are of our estimates, since it's impossible to know exactly what they are, given randomness and noise in the data.

.pull-right[.footnote[<sup>*</sup> For those with more theoretical stats training, you'll know that this intuitive explanation isn't technically correct, but I am not looking to explain to beginners the difference between frequentist and Bayesian statistics or the repeated-sampling nature of classical statistics.]
]

---
## Constructing confidence intervals

- How do we make 90%, 95%, and 99% confidence intervals?
- The general formula is:
  - Lower bound: Estimate – critical value * standard error
  - Upper bound: Estimate + critical value * standard error
- The critical value is 1.645 for a 90% interval, 1.96 for 95%, and 2.576 for 99%.
- Going back to our original example, we had an estimate of 10 and a standard error of 15:
  - Lower bound: 10 – critical value * 15 = 10 – 1.645*15 = 10 – 24.675 = -14.675
  - Upper bound: 10 + critical value * 15 = 10 + 1.645*15 = 10 + 24.675 = 34.675
- Therefore, the 90% confidence interval is (-14.675, 34.675).

---
## Constructing confidence intervals – 90%

- Lower bound: 10 – critical value * 15 = 10 – 1.645*15 = 10 – 24.675 = -14.675
- Upper bound: 10 + critical value * 15 = 10 + 1.645*15 = 10 + 24.675 = 34.675
- Therefore, the 90% confidence interval is (-14.675, 34.675).

---
## Constructing confidence intervals – 95%

- Lower bound: 10 – critical value * 15 = 10 – 1.96*15 = 10 – 29.4 = -19.4
- Upper bound: 10 + critical value * 15 = 10 + 1.96*15 = 10 + 29.4 = 39.4
- Therefore, the 95% confidence interval is (-19.4, 39.4).
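---
## Constructing confidence intervals in R

For those following along in R, here is a minimal sketch that reproduces the intervals above for our running example (estimate = 10, SE = 15):

```r
estimate <- 10
se       <- 15

# Critical values for the 90%, 95%, and 99% intervals
crit <- c("90%" = 1.645, "95%" = 1.96, "99%" = 2.576)

data.frame(level = names(crit),
           lower = estimate - crit * se,
           upper = estimate + crit * se)
```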
---
## Constructing confidence intervals – 95%

.pull-left[
<img src="02-class2_files/figure-html/hot-tip-1.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
- Lower bound: 10 – critical value * 15 = 10 – 1.96*15 = 10 – 29.4 = -19.4
- Upper bound: 10 + critical value * 15 = 10 + 1.96*15 = 10 + 29.4 = 39.4
- Therefore, the 95% confidence interval is (-19.4, 39.4).
- **1.96 is very close to 2**, so you can calculate an "eye-ball" confidence interval (not a technical term) by using 2:
  - Lower = 10 – 2*15 = 10 – 30 = -20
  - Upper = 10 + 2*15 = 10 + 30 = 40
]

---
## Constructing confidence intervals – 99%

- Lower bound: 10 – critical value * 15 = 10 – 2.576*15 = 10 – 38.64 = -28.64
- Upper bound: 10 + critical value * 15 = 10 + 2.576*15 = 10 + 38.64 = 48.64
- Therefore, the 99% confidence interval is (-28.64, 48.64).

---
## Comparing confidence intervals

- For our example of an estimate of 10, with a standard error of 15, our confidence intervals are:
  - The 90% confidence interval is (-14.675, 34.675).
  - The 95% confidence interval is (-19.4, 39.4).
  - The 99% confidence interval is (-28.64, 48.64).
- Notice how the confidence interval grows as we move to higher confidence levels.
- To be more sure that our interval contains the true value (higher % confidence), we have to widen the interval.

---
class: segue-yellow
background-image: url("assets/TulaneLogo.svg")
background-size: 20%
background-position: 95% 95%

# Confidence Intervals Examples

---
## Example of confidence intervals

<img src="02-class2_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />

---
class: segue-yellow
background-image: url("assets/TulaneLogo.svg")
background-size: 20%
background-position: 95% 95%

# Confidence Intervals Activity

---
## Activity break – Calculating confidence intervals

<svg viewBox="0 0 448 512" style="height:1em;display:inline-block;position:fixed;top:10;right:10;" xmlns="http://www.w3.org/2000/svg"> <path d="M144 479H48c-26.5 0-48-21.5-48-48V79c0-26.5 21.5-48 48-48h96c26.5 0 48 21.5 48 48v352c0 26.5-21.5 48-48 48zm304-48V79c0-26.5-21.5-48-48-48h-96c-26.5 0-48 21.5-48 48v352c0 26.5 21.5 48 48 48h96c26.5 0 48-21.5 48-48z"></path></svg>

- Let's take a break from lecture to calculate some intervals.
- For this, I'm going to have you calculate "eye-ball" 95% confidence intervals, i.e. using 2 instead of 1.96 for the critical value.
- Therefore, the formula is:
  - Lower bound = estimate – 2*SE, Upper bound = estimate + 2*SE
  - Remember order of operations -> multiply the SE by 2 first!
- I'll give you five minutes to do the short quiz "Confidence Interval Calculation" on Canvas. I'll put you into breakout rooms so you can more easily ask each other or me (by summoning me) questions.
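- After the quiz, you can check any answer with a one-line sketch in R, using our running example of an estimate of 10 and an SE of 15:

```r
# "Eye-ball" 95% confidence interval: estimate +/- 2 * SE
10 + c(-2, 2) * 15   # gives -20 and 40, matching the slide above
```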
---
class: segue-yellow
background-image: url("assets/TulaneLogo.svg")
background-size: 20%
background-position: 95% 95%

# Hypothesis Testing

---
## Hypothesis testing

- In addition to calculating confidence intervals, we often do hypothesis testing.
- Mostly, we test to see if our estimates are statistically significantly different from zero.
- That is, are we reasonably sure that the true value, which we estimated, is different from zero?
- Testing for a difference from zero is useful because, if the estimate is different from zero, it implies that there is likely an effect or a difference.
- If an estimate is not statistically significantly different from zero, we don't have enough statistical evidence to claim that there is an effect.

---
## Hypothesis testing: 10%, 5%, and 1% levels

- We typically test for statistical significance at the following levels:
  - 10%, which corresponds to a 90% confidence interval,
  - 5%, which corresponds to a 95% confidence interval,
  - 1%, which corresponds to a 99% confidence interval.
- The 10%, 5%, and 1% here refer to the amount of what is called "Type 1 error", which can be interpreted as the false positive rate (finding an effect that does not exist).
- Testing at the 10% (5%, 1%) level, you will find an effect (a difference from zero) that does not actually exist 10% (5%, 1%) of the time.

---
## Balancing Type 1 and Type 2 Error

- Statistics tries to balance two types of error:
- Type 1 error -> "false positive"
  - E.g., finding an effect where there is actually no effect.
  - A positive test result when really the person is negative.
- Type 2 error -> "false negative"
  - E.g., finding no effect (not statistically different from zero) when really there is an effect.
  - A negative test result when really the person is positive.

---
## Balancing Type 1 and Type 2 Error

<img src="02-class2_files/figure-html/errors-and-balance-1.png" width="50%" style="display: block; margin: auto;" />

If we decrease the level that we test at (e.g., from 5% to 1%, which is the same as moving from a 95% confidence interval to a 99% confidence interval), then we decrease the probability of making Type 1 errors (fewer false positives) but increase the probability of making Type 2 errors (more false negatives).

---
.center[![image](assets/pregnant.jpg)]

---
## Which type of error is this concept minimizing?

> It is better that ten guilty persons escape than that one innocent suffer. - William Blackstone

--

- In the context of the criminal justice system:
  - A Type I error, or "false positive," occurs when an innocent person is wrongfully convicted.
  - A Type II error, or "false negative," occurs when a guilty person is wrongfully acquitted.

---
## Hypothesis testing formula

- To do a hypothesis test at any level (10%, 5%, 1%), to see if our estimate is statistically different from zero, we first calculate a t-statistic as follows:

$$ t = \frac{\text{estimate} - 0}{\text{SE}} $$

- More generally, the formula is:
  - Test statistic = (estimate – null hypothesis value) / standard error
  - The null hypothesis value is usually zero, but it can be any value.
  - The standard error is the standard error of the estimate.

--

- E.g., if the coefficient is 0.2 and the standard error is 0.1, the t-statistic is 2.
- E.g., if the coefficient is -2 and the standard error is 2, the t-statistic is -1.

---
## Hypothesis testing formula

- Once we have our t-statistic, we compare it to a critical value.
- These are the same critical values used to create confidence intervals.
- The critical values are:
  - 1.645 for a test at the 10% level of significance (90% confidence interval)
  - 1.96 for a test at the 5% level of significance (95% confidence interval)
  - 2.576 for a test at the 1% level of significance (99% confidence interval)
- If our t-statistic is, in **absolute value**, greater than the critical value, then the estimate is significant at least at that level.
  - `\(|t\text{-statistic}| \ge \text{critical value}\)`
  - The | | means "take the absolute value of".
  - So, if your t-statistic is negative (i.e. your estimate is negative), just multiply it by -1 to make it positive.

---
## Hypothesis testing example

- 1.645 for a test at the 10% level of significance
- 1.96 for a test at the 5% level of significance
- 2.576 for a test at the 1% level of significance
- `\(|t\text{-statistic}| \ge \text{critical value}\)`
- Suppose our t-statistic is 2.2.
  - It's greater in absolute value than 1.645 and 1.96, but not 2.576.
  - Therefore it is significant at the 5% level, but not the 1% level.
- Suppose our t-statistic is -1.7.
  - It's greater in absolute value than 1.645, but not 1.96 or 2.576.
  - Therefore it is significant at the 10% level, but not the 5% or 1% levels.

---
## Hypothesis testing example

.pull-left[
- Instead of using 1.96 as the critical value to test at the 5% level, use 2.
- The "eye-ball" t-test at the 5% level is just dividing the coefficient by the standard error and seeing if that t-statistic is greater than 2 in absolute value.
- You can often do this just by looking at coefficient estimates with their standard errors in tables.
- E.g., 0.038 (0.017): the estimate is more than twice its standard error, so I can see that the t-statistic is bigger than 2.
]

.pull-right[
<img src="02-class2_files/figure-html/hot-tip2-1.png" width="100%" style="display: block; margin: auto;" />
]

---
## Activity break – t-statistics and hypothesis testing

<svg viewBox="0 0 448 512" style="height:1em;display:inline-block;position:fixed;top:10;right:10;" xmlns="http://www.w3.org/2000/svg"> <path d="M144 479H48c-26.5 0-48-21.5-48-48V79c0-26.5 21.5-48 48-48h96c26.5 0 48 21.5 48 48v352c0 26.5-21.5 48-48 48zm304-48V79c0-26.5-21.5-48-48-48h-96c-26.5 0-48 21.5-48 48v352c0 26.5 21.5 48 48 48h96c26.5 0 48-21.5 48-48z"></path></svg>

- Let's take a break from lecture to do some hypothesis tests ("t-tests") using the "eye-ball" method, i.e. using 2 instead of 1.96 for the critical value.
- Therefore, the test is:
  - `\(|t\text{-statistic}| \ge 2\)`
- I'll give you five minutes to do the short quiz "t-statistics and hypothesis testing" on Canvas. Ask each other or me questions.
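---
## Checking t-statistics in R (optional)

After the quiz, you can check your work with a minimal R sketch like this one, which reuses the example estimates and standard errors from the slides above:

```r
# Example estimates and their standard errors from the slides
estimates <- c(0.2, -2, 0.038)
ses       <- c(0.1,  2, 0.017)

t_stats <- estimates / ses   # t-statistic against a null of zero

abs(t_stats) >= 1.645        # significant at the 10% level?
abs(t_stats) >= 1.96         # significant at the 5% level?
abs(t_stats) >= 2.576        # significant at the 1% level?
```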
---
## What do the *s beside the estimates in tables mean?

.pull-left[
- Usually statistical tables have notes under them that detail what *, **, and *** mean.
- More *s means more statistically significant -> we are even more sure that there is an effect (i.e. that the estimate is different from zero). The risk of Type 1 error (a false positive) is lower as significance increases.

<img src="02-class2_files/figure-html/table3-2-1.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="assets/table-2.png" alt="image" width="70%" height="auto">
]

---
## What do the *s beside the estimates in tables mean?

- Most tables use the following convention, but check the table notes to be sure.
- No *s means not statistically significant at the 10% level (or any more stringent level: 5%, 1%, etc.).
- * means statistically significantly different from zero at the 10% level.
  - Or: zero does not fall into the 90% confidence interval.
- ** means statistically significantly different from zero at the 5% level.
  - Or: zero does not fall into the 95% confidence interval.
- *** means statistically significantly different from zero at the 1% level.
  - Or: zero does not fall into the 99% confidence interval.

---
## What do the *s beside the estimates in tables mean?

- Note: anything significant at the 1% level (***) is also significant at the 5% level (**) and the 10% level (*).
- Similarly, anything significant at the 5% level (**) is also significant at the 10% level (*).
- Testing at the 5% level is the most common benchmark of statistical significance.
- So, when researchers say something is statistically significant, they usually mean that it is significant at the 5% level or better.
- The 1% level is the strongest conventional level, although you can test at any level (e.g., some researchers test at the 0.1% level).

---
## What do the *s beside the estimates in tables mean?

- You can use the * system to quickly see how significant estimates are.
- This saves you from more time-intensive ways of gauging the statistical significance of the estimates, such as:
  - Calculating a t-statistic (coefficient divided by standard error) and seeing if it's greater than 2 (which would mean it's significant at the 5% level).
  - Calculating a confidence interval.
- Again, check the table notes to be sure you are interpreting the * system correctly.

---
## Other table conventions – t-stats in ()

- Most social sciences (psychology is the usual exception) present statistical results the way I detailed here:
  - the estimate, with its standard error in () underneath.
- However, some fields or older papers put **t-statistics** underneath the estimates.

.pull-left[
- So instead of: 2.0 (1.0)
]

.pull-right[
- They would have: 2.0 (2.0)
]

- Check the table notes so you know what is in the ()!

---
## Other table conventions – p-value in ()

- Most social sciences (psychology is the usual exception) present statistical results the way I detailed here:
  - the estimate, with its standard error in () underneath.
- However, some fields or older papers put **p-values** underneath the estimates.

.pull-left[
- So instead of: 1.96 (1.0)
]

.pull-right[
- They would have: 1.96 (0.05) or 1.96 `\([0.05]\)`
]

- The p-value gives the significance level directly: a p-value of 0.05 means significant at the 5% level.
- A p-value of 0.01 means significant at the 1% level, etc.
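---
## From t-statistics to p-values (optional)

The t-statistic and the p-value carry the same information. As a rough sketch, using the large-sample normal approximation, you can convert one into the other in R:

```r
# The example from the previous slide: estimate 1.96, standard error 1.0
estimate <- 1.96
se       <- 1.0

t_stat  <- (estimate - 0) / se       # t-statistic against a null of zero
p_value <- 2 * pnorm(-abs(t_stat))   # two-sided p-value, normal approximation
round(c(t = t_stat, p = p_value), 3) # t = 1.96 corresponds to p = 0.05
```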
---
class: segue-yellow
background-image: url("assets/TulaneLogo.svg")
background-size: 20%
background-position: 95% 95%

# Difference-in-differences R Code: Optional

<svg viewBox="0 0 640 512" style="height:1em;display:inline-block;position:fixed;top:10;right:10;" xmlns="http://www.w3.org/2000/svg"> <path d="M278.9 511.5l-61-17.7c-6.4-1.8-10-8.5-8.2-14.9L346.2 8.7c1.8-6.4 8.5-10 14.9-8.2l61 17.7c6.4 1.8 10 8.5 8.2 14.9L293.8 503.3c-1.9 6.4-8.5 10.1-14.9 8.2zm-114-112.2l43.5-46.4c4.6-4.9 4.3-12.7-.8-17.2L117 256l90.6-79.7c5.1-4.5 5.5-12.3.8-17.2l-43.5-46.4c-4.5-4.8-12.1-5.1-17-.5L3.8 247.2c-5.1 4.7-5.1 12.8 0 17.5l144.1 135.1c4.9 4.6 12.5 4.4 17-.5zm327.2.6l144.1-135.1c5.1-4.7 5.1-12.8 0-17.5L492.1 112.1c-4.8-4.5-12.4-4.3-17 .5L431.6 159c-4.6 4.9-4.3 12.7.8 17.2L523 256l-90.6 79.7c-5.1 4.5-5.5 12.3-.8 17.2l43.5 46.4c4.5 4.9 12.1 5.1 17 .6z"></path></svg>

---
## Difference-in-differences R Code

- This tutorial is based on David Card and Alan B. Krueger (1994), "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania," *American Economic Review* 84(4): 772-793.

---
## Difference-in-differences R Code

- First, we need to load the packages we will use and read in the data.
- The dataset can be downloaded from [here](https://raw.githubusercontent.com/hhadah/urban-econ/main/slides/03-week3/02-thur/assets/minwage_short.csv)

```r
# Load the packages (tidyverse includes dplyr and readr)
library(tidyverse)
library(broom)

urlfile = "https://raw.githubusercontent.com/hhadah/urban-econ/main/slides/03-week3/02-thur/assets/minwage_short.csv"

# Load the data
data <- read_csv(url(urlfile))
names(data)
```

```
## [1] "state"          "empft1"         "emppt1"         "nmgrs1"        
## [5] "empft2"         "emppt2"         "nmgrs2"         "store_id"      
## [9] "emptot1"        "emptot2"        "store_has_data"
```

---
## Difference-in-differences R Code

- Filter out stores with missing employment data

```r
min_wage_data <- data %>% 
  filter(!is.na(emptot1), !is.na(emptot2))
names(min_wage_data)
```

```
## [1] "state"          "empft1"         "emppt1"         "nmgrs1"        
## [5] "empft2"         "emppt2"         "nmgrs2"         "store_id"      
## [9] "emptot1"        "emptot2"        "store_has_data"
```

- Create a new indicator variable, `nj`, for the treatment group (New Jersey stores)

```r
min_wage_data <- min_wage_data %>% 
  mutate(nj = state == 1)
```

---
## Do Difference-in-differences by hand

- `emptot1` is total employment before the minimum wage increase and `emptot2` is total employment after it; `nj == 1` is New Jersey (treated) and `nj == 0` is Pennsylvania (control).

```r
# Mean employment in New Jersey, before (wave 1) and after (wave 2)
nj_before <- min_wage_data %>% 
  filter(nj == 1) %>% 
  summarise(mean = mean(emptot1), sd = sd(emptot1))
nj_after <- min_wage_data %>% 
  filter(nj == 1) %>% 
  summarise(mean = mean(emptot2), sd = sd(emptot2))

# Mean employment in Pennsylvania, before and after
pa_before <- min_wage_data %>% 
  filter(nj == 0) %>% 
  summarise(mean = mean(emptot1), sd = sd(emptot1))
pa_after <- min_wage_data %>% 
  filter(nj == 0) %>% 
  summarise(mean = mean(emptot2), sd = sd(emptot2))
```

---
## Do Difference-in-differences by hand

.pull-left[

```r
nj_before
```

```
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1  20.4  9.21
```

```r
nj_after
```

```
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1  20.9  9.38
```
]

.pull-right[

```r
pa_before
```

```
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1  23.4  12.0
```

```r
pa_after
```

```
## # A tibble: 1 x 2
##    mean    sd
##   <dbl> <dbl>
## 1  21.1  8.38
```
]

```r
(nj_after - nj_before) - (pa_after - pa_before)
```

```
##   mean       sd
## 1 2.75 3.803228
```

- Read the `mean` column: the difference-in-differences estimate is 2.75. (The `sd` column is just a difference of standard deviations, not a standard error.)

---
## Do Difference-in-differences by hand

### Now, the change over time (after - before)

```r
min_wage_data <- min_wage_data %>% 
  mutate(demp = emptot2 - emptot1)

summary_demp <- min_wage_data %>% 
  group_by(state) %>% 
  summarise(mean = mean(demp), sd = sd(demp))
summary_demp
```
```
## # A tibble: 2 x 3
##   state   mean    sd
##   <dbl>  <dbl> <dbl>
## 1     0 -2.28  10.9 
## 2     1  0.467  8.45
```

---
## Difference-in-differences in regression form

```r
DD_Regression <- lm(demp ~ nj, data = min_wage_data)
tidy(DD_Regression)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)    -2.28      1.04     -2.21  0.0280
## 2 njTRUE          2.75      1.15      2.38  0.0177
```
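---
## Is the difference-in-differences estimate significant?

To tie this output back to today's lecture, here is a short sketch that applies the "eye-ball" 95% confidence interval and t-test to the DiD estimate above (2.75, with a standard error of 1.15):

```r
# Pull the DiD estimate and its standard error from the fitted model
est <- coef(summary(DD_Regression))["njTRUE", "Estimate"]    # 2.75
se  <- coef(summary(DD_Regression))["njTRUE", "Std. Error"]  # 1.15

# "Eye-ball" 95% confidence interval: estimate +/- 2 * SE
c(lower = est - 2 * se, upper = est + 2 * se)

# t-statistic of about 2.38: greater in absolute value than 1.96
# but less than 2.576, so the estimate is statistically significant
# at the 5% level but not at the 1% level
est / se
```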