Day 9

class: center, middle, inverse, title-slide

# Day 9
## Mediation, moderation, and generalized linear models
### Michael W. Kearney📊 School of Journalism Informatics Institute University of Missouri
### <table style="border-style:none;padding-top:30px;" class=".table">
<tr>
<th style="padding-right:75px!important">
<a href="https://twitter.com/kearneymw"> </a>
</th>
<th style="padding-left:75px!important">
<a href="https://github.com/mkearney"> </a>
</th>
</tr>
<tr style="background-color:#fff">
<th style="padding-right:75px!important">
<a href="https://twitter.com/kearneymw"> @kearneymw </a>
</th>
<th style="padding-left:75px!important">
<a href="https://github.com/mkearney"> @mkearney </a>
</th>
</tr>
</table>

---

class: inverse, center, middle

## Generalized linear models

---

## Agenda
+ Mediation/Moderation
   - Moderation (interactions)
   - Mediation
   - Path analysis
+ Generalized models
   - Logistic regression
+ mturk experiment
   - AP vs Twitter vs Facebook

---
class: inverse, center, middle

## Moderation

---

## Moderation

Extent to which the effect of a predictor (**X**) on an outcome (**`Y`**) **depends** on a third variable (**W**), called a **moderator**

---

## Y on X

Moderation: the effect of X on Y depends on another variable

---

## Y on X *through* W

---

## Moderation

+ Same as **interaction**
+ **"Depends"** interpretations are reversible, e.g.,
   - The effect of political knowledge [on participation] depends on gender
   - The effect of gender [on participation] depends on political knowledge

---

## Calculating moderation

+ Using the linear model, we simply add an **interaction term**

``` r
lm(y ~ x + w + x * w, data = data)
```

+ For a shortcut that produces the same result, only enter the interaction

``` r
lm(y ~ x * w, data)
```

---

---

## Moderation

The effect of **X** on **Y** is **moderated** if its existence, size, or direction varies with (depends on) **W**

Moderation helps us understand...

+ conditions that enable, augment, suppress, or invert the effect
+ for whom/what the effect is large or small, positive or negative, present or absent, etc.
+ when the effect is large or small, positive or negative, present or absent, etc.

---

## Moderation example

Example context: studying the effect of newspaper advertisements on product revenue

---

## Quick write (60s)

Using variables below, write two hypotheses (one num; one cat) and share with your neighbor

+ **DV/Y**: product revenue (dollars) **`revenue`**
+ **IV/X**: newspaper advertisements (count) 
   - **`ads_num`**: Number of ads
   - **`ads_cat`**: Yes ads or no ads
+ **Moderator/W**: newspaper quality 
   - **`qual_cat`**: low, medium, high
   - **`qual_num`**: value of company (dollars)

---

**Hypothesis**:

+ The effect of **newspaper advertisements** on **product revenue** depends on 
**newspaper quality** 
   - Ads will increase revenue when in high quality newspapers
   - Ads will not increase revenue when in low quality newspapers

---

---

---

## Regression

```r
## regression model
m1 <- lm(revenue ~ quality_num * ads_num, data = news_data)
```

```r
summary(m1)
#> 
#> Call:
#> lm(formula = revenue ~ quality_num * ads_num, data = news_data)
#> 
#> Residuals:
#> Min 1Q Median 3Q Max 
#> -22.15 -9.99 -3.85 7.75 61.36 
#> 
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|) 
#> (Intercept) 7.359843 3.595617 2.05 0.043 * 
#> quality_num 0.562147 0.056960 9.87 2.8e-16 ***
#> ads_num 0.222988 0.034351 6.49 3.7e-09 ***
#> quality_num:ads_num -0.001524 0.000335 -4.55 1.6e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 15.1 on 96 degrees of freedom
#> Multiple R-squared: 0.738,	Adjusted R-squared: 0.73 
#> F-statistic: 90 on 3 and 96 DF, p-value: <2e-16
```

---

## ANOVA

```r
## anova model
m2 <- aov(revenue ~ quality_cat * ads_cat, data = news_data)
summary(m2)
#> Df Sum Sq Mean Sq F value Pr(>F) 
#> quality_cat 2 49427 24713 91.47 < 2e-16 ***
#> ads_cat 1 6048 6048 22.38 0.0000079 ***
#> quality_cat:ads_cat 2 2264 1132 4.19 0.018 * 
#> Residuals 94 25396 270 
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```r
## view all pairwise comparisons
(comparisons <- TukeyHSD(m2))
```

---

``` r
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = revenue ~ quality_cat * ads_cat, data = news_data)

$quality_cat
                    diff      lwr      upr p adj
 LoQual- HiQual -54.3119 -63.8771 -44.7468     0
 MdQual- HiQual -27.8877 -37.4529 -18.3225     0
 MdQual- LoQual  26.4242  16.7880  36.0605     0

$ads_cat
                   diff     lwr     upr    p adj
YesAds -NoAds   13.2312 6.70409 19.7584 0.000115

$`quality_cat:ads_cat`
                                   diff     lwr     upr p adj
 LoQual:NoAds  - HiQual:NoAds   -58.222 -79.805 -36.639 0.000
 MdQual:NoAds  - HiQual:NoAds   -35.137 -57.845 -12.429 0.000
 HiQual:YesAds - HiQual:NoAds     2.476 -19.037  23.989 0.999
 LoQual:YesAds - HiQual:NoAds   -25.500 -53.109   2.109 0.087
 MdQual:YesAds - HiQual:NoAds   -15.979 -38.871   6.913 0.333
 MdQual:NoAds  - LoQual:NoAds    23.085   8.279  37.891 0.000
 HiQual:YesAds - LoQual:NoAds    60.698  47.800  73.597 0.000
 LoQual:YesAds - LoQual:NoAds    32.722  11.139  54.305 0.000
 MdQual:YesAds - LoQual:NoAds    42.243  27.156  57.330 0.000
 HiQual:YesAds - MdQual:NoAds    37.613  22.910  52.317 0.000
 LoQual:YesAds - MdQual:NoAds     9.637 -13.071  32.345 0.819
 MdQual:YesAds - MdQual:NoAds    19.158   2.501  35.815 0.015
 LoQual:YesAds - HiQual:YesAds  -27.976 -49.489  -6.463 0.004
 MdQual:YesAds - HiQual:YesAds  -18.455 -33.442  -3.469 0.007
 MdQual:YesAds - LoQual:YesAds    9.521 -13.371  32.413 0.831
```

---

---

## Note

For the plot on the following slide...

+ lines represent whether the number of ads (associated  with each observation) is greater than the mean number of ads

``` r
ads_num > mean(ads_num)
```

+ How do we interpret the following plot? What's the key takeaway?

---

---

## Results

+ Ads **increased** revenues **when in high quality** newspapers
+ Ads had **no effect** on revenues **when in medium quality** newspapers
+ Ads **decreased** revenues **when in low quality** newspapers

---
class: inverse, center, middle

## Mediation

---

## Mediation

Mediation analysis links a presumed cause **X** to a presumed effect **Y**, at least in part, through an intermediary or **mediator** variable **M**

+ Key difference with moderation: **X** has an effect on both **M(W)** and **Y**
   - Some variables (race, age, etc.) can't be mediators

---

## Mediation

---

## Mediation

+ **X** to **M** to **Y** is a causal chain of events

+ A mediator variable can be a psychological state, a cognitive process, an 
observed behavior, an affective response, a biological change, or any other
conceivable *mechanism* variable through which **X** exerts an effect on **Y**

+ But the effect must occur causally between **X** and **Y**

---

---

## Baron & Kenny

Baron & Kenny describe three steps, using Sobel's test

1. Significant relationship between x and y
2. Significant relationship between x and w (a)
3. Significant relationship between w and y (b)

> Baron & Kenny's steps are too conservative

---

---

## Mediation

+ Evidence that **ab** is different from zero is consistent with mediation
+ Evidence that path **c** is different from zero is **not a requirement of modern mediation analysis**
+ Correlation between **X** and **Y** is neither sufficient nor necessary to claim that **X** affects **Y**
+ Test the interaction via bootstrapping and confidence intervals

---
class: inverse, center, middle

## Mediation & Moderation

---

## Mediation & Moderation

+ We can combine moderation and mediation analysis
   - "Moderated mediation" or "mediated moderation"

+ Hayes proposes "conditional process analysis"

---

## Conditional process analysis

There's a better way to talk about mediation and moderation

+ **Direct effects** describe the effect of a predictor **on** the outcome
+ **Indirect effects** describe the effect of a predictor **through** another variable [on the outcome]
+ **Conditional effects** describe **moderated** direct and indirect effects

---

## PROCESS

+ **Conditional process analysis** is modeling strategy with the goal of describing the conditional nature of the mechanism(s) by which a variable transmits its effect on another
   - testing hypotheses about such contingent effects.
+ Macros in SPSS and SAS, maybe R?

---

## Path analysis

Using simultaneously equations, it's possible to estimate a mediation model via path analysis. This
is the same framework used for strutural equation modelling

In R, the lavaan package makes path analysis fairly easy...

---

## lavaan

Does the effect of advertisements on revenue occur **through** newspaper quality?

```r
## load lavaan package
library(lavaan)

## specify model
model_syntax <- paste0(
 'quality_num ~ a*ads_num\n',
 'revenue ~ b*quality_num + c*ads_num\n',
 'ind := a * b\n',
 'c2 := c - ind')

## estimate model
m3 <- sem(model_syntax, data = news_data)

## print summary
summary(m3)
```

---

``` r
#> lavaan 0.6-3 ended normally after 16 iterations
#>
#>   Optimization method                           NLMINB
#>   Number of free parameters                          5
#>   Number of observations                           100
#>
#>   Estimator                                         ML
#>   Model Fit Test Statistic                       0.000
#>   Degrees of freedom                                 0
#>
#> Parameter Estimates:
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#>   Standard Errors                             Standard
#>
#> Regressions:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>   quality_num ~                                       
#>     ads_num    (a)    0.306    0.043    7.196    0.000
#>   revenue ~                                           
#>     quality_nm (b)    0.377    0.043    8.757    0.000
#>     ads_num    (c)    0.099    0.023    4.383    0.000
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>    .quality_num    1431.484  202.442    7.071    0.000
#>    .revenue         265.102   37.491    7.071    0.000
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  z-value  P(>|z|)
#>     ind               0.115    0.021    5.560    0.000
#>     c2               -0.016    0.036   -0.460    0.645
```

---
class: inverse, center, middle

## Generalized models

---

## Non-normal outcomes

**OLS assumption**: residuals (errors) are normally distributed

+ What happens if the outcome is not normally distributed? 
+ What are some outcome variables that wouldn't follow a normal (bell curve) distribution?

---
class: inverse, center, middle

## Logistic regression

---

## Logistic regression

+ Residuals for dichotomous outcomes (only two levels) aren't normal

```r
## flip 10 coins and track whether heads or tails
cf <- factor(sample(c("heads", "tails"), 10, replace = TRUE), levels = c("heads", "tails"))

## conver to integer with tails = 1 and heads = 0, guess the mean
residuals <- as.integer(cf) - mean(as.integer(cf))

## plot residuals
ggplot(data.frame(residuals), aes(x = residuals)) + geom_bar() + 
  ggsave("img/binary-residuals.png", width = 7, height = 5, units = "in")
```

---

Coin flip residuals

---

## Logistic regression

+ For dichotomous outcomes, it's more accurate to assume a **binomial distribution** of residuals
+ **Logistic regression**: uses maximum likelihood and the binomial probability equation to calculate coefficients
   - More theoretically sound
   - More accurate and more powerful
   - Less interpretable

---
class: inverse, center, middle

## `glm()`

---

## Generalized models

All generalized models have two important features to be aware of

- Calculations are done via **maximimum likelihood** (estimator leveraging calculus and probability theory)
   - Biased for small samples
   - Desirable asymptotic qualities
- Assumed outcome distributions can be normal or non-normal

---

## Distributions/GLMs

+ **Dichotomous**: binomial (logistic) 
   - `glm(f, data = data, family = binomial)`
+ **Count**: poisson
   - `glm(f, data = data, family = poisson)`
+ **Overdispersed count**: negative binomial
   - `glm.nb(f, data = data)`

---

## Quasi-methods

+ Sometimes data do not technically meet strict requirements of family, e.g., 
   - probabilities instead of 0/1
   - Average counts instead of raw counts
+ **Quasi.*** versions are often available and work well 
   - `glm(f, data = data, family = quasipoisson)`
   - `glm(f, data = data, family = quasibinomial)`