Chapter 7 Data analysis procedures

The overall aims of many simulation studies have to do with understanding how a particular data-analysis procedure works or with comparing the performance of multiple competing procedures. Thus, the data-analysis procedure or procedures are the central object of study. Depending on the research question, the data-analysis procedure might be very simple (as simple as computing a sample correlation) or it might involve a combination of several components. For example, the procedure might entail first computing a diagnostic test for heteroskedasticity and then, depending on the outcome of the test, applying either a conventional formula or a heteroskedasticity-robust formula for standard errors. As another example, a data-analysis procedure might involve using multiple imputation for missingness on key variables, then fitting a statistical model, and then generating predicted values based on the model. Depending on the research question, we might also need to create several functions that implement different estimation procedures to be compared.

In this chapter, we demonstrate how to implement data-analysis procedures in the form of R functions, which we call estimation functions, so that their performance can eventually be evaluated by repeatedly applying them to artificial data. We start by describing the high-level design of an estimation function, and illustrate with some simple examples. We then discuss approaches for writing simulations that compare multiple data analysis procedures. Next, we describe strategies for validating the coded-up estimation functions before running a full simulation. Finally, we examine methods for handling common computational problems with estimation functions, such as handling non-convergence when using maximum likelihood estimation.

7.1 Writing estimation functions

In the abstract, a function that implements an estimation procedure should have the following form:

estimate <- function(data) {

  # calculations/model-fitting/estimation procedures
  
  return(estimates)
}

The function takes a dataset as input, fits a model or otherwise calculates an estimate, possibly with associated standard errors and so forth, and returns these quantities as output. The estimates could be point estimates of parameters, standard errors, confidence intervals, p-values, predictions, or other quantities. The calculations in the body of the function should be set up to use datasets that have the same structure (i.e., same dimensions, same variable names) as the output of the corresponding function for generating simulated data. However, in principle, we should also be able to run the estimation function on real data.

In Chapter 5 we wrote a function called ANOVA_Welch_F() for computing \(p\)-values from two different procedures for testing equality of means in a heteroskedastic ANOVA:

ANOVA_Welch_F <- function(data) {
  anova_F <- oneway.test(x ~ group, data = data, var.equal = TRUE)
  Welch_F <- oneway.test(x ~ group, data = data, var.equal = FALSE)
  
  result <- tibble(
    ANOVA = anova_F$p.value,
    Welch = Welch_F$p.value
  )
  
  return(result)
}

Applying this function to a simulated dataset returns two p-values, one for the usual ANOVA \(F\) test (which assumes homoskedasticity) and one for Welch’s heteroskedastic \(F\) test:

sim_data <- generate_ANOVA_data(
  mu = c(1, 2, 5, 6), 
  sigma_sq = c(3, 2, 5, 1),
  sample_size = c(3, 6, 2, 4)
)
ANOVA_Welch_F(sim_data)
## # A tibble: 1 × 2
##      ANOVA  Welch
##      <dbl>  <dbl>
## 1 0.000293 0.0179

Our ANOVA_Welch_F() function is designed to work with the output of generate_ANOVA_data() in that it assumes that the grouping variable is called group and the outcome is called x. Relying on this assumption would be a poor choice if we were designing a function as part of an R package or for general-purpose use. However, because the primary use of the function is for simulation, it is reasonable to assume that the input data will always have appropriate variable names.

In Chapter 6, we looked at a data-generating function for a bivariate Poisson distribution, an example of a non-normal bivariate distribution. We might use such a distribution to understand the behavior of Pearson’s sample correlation coefficient and its normalizing transformation, known as Fisher’s \(z\)-transformation, which is equivalent to the inverse hyperbolic tangent function (atanh() in R). When the sample measurements follow a bivariate normal distribution, Fisher’s \(z\)-transformed correlation is very close to normally distributed and its standard error is simply \(1 / \sqrt{N - 3}\), and thus independent of the correlation. This makes the \(z\)-transformation very useful for computing confidence intervals, which can then be back-transformed to the Pearson-\(r\) scale.
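
In symbols, for a sample correlation \(r\) computed from \(N\) observations, the 95% confidence interval procedure just described amounts to:

\[
z = \operatorname{atanh}(r) = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right), \qquad
CI_z = z \pm \frac{1.96}{\sqrt{N - 3}}, \qquad
CI_r = \tanh(CI_z).
\]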

In this problem, a simple estimation function would take a dataset with two variables as input and compute the sample correlation and its \(z\)-transformation, compute confidence intervals for \(z\), and then back-transform the confidence interval end-points. Here is an implementation of these calculations:

r_and_z <- function(data) {
  
  r <- cor(data$C1, data$C2)
  z <- atanh(r)
  se_z <- 1 / sqrt(nrow(data) - 3)
  ci_z <- z + c(-1, 1) * qnorm(.975) * se_z
  ci_r <- tanh(ci_z)
  
  tibble( r = r, z = z, CI_lo = ci_r[1], CI_hi = ci_r[2] )
}

To check that the function returns a result of the expected form, we generate a small dataset using the r_bivariate_Poisson() function developed in the last chapter, then apply our estimation function to the result:

Pois_dat <- r_bivariate_Poisson(40, rho = 0.5, mu1 = 4, mu2 = 4)
r_and_z(Pois_dat)
## # A tibble: 1 × 4
##       r     z CI_lo CI_hi
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.557 0.628 0.296 0.740

Although it is a little cumbersome to do so, we could also apply the estimation function to a real dataset. Here is an example, which calculates the correlation between ratings of judicial integrity and familiarity with the law from the USJudgeRatings dataset (which is included in base R). For the function to work on this dataset, we first need to rename the relevant variables.

data(USJudgeRatings)

USJudgeRatings %>%
  dplyr::select(C1 = INTG, C2 = FAMI) %>%
  r_and_z()
## # A tibble: 1 × 4
##       r     z CI_lo CI_hi
##   <dbl> <dbl> <dbl> <dbl>
## 1 0.869  1.33 0.769 0.927

The function returns a valid result—a quite strong correlation!

It is good practice to test a newly developed estimation function on real data as a check that it is working as intended. This type of check helps ensure that the estimation function does not rely on information outside of the dataset, such as known parameter values, to construct an estimate. Applying the function to a real dataset demonstrates that it implements a procedure that could actually be applied in real data analysis contexts.

7.2 Including Multiple Data Analysis Procedures

Many simulations involve head-to-head comparisons of two or more data-analysis procedures. As a design principle, we generally recommend writing a separate function for each estimation method one plans to evaluate. Doing so makes it easier to add additional methods as desired or to focus on just a subset of methods. Writing separate functions also leads to a code base that is flexible and useful for other purposes (such as analyzing real data). Finally (repeating one of our favorite mantras), separating functions makes debugging easier because it lets you focus attention on one thing at a time, without worrying about how errors in one area might propagate to others.

To see how this works in practice, we will return to the case study from Section 6.6, where we developed a data-generating function for simulating a cluster-randomized trial with student-level outcomes but school-level treatment assignment. Our data-generating process allowed for varying school sizes and heterogeneous treatment effects, which might be correlated with school size. Several different procedures might be used to estimate an overall average effect from a clustered experiment, including:

  • Estimating a multi-level regression model (also known as a hierarchical linear model),
  • Estimating an ordinary least squares (OLS) regression model and applying cluster-robust standard errors, or
  • Averaging the outcomes by school, then estimating a linear regression model on the mean outcomes.

All three of these methods are widely used and have some theoretical guarantees supporting their use. Education researchers tend to be more comfortable using multi-level regression models, whereas economists tend to use OLS with clustered standard errors.

We next develop estimation functions for each of these procedures. We analyze the data as we expect would be done in practice: even though we generated data with a school-size covariate, we do not include it in our estimation functions. Each function needs to produce a point estimate, standard error, and \(p\)-value for the average treatment effect. To have data to practice on, we generate a sample dataset using a revised version of gen_cluster_RCT(), which corrects the bug discussed in Exercise 6.9.8:

dat <- gen_cluster_RCT( 
  J=16, n_bar = 30, alpha = 0.8, p = 0.5, 
  gamma_0 = 0, gamma_1 = 0.2, gamma_2 = 0.2,
  sigma2_u = 0.4, sigma2_e = 0.6
)

For the multi-level modeling strategy, there are several different existing packages that we could use. We will implement an estimator using the popular lme4 package, along with the lmerTest package for computing a \(p\)-value for the average effect. Here is a basic implementation:

analysis_MLM <- function( dat ) {
  
  M1 <- lme4::lmer( Yobs ~ 1 + Z + (1 | sid), data = dat )
  M1_test <- lmerTest::as_lmerModLmerTest(M1)
  M1_summary <- summary(M1_test)$coefficients
  
  tibble( 
    ATE_hat = M1_summary["Z","Estimate"], 
    SE_hat = M1_summary["Z","Std. Error"], 
    p_value = M1_summary["Z", "Pr(>|t|)"] 
  )
}

The function fits a multi-level model with a fixed coefficient for the treatment indicator and random intercepts for each school. To get a p-value for the treatment coefficient, we have to convert the model into an lmerModLmerTest object and then pass it through summary(). The function outputs only the statistics in which we are interested.

Our function makes use of the lme4 and lmerTest packages. Rather than assuming that these packages will be loaded, we call relevant functions using the package name as a prefix, as in lme4::lmer(). This way, we can run the function even if we have not loaded the packages in the global environment. This approach is also preferable to loading packages inside the function itself (e.g., with require(lme4)) because calling the function does not change which packages are loaded in the global environment.

Here is a function implementing OLS regression with cluster-robust standard errors:

analysis_OLS <- function( dat, se_type = "CR2" ) {
  
  M2 <- estimatr::lm_robust( 
    Yobs ~ 1 + Z, data = dat, 
    clusters = sid,  se_type = se_type
  )
  
  tibble( 
    ATE_hat = M2$coefficients[["Z"]], 
    SE_hat = M2$std.error[["Z"]], 
    p_value = M2$p.value[["Z"]] 
  )
}

To get cluster-robust standard errors, we use the lm_robust() function from the estimatr package, again calling only the relevant function using the package prefix rather than loading the whole package. A novel aspect of this estimation function is that it includes an additional input argument, se_type, which allows us to control the type of standard error calculated by lm_robust(). Adding this option lets us use the same function to compute (and compare) different types of clustered standard errors for the average treatment effect estimate. We set a default of "CR2", matching the default used by lm_robust() when clusters are specified.
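
For instance, a call like the following (just a usage sketch) would compute the classic CR0 cluster-robust standard errors, rather than the default CR2 small-sample adjustment:

analysis_OLS( dat, se_type = "CR0" )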

Sometimes an analytic procedure involves multiple steps. For example, the aggregation estimator first involves collapsing the data to a school-level dataset and then analyzing at the school level. This is fine: we just wrap all the steps in a single estimation function. From the point of view of using the function, it is a single call, no matter how complicated the process inside. Here is the code for the aggregate-then-analyze approach:

analysis_agg <- function( dat, se_type = "HC2" ) {
  
  datagg <- dplyr::summarise( 
    dat,
    Ybar = mean( Yobs ),
    n = n(),
    .by = c(sid, Z)
  )
  
  stopifnot( nrow( datagg ) == length(unique(dat$sid) ) )
  
  M3 <- estimatr::lm_robust( 
    Ybar ~ 1 + Z, data = datagg, 
    se_type = se_type 
  )
  
  tibble( 
    ATE_hat = M3$coefficients[["Z"]], 
    SE_hat = M3$std.error[["Z"]], 
    p_value = M3$p.value[["Z"]] 
  )
}

Note the stopifnot command: it will throw an error if the condition is not true. This stopifnot ensures we do not have both treatment and control students within a single school; if we did, we would have more aggregated values than school IDs due to the grouping! Putting assert statements like this in your code is a good way to guard against weird and hard-to-track errors. A stopifnot statement halts your code as soon as something goes wrong, rather than letting that initial wrongness flow on to later steps and create odd results that are hard to understand. Here we are protecting ourselves from strange results if, for example, we messed up our DGP code so that treatment is not nested within school, or we were using data that did not actually come from a cluster-randomized experiment. See Section A.6 for more.
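
As another illustration of this defensive style, here is a sketch of an assertion that checks the nesting condition directly on the raw data (it assumes a dataset with sid and Z columns, like the output of gen_cluster_RCT()):

# Each school should have exactly one value of Z (all treated or all control)
Z_values_per_school <- tapply( dat$Z, dat$sid, function(z) length( unique( z ) ) )
stopifnot( all( Z_values_per_school == 1 ) )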

All of our functions produce output in the same format:

analysis_MLM( dat )
## # A tibble: 1 × 3
##   ATE_hat SE_hat p_value
##     <dbl>  <dbl>   <dbl>
## 1  -0.111  0.323   0.737
analysis_OLS( dat )
## # A tibble: 1 × 3
##   ATE_hat SE_hat p_value
##     <dbl>  <dbl>   <dbl>
## 1  -0.177  0.307   0.576
analysis_agg( dat )
## # A tibble: 1 × 3
##   ATE_hat SE_hat p_value
##     <dbl>  <dbl>   <dbl>
## 1 -0.0818  0.339   0.813

Ensuring that the output of all the functions is structured in the same way will make it easy to keep the results organized once we start running multiple iterations of the simulation. If each estimation method returns a dataset with the same variables, we can simply stack the results on top of each other. Here is a function that bundles all the estimation procedures together:

estimate_Tx_Fx <- function(
    dat, 
    CR_se_type = "CR2", agg_se_type = "HC2"
) {
  
  dplyr::bind_rows(
    MLM = analysis_MLM( dat ),
    OLS = analysis_OLS( dat, se_type = CR_se_type),
    agg = analysis_agg( dat, se_type = agg_se_type),
    .id = "estimator"
  )
  
}
estimate_Tx_Fx(dat)
## # A tibble: 3 × 4
##   estimator ATE_hat SE_hat p_value
##   <chr>       <dbl>  <dbl>   <dbl>
## 1 MLM       -0.111   0.323   0.737
## 2 OLS       -0.177   0.307   0.576
## 3 agg       -0.0818  0.339   0.813

This is a common coding pattern for simulations that involve multiple estimation procedures. Each procedure is expressed in its own function, then these are assembled together in a single function so that they can all easily be applied to the same dataset. Stacking the results row-wise will make it easy to compute performance measures for all methods at once. The benefit of stacking will become even more evident once we are working across multiple replications of the simulation process, as we will in Chapter 8.

7.3 Validating an Estimation Function

Just as with data-generating functions, it is critical to verify the accuracy of an implemented estimation function. If an estimation function involves a known procedure that has been implemented in R or one of its contributed packages, then a straightforward way to do this is to compare your implementation to another existing implementation. For estimation functions that involve multi-step procedures or novel methods, other approaches to verification may be needed, which rely more on statistical theory.

7.3.1 Checking against existing implementations

For our Welch test function, we can check the output of ANOVA_Welch_F() against the built-in oneway.test function. Let’s do that with a fresh set of data:

sim_data <- generate_ANOVA_data(
  mu = c(1, 2, 5, 6), 
  sigma_sq = c(3, 2, 5, 1),
  sample_size = c(3, 6, 2, 4)
)

aov_results <- oneway.test(x ~ factor(group), data = sim_data, 
                           var.equal = FALSE)
aov_results
## 
##  One-way analysis of means (not assuming
##  equal variances)
## 
## data:  x and factor(group)
## F = 28.367, num df = 3.0000, denom df =
## 3.3253, p-value = 0.007427
Welch_results <- ANOVA_Welch_F(sim_data)
all.equal(aov_results$p.value, Welch_results$Welch)
## [1] TRUE

We use all.equal() because it checks equality up to a small numerical tolerance, which avoids some perplexing failures due to floating-point rounding.
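
A classic illustration of why exact comparison of floating-point numbers can mislead:

0.1 + 0.2 == 0.3
## [1] FALSE
all.equal(0.1 + 0.2, 0.3)
## [1] TRUE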

For the bivariate correlation example, we can check the output of r_and_z() against R’s built-in cor.test() function:

Pois_dat <- r_bivariate_Poisson(15, rho = 0.6, mu1 = 14, mu2 = 8)

my_result <- r_and_z(Pois_dat) |> subset(select = -z)
my_result
## # A tibble: 1 × 3
##       r   CI_lo CI_hi
##   <dbl>   <dbl> <dbl>
## 1 0.503 -0.0125 0.807
R_result <- cor.test(~ C1 + C2, data = Pois_dat)
R_result <- tibble(r = R_result$estimate[["cor"]], 
                   CI_lo = R_result$conf.int[1], 
                   CI_hi = R_result$conf.int[2])
R_result
## # A tibble: 1 × 3
##       r   CI_lo CI_hi
##   <dbl>   <dbl> <dbl>
## 1 0.503 -0.0125 0.807
all.equal(R_result, my_result)
## [1] TRUE

This type of test is even more useful here because r_and_z() uses our own implementation of the confidence interval calculations, rather than relying on R’s built-in functions as we did with ANOVA_Welch_F().

7.3.2 Checking novel procedures

Simulations are usually an integral part of projects to develop novel statistical methods. Checking estimation functions in such projects presents a challenge: if an estimation procedure truly is new, how do you check that your code is correct? Effective methods for doing so will vary from problem to problem, but an over-arching strategy is to use theoretical results about the performance of the estimator to check that your implementation works as expected. For instance, we might work out the algebraic properties of an estimator for a special case and then check that the result of the estimation function agrees with our algebra. For some estimation problems, we might be able to identify theoretical properties of an estimator when applied to a very large sample of data and when the model is correctly specified. If we can find results about large-sample behavior, then we can test an estimation function by applying it to a very large sample and checking whether the resulting estimates are very close to specified parameter values. We illustrate each of these approaches using our functions for estimating treatment effects from cluster-randomized trials.

We start by testing an algebraic property. For each of the three methods we have implemented, the treatment effect estimator is the difference between weighted averages of the outcomes for students in the two treatment conditions; the estimators differ only in what weights are used. In the special case where all schools have the same number of students, the weights used by all three methods end up being the same: each method allocates equal weight to each school. Therefore, there should be no difference between the three point estimates. Furthermore, a bit of algebra shows that the cluster-robust standard error from the OLS approach will be identical to the robust standard error from the aggregation approach. If equal numbers of schools are assigned to each condition, then the standard error from the multilevel model will also be identical to the other standard errors.
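
To make the equal-size case concrete, write \(\bar{Y}_j\) for the mean outcome in school \(j\) and \(Z_j\) for the school-level treatment indicator, and suppose every school has exactly \(n\) students. Then (a sketch of the algebra, with notation introduced only for this illustration) each of the three point estimators reduces to a simple difference in averages of school means:

\[
\widehat{ATE} = \frac{1}{J_T} \sum_{j : Z_j = 1} \bar{Y}_j \;-\; \frac{1}{J_C} \sum_{j : Z_j = 0} \bar{Y}_j,
\]

where \(J_T\) and \(J_C\) are the numbers of treated and control schools.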

Let’s verify that our estimation functions produce results that are consistent with these theoretical properties. To do so, we will need to generate a dataset with equal cluster sizes, setting \(\alpha = 0\):

dat <- gen_cluster_RCT( 
  J=12, n_bar = 30, alpha = 0, p = 0.5, 
  gamma_0 = 0, gamma_1 = 0.2, gamma_2 = 0.2,
  sigma2_u = 0.4, sigma2_e = 0.6
)
table(dat$sid) # verify equal-sized clusters
## 
##  1  2  3  4  5  6  7  8  9 10 11 12 
## 30 30 30 30 30 30 30 30 30 30 30 30
estimate_Tx_Fx(dat)
## # A tibble: 3 × 4
##   estimator ATE_hat SE_hat p_value
##   <chr>       <dbl>  <dbl>   <dbl>
## 1 MLM        -0.525  0.366   0.183
## 2 OLS        -0.525  0.366   0.183
## 3 agg        -0.525  0.366   0.183

All three methods yield identical results. Now let’s try equal school sizes but unequal allocation to treatment:

dat <- gen_cluster_RCT( 
  J=12, n_bar = 30, alpha = 0, p = 2 / 3, 
  gamma_0 = 0, gamma_1 = 0.2, gamma_2 = 0.2,
  sigma2_u = 0.4, sigma2_e = 0.6
)
estimate_Tx_Fx(dat)
## # A tibble: 3 × 4
##   estimator ATE_hat SE_hat p_value
##   <chr>       <dbl>  <dbl>   <dbl>
## 1 MLM         0.189  0.547   0.737
## 2 OLS         0.189  0.413   0.663
## 3 agg         0.189  0.413   0.657

As expected, all three point estimates match, but the SE from the multilevel model differs slightly from the others.

We can also use large-sample theory to check the multilevel modeling estimator. If the model is correctly specified, then all the parameters of the model should be accurately estimated if the model is fit to a very large sample of data. To check this property, we will need access to the full model output, not just the selected results returned by analysis_MLM(). One way to handle this is to make a small tweak to the estimation function, adding an option to control whether to return the entire model or just selected results. Here is the tweaked function:

analysis_MLM <- function( dat, all_results = FALSE) {
  
  M1 <- lme4::lmer( Yobs ~ 1 + Z + (1 | sid), data = dat )
  M1_test <- lmerTest::as_lmerModLmerTest(M1)
  
  if (all_results) {
    return(summary(M1_test))
  } 
  
  M1_summary <- summary(M1_test)$coefficients
  
  tibble( 
    ATE_hat = M1_summary["Z","Estimate"], 
    SE_hat = M1_summary["Z","Std. Error"], 
    p_value = M1_summary["Z", "Pr(>|t|)"] 
  )
}

Setting all_results to TRUE will return the entire model summary; keeping it at the default value of FALSE will return the same output as the other estimation functions. Now let’s apply the estimation function to a very large dataset, with variation in cluster sizes. We set gamma_2 = 0 so that the estimation model is correctly specified:

dat <- gen_cluster_RCT( 
  J=5000, n_bar = 20, alpha = 0.9, p = 2 / 3, 
  gamma_0 = 2, gamma_1 = 0.30, gamma_2 = 0,
  sigma2_u = 0.4, sigma2_e = 0.6
)

analysis_MLM(dat, all_results = TRUE)
## Linear mixed model fit by REML. t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: Yobs ~ 1 + Z + (1 | sid)
##    Data: dat
## 
## REML criterion at convergence: 242678
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3080 -0.6581  0.0007  0.6594  4.2134 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  sid      (Intercept) 0.3972   0.6302  
##  Residual             0.6030   0.7765  
## Number of obs: 98772, groups:  sid, 5000
## 
## Fixed effects:
##               Estimate Std. Error         df
## (Intercept)    2.01677    0.01636 4968.87446
## Z              0.28378    0.02003 4963.88795
##             t value Pr(>|t|)    
## (Intercept)  123.29   <2e-16 ***
## Z             14.17   <2e-16 ***
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##   (Intr)
## Z -0.817

The intercept and treatment effect coefficient estimates are very close to their true parameter values, as are the estimated school-level variance and student-level residual variance. This all gives some assurance that the analysis_MLM() function is working properly.
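
If we wanted to automate this kind of check (say, as part of a script we re-run whenever the code changes), we could turn the eyeball comparison into a loose assertion. Here is a minimal sketch, with a tolerance chosen generously relative to the reported standard error:

res <- analysis_MLM(dat)
stopifnot( abs( res$ATE_hat - 0.30 ) < 0.10 )  # true gamma_1 = 0.30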

Of course, it is important to bear in mind that these tests are only partial verifications. With the algebraic test, we’ve checked that the functions seem to be working properly for scenarios with equal school sizes, but they still might have errors that only appear when school sizes vary. Likewise, analysis_MLM() seems to be working properly for very large datasets, but our test does not rule out the possibility of bugs that only crawl out when \(J\) is small. Our large-sample test also relies on the correctness of the gen_cluster_RCT() function; if we had seen a discrepancy between parameters and estimates from the multilevel model, it could have been because of a problem with the data-generation function rather than with the estimation function.

These limitations are typical of what can be accomplished through tests based on theoretical results, because theoretical results typically only hold under specific conditions. After all, if we had comprehensive theoretical results, we would not need to simulate anything in the first place! Nonetheless, it is good to work through such tests to the extent that relevant theory is available for the problem you are studying.

7.3.3 Checking with simulations

Checking, debugging, and revising should not be limited to when you are initially developing estimation functions. It often happens that later steps in the process of conducting a simulation will reveal problems with the code for earlier steps. For instance, once you have run the data-generating and estimation steps repeatedly, calculated performance summaries, and created some graphs of the results, you might find an unusual or anomalous pattern in the performance of an estimator. This might be a legitimate result—it might be that the estimator really does behave weirdly or not work well—or it might be due to a problem in how you implemented the estimator or data-generating process. When faced with an unusual pattern, we recommend revisiting the estimation code to double check for bugs and also thinking further about what might lead to the anomaly. Further exploration might lead you to a deeper understanding of how a method works and perhaps even an idea for how to improve the estimator or refine the data-generating process.

A good illustration of this process comes from one of Luke’s past research projects (see Pashley, Keele, and Miratrix (2024)), in which he and other co-authors were working on a way to improve Instrumental Variable (IV) estimation using post-stratification. The method they studied involved grouping units based on a covariate that predicts compliance status, then calculating estimates within each group, then summarizing the estimates across groups. They used simulations to see whether this method would improve the accuracy of the overall summary effect estimate. In the first simulation, the estimates were full of NAs and odd results because the estimation function failed to account for what happens in groups of observations where the number of compliers was estimated to be zero. After repairing that problem and re-running everything, the simulation results still indicated serious and unexpected bias, which turned out to be due to an error in how the estimation function implemented the step of summarizing estimates across groups. After again correcting and re-running, the simulation results showed that the gains in accuracy from this new method were minimal, even when the groups were formed based on a variable that was almost perfectly predictive of compliance status. Eventually, they understood that the groups with very few compliers produced such unstable estimates that these spoiled the overall average estimate. This inspired them to revise their estimation strategy and introduce a method that dropped or down-weighted strata with few compliers, which ultimately strengthened the contribution of the work.

As this experience highlights, simulations seldom follow a single, well-defined trajectory. The point of conducting simulations is to help us, as researchers, learn about estimation methods so that we can analyze real data better. What we learn from simulation gives us a better understanding of the methods (potentially including a better understanding of theoretical results), leading to ideas for better methods and new scenarios to explore in further simulations. Of course, at some point one needs to step off this merry-go-round, write up the findings, cook dinner, and clean the bathroom. But, just like many other research endeavors, simulations follow a highly iterative process.

7.4 Handling errors, warnings, and other hiccups

Especially when working with more advanced estimation methods, it is possible that your estimation function will fail, throw an error, or return something uninterpretable for certain input datasets. For instance, maximum likelihood estimation often requires iterative, numerical optimization algorithms that sometimes fail to converge. This might happen rarely enough that it takes a while to even notice that it is a problem, but even quite rare things can occur when you run simulations with many thousands of repetitions. Less dire but still annoying, your estimation function might generate warnings, which can pile up if you are running many repetitions. In some cases, such warnings might also signal that the estimator produced a bad result, and it may not be clear whether we should retain this result (or include it in overall performance assessments). After all, the function tried to warn us that something is off!

Errors and warnings in estimation functions pose two problems, one purely technical and one conceptual. On a technical level, R functions stop running if they hit errors (though not warnings), so we need ways to handle the errors in order to get our simulations up and running. On a conceptual level, we need to decide how to use the information contained in errors and warnings, whether that be by further elaborating the estimation procedures to address different contingencies or by evaluating the performance of the estimators in a way that appropriately accounts for errors. We consider each of the problems here, then revisit the conceptual considerations in Chapter 9.

7.4.1 Capturing errors and warnings

Some estimation functions will require complicated or stochastic calculations that can sometimes produce errors. Intermittent errors can really be annoying and time-consuming if not addressed. To protect yourself, it is good practice to anticipate potential errors, preventing them from stopping code execution and allowing your simulations to keep running. We will demonstrate some techniques for error-handling using tools from the purrr package.

For illustrative purposes, consider the following error-prone function that sometimes returns what we want, sometimes returns NaN due to taking the square root of a negative number, and sometimes crashes completely because broken_code() does not exist:

my_complex_function <- function( param ) {
    
    vals <- rnorm( param, mean = 0.5 )
    if ( sum( vals ) > 5 ) {
        broken_code( 4 )
    } else {
        sqrt( sum( vals ) * sign( vals )[[1]] )
    }
}

Running it a few times produces some results and an occasional warning:

set.seed(3)
my_complex_function( 1 )
## [1] 0.6796568
my_complex_function( 10 )
## [1] 2.132087
my_complex_function( 5 )
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## [1] NaN

Running it many times produces warnings, then an error:

resu <- replicate(20, my_complex_function( 7 ))
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Error in broken_code(4): could not find function "broken_code"

The purrr package includes a function called safely that makes it easy to trap errors. To use it, we feed the estimation function into safely() to create a new version:

my_safe_function <- safely( my_complex_function, otherwise = NA )
my_safe_function( 7 )
## $result
## [1] 2.175561
## 
## $error
## NULL

The safe version of the function returns a list with two entries: the result (or NULL if there was an error) and the error (or NULL if there was no error). safely() is an example of a functional (or an adverb), which takes a function and returns a new function that does something slightly different. We include otherwise = NA so that we always get a result, rather than NULL, when there is an error.

We can use the safe function repeatedly and it will always return a result:

resu <- replicate(20, my_safe_function( 7 ), simplify = FALSE)
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
resu <- transpose( resu )
unlist(resu$result)
##  [1] 1.3870195 0.5638654        NA 1.6995292
##  [5]        NA        NA       NaN 2.2121710
##  [9]        NA       NaN       NaN 1.8801925
## [13]       NaN 1.9154618       NaN        NA
## [17] 2.2245636        NA        NA 0.8854747

The transpose() function takes a list of lists and reorganizes it to give you a list of all the first elements, a list of all the second elements, and so on. This is very powerful for wrangling data, because we can then make a tibble with list columns like so:

tb <- tibble( result = unlist( resu$result ), error = resu$error )
head( tb, n = 4 )
## # A tibble: 4 × 2
##   result error     
##    <dbl> <list>    
## 1  1.39  <NULL>    
## 2  0.564 <NULL>    
## 3 NA     <smplErrr>
## 4  1.70  <NULL>

The purrr package includes several other functionals that are useful for handling errors and warnings. The possibly() wrapper will try to run a function and will return a specified value in the event of an error:

my_possible_function <- possibly( my_complex_function, otherwise = NA )
my_possible_function( 7 )
## [1] 1.506734
rs <- replicate(20, my_possible_function( 7 ))
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
## Warning in sqrt(sum(vals) * sign(vals)[[1]]):
## NaNs produced
rs
##  [1]        NA 0.7023915 0.9495728        NA
##  [5]        NA 1.8784947 0.9838187        NA
##  [9]        NA 1.7676572 1.4809897        NA
## [13] 2.1019082        NA       NaN       NaN
## [17] 1.8629289 1.3467997       NaN 1.3017348

It is essentially a simpler version of safely() that does not record error messages.

The quietly() functional bundles the result together with any console output, warnings, and messages, rather than printing them to the console:

my_quiet_function <- quietly( my_complex_function )

my_quiet_function( 1 )
## $result
## [1] 0.1724504
## 
## $output
## [1] ""
## 
## $warnings
## character(0)
## 
## $messages
## character(0)

This can be especially useful to reduce extraneous printing in a simulation, which can slow down code execution more than you might expect. However, quietly() does not trap errors:

rs <- replicate(20, my_quiet_function( 7 ))
## Error in broken_code(4): could not find function "broken_code"

Double-wrapping your function will handle both errors and warnings, but the structure it produces gets a bit complicated:

my_safe_quiet_function <- quietly( safely( my_complex_function, otherwise = NA ) )
my_safe_quiet_function(7)
## $result
## $result$result
## [1] NA
## 
## $result$error
## <simpleError in broken_code(4): could not find function "broken_code">
## 
## 
## $output
## [1] ""
## 
## $warnings
## character(0)
## 
## $messages
## character(0)

Even though the result is a bit of a mess, this structure provides all the pieces that we need to do further calculations on the result (when available), along with errors, warnings, and other output.
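
For reference, here is how we can pull the pieces of the double-wrapped result apart (a sketch using the my_safe_quiet_function() wrapper defined above):

res <- my_safe_quiet_function( 7 )
res$result$result   # the estimate, or NA if there was an error
res$result$error    # the error condition, or NULL if there was none
res$warnings        # character vector of any captured warnings
res$messages        # character vector of any captured messages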

To see how this works in practice, we will adapt our analysis_MLM() function, which makes use of lmer() for fitting a multilevel model. Currently, the estimation function sometimes prints messages to the console:

set.seed(101012)  # (I picked this to show a warning.)
dat <- gen_cluster_RCT( J = 50, n_bar = 100, sigma2_u = 0 )
mod <- analysis_MLM(dat)
## boundary (singular) fit: see help('isSingular')

Wrapping lmer() with quietly() makes it possible to catch such output and store it along with other results, as in the following:

quiet_lmer <- quietly(lme4::lmer)
qmod <- quiet_lmer( Yobs ~ 1 + Z + (1|sid), data=dat )
qmod
## $result
## Linear mixed model fit by REML ['lmerMod']
## Formula: Yobs ~ 1 + Z + (1 | sid)
##    Data: ..2
## REML criterion at convergence: 14026.44
## Random effects:
##  Groups   Name        Std.Dev.
##  sid      (Intercept) 0.0000  
##  Residual             0.9828  
## Number of obs: 5000, groups:  sid, 50
## Fixed Effects:
## (Intercept)            Z  
##   -0.013930    -0.008804  
## optimizer (nloptwrap) convergence code: 0 (OK) ; 0 optimizer warnings; 1 lme4 warnings 
## 
## $output
## [1] ""
## 
## $warnings
## character(0)
## 
## $messages
## [1] "boundary (singular) fit: see help('isSingular')\n"

However, the lmerTest package does not like the structure of the results, and produces an error:

lmerTest::as_lmerModLmerTest(qmod$result)
## Error in lmerTest::as_lmerModLmerTest(qmod$result): Unable to extract deviance function from model fit

We can side-step this by combining as_lmerModLmerTest() and lmer() into a single function. While we are at it, we also layer on summary(). To do so, we use the compose() functional from purrr, which takes a set of functions and chains them into one:

lmer_with_test <- purrr::compose(
  summary,
  lmerTest::as_lmerModLmerTest, 
  lme4::lmer
)
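
If the order of composition is confusing, a toy example may help: compose() applies its arguments from last to first, so the last function listed is the first one applied.

double <- function(x) 2 * x
add_one <- function(x) x + 1
add_one_then_double <- purrr::compose(double, add_one)
add_one_then_double(3)   # double(add_one(3))
## [1] 8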

The resulting lmer_with_test() function acts as if we were calling lmer(), then feeding the result into as_lmerModLmerTest(), then feeding the result into summary(). We then wrap the combination function with safely() and quietly():

quiet_safe_lmer <- purrr::quietly(purrr::safely(lmer_with_test))

Now we can use our suitably quieted and safe function in a new version of the estimation function:

analysis_MLM_safe <- function( dat, all_results = FALSE ) {
  
  M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1 | sid), data=dat )
  
  if (all_results) {
    return(M1)
  } 
  
  message <- ifelse( length( M1$messages ) > 0, M1$messages, NA_character_ )
  warning <- ifelse( length( M1$warnings ) > 0, M1$warnings, NA_character_ )
  error <- ifelse( length( M1$result$error ) > 0, M1$result$error$message, NA_character_ )
  
  tibble( 
    ATE_hat = M1$result$result$coefficients["Z","Estimate"], 
    SE_hat = M1$result$result$coefficients["Z","Std. Error"], 
    p_value = M1$result$result$coefficients["Z", "Pr(>|t|)"],
    message = message,
    warning = warning,
    error = error
  )
}

This quiet version runs without extraneous messages:

mod <- analysis_MLM_safe(dat)
mod
## # A tibble: 1 × 6
##    ATE_hat SE_hat p_value message    warning error
##      <dbl>  <dbl>   <dbl> <chr>      <chr>   <chr>
## 1 -0.00880 0.0278   0.751 "boundary… <NA>    <NA>

Now we have the estimation results along with any diagnostic information from messages or warnings. Storing this information will let us evaluate what proportion of the time there was a warning or message, run additional analyses on the subset of replications where there was no such warning, or even modify the estimation procedure to take the diagnostics into account.
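
As a preview of how this pays off (a sketch, assuming results is a tibble of stacked replications of the kind we will build in Chapter 8), summarizing the stored diagnostics becomes a small calculation:

# `results` is a hypothetical tibble of stacked replications
results %>%
  dplyr::summarise(
    prop_message = mean( !is.na( message ) ),
    prop_warning = mean( !is.na( warning ) ),
    prop_error   = mean( !is.na( error ) )
  )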

7.4.2 Adapting estimation procedures for errors and warnings

So far, we have seen techniques for handling technical hiccups that occur when data analysis procedures do not always produce results. But how do we account for the absence of results in a simulation? In Chapter 9, we will delve into the conceptual issues with summarizing the performance of methods that do not always provide an answer. One of the best solutions to such problems still concerns the formulation of estimation functions, and so we introduce it here. That solution is to re-define the estimator to include contingencies for handling lack of results.

Consider a data analyst who was planning to apply a fancy statistical model to their data, but then finds that the model does not converge. What would that analyst do in practice (besides cussing and taking a snack break)? Rather than giving up entirely, they would probably think of an alternative analysis and attempt to apply it, perhaps by simplifying the model in some way. To the extent that we can anticipate such possibilities, we can build these error-contingent alternative analyses into our estimation function.

To illustrate, let’s look at an error (a not-particularly-subtle one) that can crop up in the cluster-randomized trial example when clusters are very small:

set.seed(65842)
tiny_dat <- gen_cluster_RCT( J = 10, n_bar = 2, alpha = 0.5)
analysis_MLM_safe(tiny_dat)
## # A tibble: 1 × 3
##   message warning error                           
##   <chr>   <chr>   <chr>                           
## 1 <NA>    <NA>    number of levels of each groupi…
table(tiny_dat$sid)
## 
##  1  2  3  4  5  6  7  8  9 10 
##  1  1  1  1  1  1  1  1  1  1

The error occurs because all 10 simulated schools include a single student, making it impossible to estimate a random-intercepts multilevel model. A natural fall-back analysis here would be an ordinary least squares regression.

Suppose that our imaginary analyst is not especially into nuance, and so will fall back onto ordinary least squares whenever the multilevel model produces an error. We can express this logic in our estimation function by first catching the error thrown by lmer() and then running an OLS regression in the event an error is thrown:

analysis_MLM_contingent <- function( dat, all_results = FALSE ) {
  
  M1 <- quiet_safe_lmer( Yobs ~ 1 + Z + (1 | sid), data=dat )
  
  if (all_results) {
    return(M1)
  } 
  
  if (!is.null(M1$result$result)) { 
    # If lmer() returns a result
    res <- tibble( 
      ATE_hat = M1$result$result$coefficients["Z","Estimate"], 
      SE_hat = M1$result$result$coefficients["Z","Std. Error"], 
      p_value = M1$result$result$coefficients["Z", "Pr(>|t|)"]
    )
  } else {
    # If lmer() errors, fall back on OLS
    M_ols <- summary(lm(Yobs ~ Z, data = dat))
    res <- tibble( 
      ATE_hat = M_ols$coefficients["Z","Estimate"], 
      SE_hat = M_ols$coefficients["Z", "Std. Error"], 
      p_value = M_ols$coefficients["Z","Pr(>|t|)"]
    )
  }

  # Store original messages, warnings, errors  
  res$message <- ifelse( length( M1$messages ) > 0, M1$messages, NA_character_ )
  res$warning <- ifelse( length( M1$warnings ) > 0, M1$warnings, NA_character_ )
  res$error <- ifelse( length( M1$result$error ) > 0, M1$result$error$message, NA_character_ )
  
  return(res)
}

We still store the messages, warnings, and errors from the initial lmer() fit so that we can keep track of how often errors occur. The function now returns a treatment effect estimate even if lmer() errors:

analysis_MLM_contingent(tiny_dat)
## # A tibble: 1 × 6
##   ATE_hat SE_hat p_value message warning error    
##     <dbl>  <dbl>   <dbl> <chr>   <chr>   <chr>    
## 1   0.400  0.603   0.525 <NA>    <NA>    number o…

Of course, we can easily anticipate the conditions under which this particular error will occur: all we need to do is check whether all the clusters are single observations. Because it is easily anticipated, a better strategy for handling this error is to check before fitting the multilevel model and proceeding accordingly in the event that the clusters are all singletons. Exercise 7.5.5 asks you to implement this approach and further refine this contingent analysis strategy.

Adapting estimation functions to address errors can be an effective—and often very interesting—strategy for studying the performance of estimation methods. Rather than studying the performance of a data-analysis method that is only sometimes well-defined, we shift to studying a stylized cognitive model for the analyst’s decision-making process, which handles contingencies that might crop up whether analyzing simulated data or real empirical data. Of course, studying such a model is only interesting to the extent that the decision-making process it implements is a plausible representation of what an analyst might actually do in practice.

The adaptive estimation approach does lead to more complex estimation functions, which entail implementing multiple estimation methods and a set of decision rules for applying them. Often, the set of contingencies that need to be handled will not be immediately obvious, so you may find that you need to build and refine the decision rules as you learn more about how they work. Running an estimation procedure over multiple simulated datasets is an excellent (if aggravating!) way to identify errors and edge cases. We turn to procedures for doing so in the next chapter.

7.5 Exercises

7.5.1 More Heteroskedastic ANOVA

In the classic simulation by Brown and Forsythe (1974), they not only looked at the performance of the homoskedastic ANOVA-F test and Welch’s heteroskedastic-F test, they also proposed their own new hypothesis testing procedure.

  1. Write a function that implements the Brown-Forsythe F* test (the BFF* test!) as described on p. 130 of Brown and Forsythe (1974), using the following code skeleton:

    BF_F <- function( data ) {
    
      # fill in the guts here
    
      return(pval)
    }

    Run the following code to check that your function produces a result:

    sim_data <- generate_ANOVA_data(
      mu = c(1, 2, 5, 6), 
      sigma_sq = c(3, 2, 5, 1),
      sample_size = c(3, 6, 2, 4)
    )
    BF_F( sim_data )
  2. Try calling your BF_F function on a variety of datasets of different sizes and shapes, to make sure it works. What kinds of datasets should you test out?

  3. Add the BFF* test into the output of ANOVA_Welch_F() by calling your BF_F() function inside the body of ANOVA_Welch_F().

  4. The onewaytests package implements a variety of different hypothesis testing procedures for one-way ANOVA. Validate your ANOVA_Welch_F() function by comparing the results to the output of the relevant functions from onewaytests.

7.5.2 Contingent testing

In the one-way ANOVA problem, one approach that an analyst might think to take is to conduct a preliminary significance test for heterogeneity of variances (such as Levene’s test or Bartlett’s test), and then report the \(p\)-value from the homoskedastic ANOVA F test if variance heterogeneity is not detected but the \(p\)-value from the BFF* test if variance heterogeneity is detected. Modify the ANOVA_Welch_F() function to return the \(p\)-value from this contingent BFF* test in addition to the \(p\)-values from the (non-contingent) ANOVA-F, Welch, and BFF* tests. Include an input option that allows the user to control the \(\alpha\) level of the preliminary test for heterogeneity of variances, as in the following skeleton.

ANOVA_Welch_F <- function( data, pre_alpha = .05 ) {
  # preliminary test for variance heterogeneity
  # compute non-contingent F tests for group differences
  # compute contingent test
  # compile results
  return(result)
}

7.5.3 Check the cluster-RCT functions

Section 7.2 presented functions implementing several different strategies for estimating an average treatment effect from a cluster-randomized trial. Write some code to validate these functions by comparing their output to the results of other tools for doing the same calculation. Use one or more datasets simulated with gen_cluster_RCT(). For each of these tests, you will need to figure out the appropriate syntax by reading the package documentation.

  1. For analysis_MLM(), check the output by fitting the same model using lme() from the nlme package or glmmTMB() from the package of the same name.

  2. For analysis_OLS(), check the output by fitting the linear model using the base R function lm(), then computing standard errors using vcovCL() from the sandwich package. Also compare the output to the results of feeding the fitted model through coef_test() from the clubSandwich package.

  3. For analysis_agg(), check the output by aggregating the data to the school-level, fitting the linear model using lm(), and computing standard errors using vcovHC() from the sandwich package.

7.5.4 Extending the cluster-RCT functions

Exercise 6.9.10 from Chapter 6 asked you to extend the data-generating function for the cluster-randomized trial to include generating a student-level covariate, \(X\), that is predictive of the outcome. Use your modified function to generate a dataset.

  1. Modify the estimation functions from Section 7.2 to use models that include the covariate as a predictor.

  2. Further extend the functions to include an input argument for the set of predictors to be included in the model, as in the following skeleton for the multi-level model estimator:

    analysis_MLM <- function(dat, predictors = "Z") {
    
    }
    
    analysis_MLM( dat )
    analysis_MLM( dat, predictors = c("Z","X"))
    analysis_MLM( dat, predictors = c("Z","X", "X:Z"))

    Hint: Check out the reformulate() function, which makes it easy to build formulas for different sets of predictors.

7.5.5 Contingent estimator processing

In Section 7.4.2 we developed a version of analysis_MLM() that fell back on OLS regression in the event that lmer() produced any error. The implementation that we demonstrated is not especially smart. For one, it does not anticipate that the error will occur. For another, if we are using this function as one of several estimation strategies, it will require fitting the OLS regression multiple times. Can you fix these problems?

  1. Revise analysis_MLM_contingent() so that it checks whether all clusters are single observations before fitting the multilevel model. Handle the contingency where all clusters are singletons by skipping the model-fitting step, returning NA values for the point estimator, standard error, and p-value, and returning an informative message in the error variable. Test that the function is working as expected.

  2. Revise estimate_Tx_Fx() to use your new version of analysis_MLM_contingent(). The revised function will sometimes return NA values for the MLM results. To implement the strategy of falling-back on OLS regression, add some code that replaces any NA values with corresponding results of analysis_OLS(). Test that the function is working as expected.

7.5.6 Estimating 3-parameter item response theory models

Exercise 6.9.11 asked you to write a data-generating function for the 3-parameter IRT model described in Section 6.7. Use your function to generate a large dataset. Then, using functions from the {ltm}, {mirt}, or {TAM} packages, estimate the parameters of the 3-parameter IRT model based on the simulated dataset.

  1. Write a function to estimate a 3-parameter IRT model and return a dataset containing estimates of the item characteristics \((\alpha_m,\beta_m, \gamma_m)\).

  2. Add an option to the function to allow the user to specify known values of \(\gamma_m\).

  3. Create a graphic showing how the item parameter estimates compare to the true item characteristics.

  4. Write a function or set of functions to apply 1-parameter, 2-parameter, and 3-parameter models and return datasets containing the person ability estimates \(\theta_1,...,\theta_N\) and corresponding standard errors of measurement.

7.5.7 Meta-regression with selective reporting

Exercise 6.9.15 asked you to write a data-generating function for the Vevea and Hedges (1995) selection model. The {metafor} package includes a function for fitting this model (as well as a variety of other selection models). Here is an example of the syntax for estimating this model, using a dataset from the {metadat} package:

library(metafor)
data("dat.assink2016", package = "metadat")

# rename variables and tidy up
dat <- 
  dat.assink2016 %>%
  mutate( si = sqrt(vi) ) %>%
  rename( Ti = yi, s_sq = vi, Xi = year )

# fit a random effects meta-regression model
rma_fit <- rma.uni(yi = Ti, sei = si, mods = ~ Xi, data = dat)
# fit two-step selection model
selmodel(rma_fit, type = "step", steps = c(.025, .500))
## 
## Mixed-Effects Model (k = 100; tau^2 estimator: ML)
## 
## tau^2 (estimated amount of residual heterogeneity): 0.2314 (SE = 0.0500)
## tau (square root of estimated tau^2 value):         0.4810
## 
## Test for Residual Heterogeneity:
## LRT(df = 1) = 349.3718, p-val < .0001
## 
## Test of Moderators (coefficient 2):
## QM(df = 1) = 42.1433, p-val < .0001
## 
## Model Results:
## 
##          estimate      se     zval    pval 
## intrcpt    0.4149  0.1013   4.0942  <.0001 
## Xi        -0.0782  0.0120  -6.4918  <.0001 
##            ci.lb    ci.ub      
## intrcpt   0.2163   0.6135  *** 
## Xi       -0.1018  -0.0546  *** 
## 
## Test for Selection Model Parameters:
## LRT(df = 2) = 1.7814, p-val = 0.4104
## 
## Selection Model Results:
## 
##                      k  estimate      se     zval 
## 0     < p <= 0.025  59    1.0000     ---      --- 
## 0.025 < p <= 0.5    23    0.6990  0.2307  -1.3049 
## 0.5   < p <= 1      18    0.5027  0.2743  -1.8132 
##                       pval   ci.lb   ci.ub    
## 0     < p <= 0.025     ---     ---     ---    
## 0.025 < p <= 0.5    0.1919  0.2470  1.1511    
## 0.5   < p <= 1      0.0698  0.0000  1.0403  . 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The selmodel() function can also be used to fit selection models in which one or both of the selection parameters are fixed to user-specified values. For example:

# Fixing lambda_1 = 0.5, lambda_2 = 0.2
fix_both <- selmodel(rma_fit, type = "step", steps = c(.025, .500), 
                     delta = c(1, 0.5, 0.2))

# Fixing lambda_1 = 0.5, estimating lambda_2
fix_one <- selmodel(rma_fit, type = "step", steps = c(.025, .500), 
                    delta = c(1, 0.5, NA))

  1. Write an estimation function that fits the selection model and returns estimates, standard errors, and confidence intervals for each of the model parameters \((\beta_0,\beta_1,\tau,\lambda_1,\lambda_2)\).

  2. The selmodel() fitting function sometimes produces warnings, as in the following example:

    dat_sig <- dat %>% filter(Ti / si > 0)
    rma_pos <- rma.uni(yi = Ti, sei = si, mods = ~ Xi, data = dat_sig)
    # fit two-step selection model
    sel_fit <- selmodel(rma_pos, type = "step", steps = c(.025, .500))
    ## Warning: One or more intervals do not contain any
    ## observed p-values.
    ## Warning: One or more 'delta' estimates are (almost) equal to their lower or upper bound.
    ## Treat results with caution (or consider adjusting 'delta.min' and/or 'delta.max').

    Modify your estimation function to catch and return warnings such as these. Write code demonstrating that the function works as expected.

  3. The selmodel() function throws warnings when the dataset contains no observations with \(p_i\) in one of the specified intervals. Modify your estimation function to set the selection probability to \(\lambda_1 = 0.1\) if there are no \(p_i\) values between .025 and .500 and to \(\lambda_2 = 0.1\) if there are no \(p_i\) values larger than .500. Write code demonstrating that the function works as expected.

References

Pashley, Nicole E, Luke Keele, and Luke W Miratrix. 2024. “Improving Instrumental Variable Estimators with Poststratification.” Journal of the Royal Statistical Society Series A: Statistics in Society, qnae073.
Vevea, Jack L, and Larry V Hedges. 1995. “A General Linear Model for Estimating Effect Size in the Presence of Publication Bias.” Psychometrika 60 (3): 419–35. https://doi.org/10.1007/BF02294384.