Chapter 21 Simulations as evidence

We began this book with an acknowledgement that simulation is fraught with the potential for misuse: simulations are doomed to succeed. We close by reiterating this point and by discussing several ways researchers might argue that their simulations are more useful than the typical simulation study.

In particular, a researcher might do any of the following:

  1. Use extensive multi-factor simulations
  2. Beat them at their own game: generate simulations based on the prior literature.
  3. Build calibrated simulations

21.1 Use extensive multi-factor simulations

“If a single simulation is not convincing, use more of them,” is one principle a researcher might adopt. By conducting extensive multi-factor simulations, one can explore a large space of possible data-generating scenarios. If a general story bears out across the full range of scenarios, then that will perhaps be more convincing than results from a narrower range.
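To make this concrete, here is a minimal sketch (in Python) of how one might lay out such a multi-factor design as a full factorial: every combination of factor levels becomes its own scenario. The factor names, levels, and the stubbed-out run_scenario() function are illustrative placeholders, not a prescription.

```python
# Sketch of a multi-factor simulation design: cross several factors and run
# every combination. Factor names and run_scenario() are placeholders for
# whatever your study actually varies.
import itertools

import pandas as pd

factors = {
    "n": [50, 200, 1000],                  # sample size
    "effect_size": [0.0, 0.2, 0.5],        # true average treatment effect
    "error_dist": ["normal", "t3"],        # error distribution
    "prop_treated": [0.2, 0.5],            # proportion of units treated
}

def run_scenario(n, effect_size, error_dist, prop_treated, reps=500):
    """Placeholder: run `reps` replications of one scenario and return
    performance summaries (bias, SE, coverage, ...) for each estimator."""
    return {"bias": float("nan"), "coverage": float("nan")}  # stub values

rows = []
for combo in itertools.product(*factors.values()):
    scenario = dict(zip(factors.keys(), combo))
    rows.append({**scenario, **run_scenario(**scenario)})

results = pd.DataFrame(rows)   # one row per scenario: 3 x 3 x 2 x 2 = 36 rows
```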

Of course, the critic will claim that some aspect held fixed across all scenarios is the real culprit. If that fixed aspect is unrealistic, then the findings, across the board, may be less relevant. Thus, pick the factors one varies with care.

21.2 Beat them at their own game

If a prior paper uses a simulation to make a case, one approach is to replicate that simulation, adding in the new estimator one wants to evaluate. This makes it (more) clear that one is not fishing: the comparison rests on something established in the literature as a published benchmark. By constraining oneself to published simulations, one has less wiggle room to cherry-pick a data-generating process that behaves the way one wants.
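As a rough sketch of the workflow, the code below re-implements a made-up published_dgp() standing in for the prior paper's data-generating process and runs every estimator in the comparison set through it; here the comparison is between a simple difference in means and a regression-adjusted estimator, with the understanding that one would swap in the new estimator of interest.

```python
# Sketch: re-implement a published simulation's DGP as a benchmark and add the
# new estimator to the comparison set. published_dgp() is a made-up stand-in
# for whatever the prior paper specifies.
import numpy as np

def published_dgp(n, rng):
    """Stand-in for the data-generating process described in the prior paper."""
    X = rng.normal(size=(n, 3))
    Z = rng.binomial(1, 0.5, size=n)
    Y = X @ np.array([1.0, -0.5, 0.25]) + 0.3 * Z + rng.normal(size=n)
    return X, Z, Y                                    # true effect is 0.3

def diff_in_means(X, Z, Y):
    return Y[Z == 1].mean() - Y[Z == 0].mean()

def regression_adjusted(X, Z, Y):
    design = np.column_stack([np.ones(len(Y)), Z, X])
    beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
    return beta[1]                                    # coefficient on Z

rng = np.random.default_rng(1234)
for name, est in {"diff_in_means": diff_in_means,
                  "regression_adjusted": regression_adjusted}.items():
    ests = np.array([est(*published_dgp(200, rng)) for _ in range(1000)])
    print(f"{name}: bias = {ests.mean() - 0.3:.3f}, SE = {ests.std():.3f}")
```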

21.3 Calibrated simulations

A practice in increasing vogue is to generate calibrated simulations. These are simulations tailored to a specific applied context, with the results of the simulation study designed to more narrowly inform what assumptions and structures are necessary in order to make progress in that context.

Often these simulations are built out of existing data. For example, one might sample, with replacement, from the covariate distribution of an actual dataset, so that the simulated covariates are authentic both in how they are distributed and, more importantly, in how they correlate with one another.
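A minimal sketch of that resampling step, assuming a hypothetical pandas data frame real_data holding the applied study's covariates:

```python
# Sketch: generate simulated covariate rows by resampling, with replacement,
# from an actual dataset, so both the marginal distributions and the
# correlations among covariates are inherited from the real data.
# `real_data` is a hypothetical DataFrame of the applied study's covariates.
import pandas as pd

def sample_covariates(real_data: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    return (real_data
            .sample(n=n, replace=True, random_state=seed)
            .reset_index(drop=True))
```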

The problem with this approach is that one still needs to generate a ground truth in order to assess how well the estimators work in practice. It is very easy to accidentally put a very simple model in place for this component, thus making a calibrated simulation quite naive in the very way that counts.

The potential outcomes framework provides a natural way of generating calibrated simulations (Kern et al. 2014). Because calibrated simulations are tailored to specific real-world scenarios, they maximize face validity as being representative of something we would see in practice. One way to generate a calibrated simulation is to use an existing dataset to generate plausible scenarios.

One way to do this with potential outcomes is to take an existing randomized experiment or observational study and impute all the missing potential outcomes under some specific scheme. This fully defines the sample of interest, and thus any target parameters, such as a measure of heterogeneity, are then known. We then synthetically, and repeatedly, randomize and "observe" outcomes to be analyzed with the methods we are testing. We could also resample from our dataset to generate datasets of different sizes, or to target a superpopulation version of our estimand.
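The re-randomization loop might look like the following sketch, where the toy full_data, with its columns Y0 and Y1, stands in for a sample fully imputed by one of the schemes discussed next:

```python
# Sketch: with Y0 and Y1 imputed for every unit, the finite-sample truth is
# known exactly, so we can repeatedly re-randomize treatment, "observe"
# outcomes, and score any estimator against that truth. The toy full_data
# below is a stand-in for an imputed dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
full_data = pd.DataFrame({"Y0": rng.normal(size=500)})
full_data["Y1"] = full_data["Y0"] + 0.4 + rng.normal(scale=0.2, size=500)

true_ate = (full_data["Y1"] - full_data["Y0"]).mean()     # known by construction

def one_replication(data, estimator, p_treat=0.5):
    Z = rng.binomial(1, p_treat, size=len(data))
    Y_obs = np.where(Z == 1, data["Y1"], data["Y0"])
    return estimator(Z, Y_obs)

diff_in_means = lambda Z, Y: Y[Z == 1].mean() - Y[Z == 0].mean()
errors = [one_replication(full_data, diff_in_means) - true_ate
          for _ in range(1000)]
# Resampling full_data (with replacement) before each replication would instead
# target a superpopulation version of the estimand and allow varying n.
```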

The key feature is the imputation step. One baseline method is to generate a matched-pairs dataset by, for each unit, finding a close match given all the demographic and other covariate information in the sample, and then using the matched unit's observed outcome as the imputed potential outcome.
By doing this (with replacement) for all units, we can generate a fully imputed dataset which we then use as our population. This can preserve complex relationships in the data that are not model dependent. In particular, if outcomes tend to be coarsely defined (e.g., on an integer scale) or have specific clumps (such as zero-inflation or rounding), this structure will be preserved.
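One possible implementation of this matching-based imputation, using nearest neighbors on standardized covariates and illustrative column names, is sketched below.

```python
# Sketch: impute each unit's missing potential outcome with the observed
# outcome of its nearest covariate neighbor (with replacement) in the opposite
# treatment arm. Column names and the distance metric are illustrative choices.
import numpy as np
import pandas as pd

def impute_by_matching(df, covariates, treat="Z", outcome="Y"):
    X = (df[covariates] - df[covariates].mean()) / df[covariates].std()
    X = X.to_numpy()
    Z = df[treat].to_numpy()
    Y = df[outcome].to_numpy().astype(float)

    Y0 = np.where(Z == 0, Y, np.nan)       # observed control outcomes
    Y1 = np.where(Z == 1, Y, np.nan)       # observed treated outcomes
    for arm, col in [(0, Y1), (1, Y0)]:    # fill in each arm's missing column
        donors = np.where(Z != arm)[0]     # units in the opposite arm
        for i in np.where(Z == arm)[0]:
            dist = np.sum((X[donors] - X[i]) ** 2, axis=1)
            col[i] = Y[donors[np.argmin(dist)]]
    out = df.copy()
    out["Y0"], out["Y1"] = Y0, Y1
    return out
```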

One concern with this approach is that the noise in the matching will, in general, dilute the structure of the treatment effect.
This is akin to measurement error attenuating estimated relationships in linear models. We can sharpen these relationships toward a given model by first imputing the missing outcomes using a specified model, and then matching on all variables, including the imputed potential outcome. This is not a data analysis strategy, but instead a method of generating synthetic data that both has a given structure of interest and also remains faithful to the idiosyncrasies of an actual dataset.
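One possible reading of this sharpening step is sketched below: first predict each unit's missing potential outcome with a working model (supplied here as a hypothetical predict_missing() function), then match within the opposite arm on covariates plus closeness of the donor's observed outcome to that prediction, and finally impute with the donor's observed outcome. The weight knob, which governs how strongly the match is pulled toward the model, is an illustrative addition, not part of the method as stated.

```python
# Sketch (one possible implementation): sharpen the matching-based imputation
# toward a specified model. predict_missing(row) should return the model's
# prediction of that unit's *missing* potential outcome; `weight` is an
# illustrative knob controlling how strongly the match favors the model
# prediction over covariate similarity alone.
import numpy as np
import pandas as pd

def sharpened_imputation(df, covariates, predict_missing,
                         treat="Z", outcome="Y", weight=1.0):
    X = (df[covariates] - df[covariates].mean()) / df[covariates].std()
    X = X.to_numpy()
    Z = df[treat].to_numpy()
    Y = df[outcome].to_numpy().astype(float)
    y_scale = Y.std()

    Y0 = np.where(Z == 0, Y, np.nan)
    Y1 = np.where(Z == 1, Y, np.nan)
    for arm, col in [(0, Y1), (1, Y0)]:
        donors = np.where(Z != arm)[0]
        for i in np.where(Z == arm)[0]:
            y_hat = predict_missing(df.iloc[i])                 # model prediction
            d_cov = np.sum((X[donors] - X[i]) ** 2, axis=1)     # covariate distance
            d_mod = ((Y[donors] - y_hat) / y_scale) ** 2        # distance to model
            col[i] = Y[donors[np.argmin(d_cov + weight * d_mod)]]
    out = df.copy()
    out["Y0"], out["Y1"] = Y0, Y1
    return out
```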

A second approach, which allows for varying the level of the systematic effect, is to specify a treatment effect model and impute the treated outcomes for all control units.
The complex structure between the covariates and \(Y(0)\) would then be perfectly preserved. Unfortunately, this alone would give zero idiosyncratic treatment variation (unit-to-unit variation in the \(\tau_i\) that is not explained by the model). To add idiosyncratic variation, we would then generate a distribution of perturbations and add these to the imputed outcomes, just as one adds an error term in a regression model.
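A brief sketch of this second approach, with an illustrative effect model tau() and a noise scale sd_idio for the idiosyncratic perturbations:

```python
# Sketch: build the synthetic population from the control units, keep their
# observed outcomes as Y0, and impute Y1 = Y0 + tau(x) + noise, where tau() is
# the specified treatment effect model and the noise term provides
# idiosyncratic treatment variation. Names are illustrative.
import numpy as np
import pandas as pd

def impute_from_effect_model(controls: pd.DataFrame, tau, sd_idio=0.0,
                             outcome="Y", seed=None):
    rng = np.random.default_rng(seed)
    pop = controls.copy()
    pop["Y0"] = pop[outcome]                                # real data, untouched
    systematic = pop.apply(tau, axis=1)                     # tau(x_i) from the model
    idiosyncratic = rng.normal(0.0, sd_idio, size=len(pop)) # unit-level perturbations
    pop["Y1"] = pop["Y0"] + systematic + idiosyncratic
    return pop

# e.g., a constant effect plus a (hypothetical) moderator:
# pop = impute_from_effect_model(controls, tau=lambda r: 0.4 + 0.1 * (r["age"] > 50),
#                                sd_idio=0.2)
```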

Regardless of the approach, once a fully observed sample has been obtained, we can investigate the various aspects of our estimators discussed above.

References

Kern, Holger L., Elizabeth A. Stuart, Jennifer Hill, and Donald P. Green. 2014. “Assessing Methods for Generalizing Experimental Impact Estimates to Target Populations.” Journal of Research on Educational Effectiveness 9 (1): 103–27. https://doi.org/10.1080/19345747.2015.1060282.