Effect-size-definitions.Rmd
The SingleCaseES package provides R functions for calculating basic, within-case effect size indices for single-case designs, including several non-overlap measures and parametric effect size measures, and for estimating the gradual effects model developed by Swan & Pustejovsky (2018). Estimation procedures for standard errors and confidence intervals are provided for the subset of effect size indices with known sampling distributions. This vignette covers the mathematical definitions of the basic non-overlap and parametric effect size measures, along with some details about how they are estimated. Parker, Vannest, & Davis (2011) provide a review of the non-overlap measures, including worked examples of the calculations. Pustejovsky (2019) provides a critical review of non-overlap measures and parametric effect sizes. However, neither of these reviews includes details about standard error calculations.
All of the within-case effect size measures are defined in terms of a comparison of observations between two phases (call them phase A and phase B) within a single-case design. Let $m$ and $n$ denote the number of observations in phase A and phase B, respectively. Let $y^A_1,...,y^A_m$ denote the observations from phase A and $y^B_1,...,y^B_n$ denote the observations from phase B.
The non-overlap effect size measures are all defined in terms of ordinal comparisons between data points from phase A and data points from phase B. It will therefore be helpful to have notation for the data points from each phase, sorted in rank order. Thus, let $y^A_{(1)} \leq y^A_{(2)} \leq \cdots \leq y^A_{(m)}$ denote the values of the baseline phase data, sorted in increasing order, and let $y^B_{(1)} \leq y^B_{(2)} \leq \cdots \leq y^B_{(n)}$ denote the values of the sorted treatment phase data.
The parametric effect size measures are all defined under a simple model for the data-generating process, in which observations in phase A are sampled from a distribution with constant mean $\mu_A$ and standard deviation $\sigma_A$, while observations in phase B are sampled from a distribution with constant mean $\mu_B$ and standard deviation $\sigma_B$. Let $\bar{y}_A$ and $\bar{y}_B$ denote the sample means for phase A and phase B, respectively. Let $s_A$ and $s_B$ denote the sample standard deviations for phase A and phase B, respectively. Let $z_{\alpha / 2}$ denote the $1 - \alpha / 2$ critical value from a standard normal distribution. Finally, we use $\ln()$ to denote the natural logarithm function.
Parker & Vannest (2009) proposed non-overlap of all pairs (NAP) as an effect size index for use in single-case research. NAP is defined in terms of all pair-wise comparisons between the data points in two different phases for a given case (i.e., a treatment phase versus a baseline phase). For an outcome that is desirable to increase, NAP is the proportion of all such pair-wise comparisons where the treatment phase observation exceeds the baseline phase observation, with pairs that are exactly tied getting a weight of 1/2. NAP is exactly equivalent to the modified Common Language Effect Size (Vargha & Delaney, 2000) and has been proposed as an effect size index in other contexts too (e.g., Acion, Peterson, Temple, & Arndt, 2006).
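NAP can be interpreted as an estimate of the probability that a randomly selected observation from the B phase improves upon a randomly selected observation from the A phase. For an outcome where increase is desirable, the effect size parameter is
$$\theta = \text{Pr}(Y^B > Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).$$
For an outcome where decrease is desirable, the effect size parameter is
$$\theta = \text{Pr}(Y^B < Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).$$
For an outcome where increase is desirable, calculate
$$q_{ij} = I(y^B_j > y^A_i) + 0.5 I(y^B_j = y^A_i)$$
for $i = 1,...,m$ and $j = 1,...,n$. For an outcome where decrease is desirable, one would instead use
$$q_{ij} = I(y^B_j < y^A_i) + 0.5 I(y^B_j = y^A_i).$$
The NAP effect size index is then calculated as
$$\text{NAP} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n q_{ij}.$$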
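The following Python sketch illustrates the NAP calculation. It is written independently of the package, whose functions are in R. The names `q_matrix` and `nap` and the `improvement` argument are hypothetical.

```python
def q_matrix(a, b, improvement="increase"):
    """Pairwise scores q_ij: rows index phase A observations, columns phase B."""
    sign = 1 if improvement == "increase" else -1
    q = []
    for ya in a:
        row = []
        for yb in b:
            diff = sign * (yb - ya)  # > 0 when the B point improves on the A point
            row.append(1.0 if diff > 0 else 0.5 if diff == 0 else 0.0)
        q.append(row)
    return q


def nap(a, b, improvement="increase"):
    """NAP = average of q_ij over all m * n (A, B) pairs."""
    q = q_matrix(a, b, improvement)
    return sum(sum(row) for row in q) / (len(a) * len(b))


A = [2, 3, 3, 5]
B = [3, 6, 7, 8, 9]
print(nap(A, B))              # 0.9 (total score 18 over 20 pairs)
print(nap(A, B, "decrease"))  # 0.1
```

The two values sum to 1. This is because reversing the direction swaps wins and losses, while tied pairs keep their weight of 1/2.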
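The SingleCaseES package provides several different methods for estimating the standard error of NAP. The default method is calculated based on the exactly unbiased variance estimator described by Sen (1967; cf. Mee, 1990), which assumes that the observations are mutually independent and are identically distributed within each phase. Let
$$Q_1 = \frac{1}{m n^2} \sum_{i=1}^m \left(\sum_{j=1}^n q_{ij}\right)^2, \qquad Q_2 = \frac{1}{m^2 n} \sum_{j=1}^n \left(\sum_{i=1}^m q_{ij}\right)^2, \qquad \text{and} \qquad Q_3 = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n q_{ij}^2.$$
The SE is then calculated as
$$SE_{\text{unbiased}} = \sqrt{\frac{\text{NAP} - (m + n - 1)\text{NAP}^2 + n Q_1 + m Q_2 - 2 Q_3}{(m - 1)(n - 1)}}.$$
Another method for estimating a standard error was introduced by Hanley & McNeil (1982). This standard error is calculated as
$$SE_{\text{Hanley}} = \sqrt{\frac{1}{mn}\left(\text{NAP}(1 - \text{NAP}) + (n - 1)(Q_1 - \text{NAP}^2) + (m - 1)(Q_2 - \text{NAP}^2)\right)},$$
with $Q_1$ and $Q_2$ defined as above. This standard error is based on the same assumptions as the unbiased SE.
A limitation of $SE_{\text{unbiased}}$ and $SE_{\text{Hanley}}$ is that they will be equal to zero when there is complete non-overlap (i.e., when $\text{NAP}$ is equal to zero or equal to one). In order to ensure a strictly positive standard error for NAP, the SingleCaseES package calculates $SE_{\text{unbiased}}$ and $SE_{\text{Hanley}}$ using a truncation of NAP. Specifically, the formulas are evaluated using
$$\text{NAP}' = \max\left\{\min\left\{\text{NAP}, 1 - \frac{1}{2mn}\right\}, \frac{1}{2mn}\right\}$$
in place of $\text{NAP}$.
A final method for estimating a standard error is to work under the null hypothesis that there is no effect, i.e., that the data points from each phase are sampled from the same distribution. Under the null hypothesis, the sampling variance of $\text{NAP}$ depends only on the number of observations in each phase:
$$SE_{\text{null}} = \sqrt{\frac{m + n + 1}{12 m n}}$$
(cf. Grissom & Kim, 2001, p. 141). If the null hypothesis is not true (that is, if the observations in phase B are drawn from a different distribution than the observations in phase A), then this standard error will tend to be too large.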
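Below is a Python sketch of the three standard errors. Like the other examples here, it is independent of the package's R code. The function name `nap_standard_errors` is hypothetical. The final `max(..., 0.0)` guard is an addition of this sketch; it keeps `math.sqrt` from failing if a variance estimate comes out negative in a tiny sample.

```python
import math


def nap_standard_errors(a, b, improvement="increase"):
    """Return (NAP, SE_unbiased, SE_Hanley, SE_null); needs at least 2 points per phase."""
    m, n = len(a), len(b)
    sign = 1 if improvement == "increase" else -1
    q = [[1.0 if sign * (yb - ya) > 0 else 0.5 if yb == ya else 0.0 for yb in b]
         for ya in a]
    nap = sum(map(sum, q)) / (m * n)

    # Q1: squared row sums (pairs of B points sharing one A point)
    Q1 = sum(sum(row) ** 2 for row in q) / (m * n ** 2)
    # Q2: squared column sums (pairs of A points sharing one B point)
    Q2 = sum(sum(q[i][j] for i in range(m)) ** 2 for j in range(n)) / (m ** 2 * n)
    # Q3: average squared score
    Q3 = sum(x ** 2 for row in q for x in row) / (m * n)

    # Truncated NAP keeps both SEs strictly positive under complete non-overlap
    t = max(min(nap, 1 - 1 / (2 * m * n)), 1 / (2 * m * n))

    v_unb = (t - (m + n - 1) * t ** 2 + n * Q1 + m * Q2 - 2 * Q3) / ((m - 1) * (n - 1))
    v_han = (t * (1 - t) + (n - 1) * (Q1 - t ** 2) + (m - 1) * (Q2 - t ** 2)) / (m * n)
    v_null = (m + n + 1) / (12 * m * n)
    # max(..., 0.0) is a safeguard added in this sketch, not part of the formulas
    return (nap, math.sqrt(max(v_unb, 0.0)), math.sqrt(max(v_han, 0.0)),
            math.sqrt(v_null))


print(nap_standard_errors([2, 3, 3, 5], [3, 6, 7, 8, 9]))
print(nap_standard_errors([1, 2, 3], [4, 5, 6]))  # complete non-overlap: SEs still > 0
```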
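A confidence interval for $\theta$ can be calculated using a method proposed by Newcombe (2006, Method 5), which assumes that the observations are mutually independent and are identically distributed within each phase. Using a confidence level of $100\% \times (1 - \alpha)$, the endpoints of the confidence interval are defined as the values of $\theta$ that satisfy the equality
$$(\text{NAP} - \theta)^2 = \frac{z^2_{\alpha / 2} h \theta (1 - \theta)}{mn}\left[\frac{1}{h} + \frac{1 - \theta}{2 - \theta} + \frac{\theta}{1 + \theta}\right],$$
where $h = (m + n) / 2 - 1$ and $z_{\alpha / 2}$ is the $1 - \alpha / 2$ critical value from a standard normal distribution. After multiplying both sides by $(2 - \theta)(1 + \theta)$, this equation becomes a fourth-degree polynomial in $\theta$. The polynomial is solved using a numerical root-finding algorithm.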
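Bisection is enough to find the Newcombe endpoints. The difference between the two sides of the equation is non-negative at $\theta = 0$ and $\theta = 1$ and non-positive at $\theta = \text{NAP}$, so an endpoint can be bracketed on each side of NAP. The sketch below uses bisection in place of whatever root-finder the package uses. `newcombe_ci` is a hypothetical name, and the function assumes $m + n > 2$ so that $h > 0$.

```python
from statistics import NormalDist


def newcombe_ci(nap, m, n, level=0.95, tol=1e-10):
    """Newcombe (2006) Method 5 interval for theta, via bisection on each side of NAP."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    h = (m + n) / 2 - 1  # requires m + n > 2

    def f(theta):
        # Zero at the interval endpoints; f(0) >= 0, f(1) >= 0, and f(NAP) <= 0
        var = (h * theta * (1 - theta) / (m * n)) * (
            1 / h + (1 - theta) / (2 - theta) + theta / (1 + theta)
        )
        return (nap - theta) ** 2 - z ** 2 * var

    def bisect(lo, hi):
        # Standard bisection; assumes f changes sign on [lo, hi]
        f_lo = f(lo)
        while hi - lo > tol:
            mid = (lo + hi) / 2
            f_mid = f(mid)
            if (f_mid > 0) == (f_lo > 0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if nap <= 0 else bisect(0.0, nap)
    upper = 1.0 if nap >= 1 else bisect(nap, 1.0)
    return lower, upper


print(newcombe_ci(0.9, m=4, n=5))
```

The interval is not symmetric around NAP. Because the equation is unchanged when $\text{NAP}$ and $\theta$ are replaced by $1 - \text{NAP}$ and $1 - \theta$, the interval for `1 - p` mirrors the interval for `p`.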
Scruggs, Mastropieri, & Casto (1987) proposed the percentage of non-overlapping data (PND) as an effect size index for single-case designs. For an outcome where increase is desirable, PND is defined as the proportion of observations in the B phase that exceed the highest observation from the A phase. For an outcome where decrease is desirable, PND is the proportion of observations in the B phase that are less than the lowest observation from the A phase.
This effect size does not have a stable parameter definition because the magnitude of the maximum (or minimum) value from phase A depends on the number of observations in the phase (Allison & Gorman, 1994; Pustejovsky, 2019).
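PND is a simple count, as the following Python sketch shows. The function name `pnd` is hypothetical.

```python
def pnd(a, b, improvement="increase"):
    """Proportion of phase B observations beyond the most extreme phase A observation."""
    if improvement == "increase":
        threshold = max(a)  # highest baseline point
        return sum(yb > threshold for yb in b) / len(b)
    threshold = min(a)      # lowest baseline point
    return sum(yb < threshold for yb in b) / len(b)


print(pnd([2, 3, 3, 5], [3, 6, 7, 8, 9]))  # 4 of 5 B points exceed 5
```

Because the threshold is a single extreme value from phase A, one unusual baseline point can change the result substantially.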
Ma (2006) proposed the percent exceeding the median, defined as the proportion of observations in phase B that improve upon the median of phase A. Ma (2006) did not specify an effect size parameter corresponding to this index. However, it would be reasonable to define the parameter as the probability that a randomly selected observation from the B phase represents an improvement over the median of the distribution of A phase outcomes. Let $\eta_A$ denote the median of the distribution of outcomes in phase A. For an outcome where increase is desirable, the PEM parameter would then be
$$\omega = \text{Pr}(Y^B > \eta_A) + 0.5 \times \text{Pr}(Y^B = \eta_A).$$
For an outcome where decrease is desirable, it would be
$$\omega = \text{Pr}(Y^B < \eta_A) + 0.5 \times \text{Pr}(Y^B = \eta_A).$$
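A plug-in estimate of $\omega$ replaces $\eta_A$ with the sample median of the phase A data and the probabilities with sample proportions. The Python sketch below is one such estimator, and its function name is hypothetical. Counting ties with the median as 1/2 mirrors the parameter definition; it is an assumption of this sketch, not a documented detail of the package.

```python
from statistics import median


def pem(a, b, improvement="increase"):
    """Proportion of phase B observations improving on the phase A median (ties count 1/2)."""
    med_a = median(a)
    sign = 1 if improvement == "increase" else -1
    scores = [1.0 if sign * (yb - med_a) > 0 else 0.5 if yb == med_a else 0.0 for yb in b]
    return sum(scores) / len(b)


print(pem([2, 3, 3, 5], [3, 6, 7, 8, 9]))  # phase A median is 3; one B point ties it
```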
For an outcome where increase is desirable, Parker, Vannest, & Davis (2011) defined PAND as the proportion of observations remaining after removing the fewest possible number of observations from either phase so that the highest remaining point from the baseline phase is less than the lowest remaining point from the treatment phase. For an outcome where decrease is desirable, PAND is defined analogously. In that case, observations are removed so that the lowest remaining point from the baseline phase is greater than the highest remaining point from the treatment phase.
This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).
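For an outcome where increase is desirable, PAND is calculated as
$$\text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right)\right\},$$
where $y^A_{(0)} = - \infty$, $y^B_{(n + 1)} = \infty$, and the maximum is taken over the values $0 \leq i \leq m$ and $0 \leq j \leq n$. For an outcome where decrease is desirable, PAND is calculated as
$$\text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right)\right\},$$
where $y^A_{(m + 1)} = \infty$, $y^B_{(0)} = - \infty$, and the maximum is taken over the values $0 \leq i \leq m$ and $0 \leq j \leq n$.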
The sampling distribution of PAND has not been described, and so standard errors and confidence intervals are not available.
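The PAND maximization can be checked by brute force over every pair $(i, j)$. This is cheap for the short series typical of single-case designs. In the Python sketch below, the function name `pand` is hypothetical. Infinite sentinels stand in for the out-of-range order statistics. For an increase, the code keeps the $i$ lowest A points and the $j$ highest B points. For a decrease, it keeps the $i$ highest A points and the $j$ lowest B points.

```python
import math


def pand(a, b, improvement="increase"):
    """Brute-force PAND: largest number of points that can be kept without overlap."""
    m, n = len(a), len(b)
    ya = [-math.inf] + sorted(a) + [math.inf]  # ya[k] = y^A_(k), k = 0, ..., m + 1
    yb = [-math.inf] + sorted(b) + [math.inf]  # yb[k] = y^B_(k), k = 0, ..., n + 1
    best = 0
    for i in range(m + 1):
        for j in range(n + 1):
            if improvement == "increase":
                # keep the i lowest A points and the j highest B points
                ok = ya[i] < yb[n + 1 - j]
            else:
                # keep the i highest A points and the j lowest B points
                ok = ya[m + 1 - i] > yb[j]
            if ok:
                best = max(best, i + j)
    return best / (m + n)


print(pand([2, 3, 3, 5], [3, 6, 7, 8, 9]))  # drop only the B point equal to 3: 8 of 9 kept
```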
The robust improvement rate difference is defined as the robust phi coefficient corresponding to a certain $2 \times 2$ table that is a function of the degree of overlap between the observations in each phase (Parker, Vannest, & Davis, 2011). This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).
For notational convenience, let $y^A_{(0)} = y^B_{(0)} = -\infty$ and $y^A_{(m + 1)} = y^B_{(n + 1)} = \infty$. For an outcome where increase is desirable, let $\tilde{i}$ and $\tilde{j}$ denote the values that maximize the quantity
$$\left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right)$$
for $0 \leq i \leq m$ and $0 \leq j \leq n$. For an outcome where decrease is desirable, let $\tilde{i}$ and $\tilde{j}$ instead denote the values that maximize the quantity
$$\left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right).$$
Now calculate the $2 \times 2$ table
$$ \begin{array}{|c|c|} \hline m - \tilde{i} & \tilde{j} \\ \hline \tilde{i} & n - \tilde{j} \\ \hline \end{array} $$
Parker, Vannest, & Brown (2009) proposed the non-robust improvement rate difference, which is equivalent to the phi coefficient from this table. Parker, Vannest, & Davis (2011) proposed to instead use the robust phi coefficient, which involves modifying the table so that the row- and column-margins are equal. Robust IRD is thus equal to
$$\text{IRD}_R = \frac{n - m + \tilde{i} + \tilde{j}}{2 n} - \frac{m + n - \tilde{i} - \tilde{j}}{2 m}.$$
Robust IRD is algebraically related to PAND as
$$\text{IRD}_R = 1 - \frac{(m + n)^2}{2 m n}\left(1 - \text{PAND}\right).$$
Just as with PAND, the sampling distribution of robust IRD has not been described, and so standard errors and confidence intervals are not available.
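The Python sketch below reuses the brute-force search to find $\tilde{i}$ and $\tilde{j}$. It then computes robust IRD from a balanced version of the $2 \times 2$ table and compares the result with the PAND-based identity. The balancing step shifts the same amount between rows in both phases until the "improved" row total equals $n$. This is the sketch's reading of "modifying the table so that the row- and column-margins are equal." Treat that step, and the name `robust_ird`, as assumptions.

```python
import math


def robust_ird(a, b, improvement="increase"):
    """Return (robust IRD from the balanced table, robust IRD from the PAND identity)."""
    m, n = len(a), len(b)
    ya = [-math.inf] + sorted(a) + [math.inf]  # ya[k] = y^A_(k), k = 0, ..., m + 1
    yb = [-math.inf] + sorted(b) + [math.inf]  # yb[k] = y^B_(k), k = 0, ..., n + 1
    best_sum, i_t, j_t = -1, 0, 0
    for i in range(m + 1):
        for j in range(n + 1):
            if improvement == "increase":
                ok = ya[i] < yb[n + 1 - j]
            else:
                ok = ya[m + 1 - i] > yb[j]
            if ok and i + j > best_sum:
                best_sum, i_t, j_t = i + j, i, j

    # Unbalanced table: "improved" counts are m - i_t (phase A) and j_t (phase B).
    # Balancing shifts the same amount between rows in both phases so that the
    # "improved" row total becomes n; the balanced "improved" counts are then:
    improved_a = (m + n - i_t - j_t) / 2
    improved_b = (n - m + i_t + j_t) / 2
    ird = improved_b / n - improved_a / m

    pand = (i_t + j_t) / (m + n)
    ird_from_pand = 1 - (m + n) ** 2 / (2 * m * n) * (1 - pand)
    return ird, ird_from_pand


print(robust_ird([2, 3, 3, 5], [3, 6, 7, 8, 9]))  # both entries agree
```

Because the balanced counts depend only on $\tilde{i} + \tilde{j}$, ties in the maximization do not affect the result.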
Tau is one of several effect sizes proposed by Parker, Vannest, Davis, & Sauber (2011) and known collectively as “Tau-U.” The basic estimator Tau does not make any adjustments for time trends. For an outcome where increase is desirable, the effect size parameter is
$$\tau = \text{Pr}(Y^B > Y^A) - \text{Pr}(Y^B < Y^A)$$
(for an outcome where decrease is desirable, the effect size parameter would have the opposite sign). This parameter is a simple linear transformation of the NAP parameter $\theta$:
$$\tau = 2 \theta - 1.$$
For an outcome where increase is desirable, calculate
$$w_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i).$$
For an outcome where decrease is desirable, one would instead use
$$w_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i).$$
The Tau effect size index is then calculated as
$$\text{Tau} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n w_{ij} = 2 \times \text{NAP} - 1.$$
Standard errors and confidence intervals for Tau are calculated using transformations of the corresponding SEs and CIs for NAP. All of the methods assume that the observations are mutually independent and are identically distributed within each phase.
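Standard errors for Tau are calculated as $SE_{\text{Tau}} = 2 \times SE_{\text{NAP}}$, where $SE_{\text{NAP}}$ is the standard error for NAP calculated based on one of the available methods (unbiased, Hanley, or null).
The CI for $\tau$ is calculated as
$$\left[L_\tau, U_\tau\right] = \left[2 L_\theta - 1, 2 U_\theta - 1\right],$$
where $L_\theta$ and $U_\theta$ are the lower and upper bounds of the CI for $\theta$, calculated using a method proposed by Newcombe (2006).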
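Tau and its interval follow directly from NAP. In the Python sketch below, the names `tau` and `tau_se_and_ci` are hypothetical. The second function only rescales an SE and a CI that have already been computed for NAP, using any of the methods described earlier.

```python
def tau(a, b, improvement="increase"):
    """Tau: average of w_ij in {-1, 0, 1} over all (A, B) pairs."""
    sign = 1 if improvement == "increase" else -1
    total = 0
    for ya in a:
        for yb in b:
            d = sign * (yb - ya)
            total += (d > 0) - (d < 0)  # +1 improvement, -1 deterioration, 0 tie
    return total / (len(a) * len(b))


def tau_se_and_ci(se_nap, nap_ci):
    """Map an SE and CI for NAP onto the Tau scale via tau = 2 * theta - 1."""
    lower, upper = nap_ci
    return 2 * se_nap, (2 * lower - 1, 2 * upper - 1)


print(tau([2, 3, 3, 5], [3, 6, 7, 8, 9]))  # equals 2 * NAP - 1 for the same data
```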
Tau-U is one of several effect sizes proposed by Parker, Vannest, Davis, & Sauber (2011). The Tau-U variant is similar to Tau, but includes an adjustment term that is a function of the baseline time trend. For an outcome where increase is desirable, the index is calculated as Kendall’s $S$ statistic for the comparison between the phase B data and the phase A data, plus Kendall’s $S$ statistic for the A phase observations, scaled by the product of the number of observations in each phase.
This effect size does not have a stable parameter definition and its feasible range depends on the number of observations in each phase (Tarlow, 2017).
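For an outcome where increase is desirable, calculate
$$w^{AB}_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i)$$
and
$$w^{AA}_{ij} = I(y^A_i > y^A_j) - I(y^A_i < y^A_j).$$
For an outcome where decrease is desirable, one would instead use
$$w^{AB}_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i)$$
and
$$w^{AA}_{ij} = I(y^A_i < y^A_j) - I(y^A_i > y^A_j).$$
The Tau-U effect size index is then calculated as
$$\text{Tau-U} = \frac{1}{m n}\left(\sum_{i=1}^m \sum_{j=1}^n w^{AB}_{ij} + \sum_{i=1}^{m - 1} \sum_{j = i + 1}^m w^{AA}_{ij}\right).$$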
The sampling distribution of Tau-U has not been described, and so standard errors and confidence intervals are not available.
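Here is a direct Python transcription of the Tau-U sum. It assumes that the phase A data are supplied in time order, and the function name `tau_u` is hypothetical.

```python
def tau_u(a, b, improvement="increase"):
    """Tau-U = (S_AB + S_AA) / (m n), with phase A data in time order."""
    sign = 1 if improvement == "increase" else -1
    m, n = len(a), len(b)

    def sgn(x):
        return (x > 0) - (x < 0)

    # S_AB: B-versus-A comparisons, positive when the B observation is an improvement
    s_ab = sum(sgn(sign * (yb - ya)) for ya in a for yb in b)
    # S_AA: baseline pairs (i < j), positive when the earlier point is better than the
    # later one, so an improving baseline trend pulls the index down
    s_aa = sum(sgn(sign * (a[i] - a[j])) for i in range(m - 1) for j in range(i + 1, m))
    return (s_ab + s_aa) / (m * n)


print(tau_u([1, 2, 3], [4, 5]))  # complete non-overlap, but a rising baseline
```

For the rising baseline `[1, 2, 3]`, complete non-overlap with phase B contributes $S_{AB} = 6$. The trend term contributes $-3$, so Tau-U is $0.5$ rather than $1$.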
Tarlow (2017) proposed to modify the Tau effect size index by first adjusting the observations for a linear trend in the A phase. The index can be calculated with or without conducting a pre-test for significance of the A phase time trend. Regardless of whether the baseline trend is significant, we provide two approaches for calculating Tau-BC. The first approach uses Kendall’s rank correlation (with an adjustment for ties), as in Tarlow (2017). The second approach uses the Tau (non-overlap) index (without an adjustment for ties).
If the pre-test for the A phase time trend is used, then the slope of the baseline trend is first tested using Kendall’s rank correlation. If the baseline slope is significantly different from zero, the outcomes are adjusted for baseline trend using Theil-Sen regression, and the residuals from Theil-Sen regression are used to calculate Kendall’s rank correlation or the Tau (non-overlap) index. If the baseline slope is not significantly different from zero, then no baseline trend adjustment is made, and the Tau-BC effect size is calculated using Kendall’s rank correlation or the Tau (non-overlap) index.
If the pre-test for the A phase time trend is not used, then the outcomes are adjusted for baseline trend using Theil-Sen regression, regardless of whether the slope is significantly different from zero. The residuals from Theil-Sen regression are then used to calculate Kendall’s rank correlation or the Tau (non-overlap) index.
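The formal definition of Tau-BC requires positing a model for the time trend in the data series. Thus, suppose that the outcomes can be expressed in terms of a linear time trend and an error term:
$$y^A_i = \beta_0 + \beta_1 i + \epsilon^A_i \quad \text{for} \quad i = 1,...,m$$
$$y^B_j = \beta_0 + \beta_1 (m + j) + \epsilon^B_j \quad \text{for} \quad j = 1,...,n.$$
Within each phase, assume that the error terms are independent and share a common distribution. The Tau-BC parameter can then be defined as the Tau parameter for the distribution of the error terms, or
$$\tau_{BC} = \text{Pr}(\epsilon^B > \epsilon^A) - \text{Pr}(\epsilon^B < \epsilon^A).$$
An equivalent definition in terms of the outcome distributions is
$$\tau_{BC} = \text{Pr}\left(Y^B_j - \beta_1 (m + j) > Y^A_i - \beta_1 i\right) - \text{Pr}\left(Y^B_j - \beta_1 (m + j) < Y^A_i - \beta_1 i\right)$$
for $1 \leq i \leq m$ and $1 \leq j \leq n$.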
Estimation of $\tau_{BC}$ entails correcting the data series for the baseline slope $\beta_1$. If using the baseline trend pre-test, the null hypothesis $H_0: \beta_1 = 0$ is first tested using Kendall’s rank correlation. If the test is not significant, then set $\hat\beta_0 = 0$ and $\hat\beta_1 = 0$. If the test is significant or if the pre-test for baseline time trend is not used, then the slope is estimated by Theil-Sen regression. Specifically, we calculate the slope between every pair of observations in the A phase:
$$s_{hi} = \frac{y^A_i - y^A_h}{i - h}$$
for $h = 1,...,m - 1$ and $i = h + 1,...,m$. The overall slope estimate is taken to be the median over all $m(m - 1) / 2$ slope pairs:
$$\hat\beta_1 = \text{median}\left\{s_{12}, s_{13}, ..., s_{(m - 1)m}\right\}.$$
The intercept term is estimated by taking the median observation in the A phase after correcting for the estimated linear time trend:
$$\hat\beta_0 = \text{median}\left\{y^A_1 - \hat\beta_1 \times 1, y^A_2 - \hat\beta_1 \times 2, ..., y^A_m - \hat\beta_1 \times m\right\}.$$
However, the intercept estimate is irrelevant for purposes of estimating Tau-BC. The Tau estimator is a function of ranks, so it is unchanged when every observation is shifted by the same constant.
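After estimating the phase A time trend, $\tau_{BC}$ is estimated by de-trending the full data series and calculating Kendall’s rank correlation or Tau (non-overlap) on the de-trended observations. Specifically, set $\tilde{y}^A_i = y^A_i - \hat\beta_0 - \hat\beta_1 i$ for $i = 1,...,m$ and $\tilde{y}^B_j = y^B_j - \hat\beta_0 - \hat\beta_1 (m + j)$ for $j = 1,...,n$. For an outcome where increase is desirable, calculate
$$\tilde{w}_{ij} = I(\tilde{y}^B_j > \tilde{y}^A_i) - I(\tilde{y}^B_j < \tilde{y}^A_i)$$
or, for an outcome where decrease is desirable, calculate
$$\tilde{w}_{ij} = I(\tilde{y}^B_j < \tilde{y}^A_i) - I(\tilde{y}^B_j > \tilde{y}^A_i).$$
Tau-BC (non-overlap) is then estimated by
$$\text{Tau}_{BC} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n \tilde{w}_{ij}.$$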
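The following Python sketch combines the Theil-Sen fit with the de-trended Tau calculation. It implements only the variant without a pre-test, so the baseline trend is always removed. It also assumes that sessions are numbered $1, ..., m + n$ with no gaps. The function names are hypothetical.

```python
from statistics import median


def theil_sen(a):
    """Theil-Sen intercept and slope for phase A data observed at sessions 1, ..., m."""
    m = len(a)
    # slope for every pair of baseline sessions (h, i) with h < i (0-based indices here)
    slopes = [(a[i] - a[h]) / (i - h) for h in range(m - 1) for i in range(h + 1, m)]
    b1 = median(slopes)
    b0 = median(a[i] - b1 * (i + 1) for i in range(m))  # session number is i + 1
    return b0, b1


def tau_bc(a, b, improvement="increase"):
    """Tau-BC (non-overlap): Tau on data de-trended by the phase A Theil-Sen line."""
    m, n = len(a), len(b)
    b0, b1 = theil_sen(a)
    ra = [a[i] - b0 - b1 * (i + 1) for i in range(m)]      # sessions 1, ..., m
    rb = [b[j] - b0 - b1 * (m + j + 1) for j in range(n)]  # sessions m + 1, ..., m + n
    sign = 1 if improvement == "increase" else -1
    total = 0
    for x in ra:
        for y in rb:
            d = sign * (y - x)
            total += (d > 0) - (d < 0)
    return total / (m * n)


print(tau_bc([1, 2, 3, 4], [5, 6, 7]))  # trend simply continues into phase B
```

When the baseline trend simply continues into phase B, as in the usage line, the function returns 0.0. Uncorrected Tau for the same data equals 1.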
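If calculated with Kendall’s rank correlation, Tau-BC is estimated as the rank correlation between the de-trended observations $\tilde{y}^A_1,...,\tilde{y}^A_m,\tilde{y}^B_1,...,\tilde{y}^B_n$ and a dummy coded variable $\left(0,...,0,1,...,1\right)$ (with $m$ zeros and $n$ ones), with an adjustment for ties (Kendall, 1970, p. 35). Specifically,
$$\text{Tau}_{BC}^* = \frac{S}{\sqrt{\left(\frac{1}{2}N(N - 1) - \frac{1}{2}m(m - 1) - \frac{1}{2}n(n - 1)\right)\left(\frac{1}{2}N(N - 1) - U\right)}},$$
where
$$N = m + n \quad \text{and} \quad S = \sum_{i=1}^m \sum_{j=1}^n \tilde{w}_{ij},$$
and $U$ is the number of ties between all possible pairs of observations (including pairs within phase A, pairs within phase B, and pairs of one phase A and one phase B data point). $U$ can be computed as
$$U = \sum_{i=1}^{m - 1} \sum_{j = i + 1}^m I\left(\tilde{y}^A_i = \tilde{y}^A_j\right) + \sum_{i=1}^{n - 1} \sum_{j = i + 1}^n I\left(\tilde{y}^B_i = \tilde{y}^B_j\right) + \sum_{i=1}^m \sum_{j=1}^n I\left(\tilde{y}^A_i = \tilde{y}^B_j\right).$$
We prefer and recommend the Tau-AB form, which divides by $mn$ rather than by $\sqrt{mn\left(\frac{1}{2}N(N - 1) - U\right)}$, because it leads to a simpler interpretation. Furthermore, using $\text{Tau}_{BC}^*$ means that the effect size may be sensitive to variation in phase lengths. To see this sensitivity, consider a scenario where there are no tied values and so every value is unique. In this case, $U = 0$ and
$$\text{Tau}_{BC}^* = \frac{S}{\sqrt{mn \times \frac{1}{2}N(N - 1)}}.$$
Thus, the denominator will always be larger than $mn$, meaning that $|\text{Tau}_{BC}^*|$ will always be smaller than $|\text{Tau}_{BC}|$. Further, the largest and smallest possible values of $\text{Tau}_{BC}^*$ will be $\pm\sqrt{2mn / \left(N(N - 1)\right)}$, or about $\pm 0.7$ when $m$ and $n$ are close to equal. In contrast, the smallest and largest possible values of $\text{Tau}_{BC}$ are always -1 and 1, respectively.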
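The tie-adjusted version, and the bound on its range, can be checked numerically. The Python sketch below takes values that have already been de-trended, so it can follow any trend-correction step. The function names are hypothetical.

```python
import math


def tau_bc_kendall(ra, rb, improvement="increase"):
    """Tie-adjusted Kendall correlation between de-trended values and a 0/1 phase dummy."""
    m, n = len(ra), len(rb)
    N = m + n
    sign = 1 if improvement == "increase" else -1
    # S: between-phase concordant minus discordant pairs
    S = sum(((sign * (y - x)) > 0) - ((sign * (y - x)) < 0) for x in ra for y in rb)
    # U: tied pairs within A, within B, and between phases
    ties_a = sum(ra[i] == ra[j] for i in range(m) for j in range(i + 1, m))
    ties_b = sum(rb[i] == rb[j] for i in range(n) for j in range(i + 1, n))
    ties_ab = sum(x == y for x in ra for y in rb)
    U = ties_a + ties_b + ties_ab
    pairs = N * (N - 1) / 2
    denom = math.sqrt((pairs - m * (m - 1) / 2 - n * (n - 1) / 2) * (pairs - U))
    return S / denom


def max_abs_tau_star(m, n):
    """Largest attainable |Tau*_BC| with no ties: sqrt(2mn / (N(N - 1)))."""
    N = m + n
    return math.sqrt(2 * m * n / (N * (N - 1)))


ra = [0, 1, 2, 3, 4]
rb = [10, 11, 12, 13, 14]
print(tau_bc_kendall(ra, rb), max_abs_tau_star(5, 5))
```

Take $m = n = 5$ with complete separation and no ties. The tie-adjusted statistic equals the bound, about 0.745, even though $\text{Tau}_{BC}$ equals 1 for the same data.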
The exact sampling distribution of $\text{Tau}_{BC}^*$ (Kendall, adjusted for ties) has not been described. Tarlow (2017) proposed to approximate its sampling variance using
$$\text{Var}\left(\text{Tau}_{BC}^*\right) \approx \frac{2\left(1 - \left(\text{Tau}_{BC}^*\right)^2\right)}{N},$$
arguing that this would generally be conservative (in the sense of over-estimating the true sampling error). When Tau-BC is calculated using Kendall’s rank correlation, the SingleCaseES package reports a standard error based on this approximation.
When calculated without adjustment for ties, the SingleCaseES package takes a different approach for estimating the standard error for $\text{Tau}_{BC}$ (non-overlap). It reports approximate standard errors and confidence intervals for $\tau_{BC}$ based on the methods described above for $\tau$ (non-overlap, without baseline trend correction). An important limitation of this approach is that it does not account for the uncertainty introduced by estimating the phase A time trend (i.e., the uncertainty in $\hat\beta_1$).
Gingerich (1984) and Busk & Serlin (1992) proposed a within-case standardized mean difference for use in single-case designs (within-case because it is based on the data for a single individual, rather than across individuals). The standardized mean difference parameter $\delta$ is defined as the difference in means between phase B and phase A, scaled by the standard deviation of the phase A outcome distribution:
$$\delta = \frac{\mu_B - \mu_A}{\sigma_A}.$$
Note that $\sigma_A$ represents within-individual variability only. In contrast, the SMD applied to a between-groups design involves scaling by a measure of between- and within-individual variability. Thus, the scale of the within-case SMD is not comparable to the scale of the SMD from a between-groups design.
The SMD can be estimated under the assumption that the observations are mutually independent and have constant variance within each phase. There are two ways to estimate the SMD, depending on whether it is reasonable to assume that the standard deviation of the outcome is constant across phases (i.e., $\sigma_A = \sigma_B$).
Gingerich (1984) and Busk & Serlin (1992) originally suggested scaling by the SD from phase A only, due to the possibility of non-constant variance across phases. Without assuming constant SDs, an estimate of the standardized mean difference is
$$d_A = \left(1 - \frac{3}{4m - 5}\right) \frac{\bar{y}_B - \bar{y}_A}{s_A}.$$
The term in parentheses is a small-sample bias correction term (cf. Hedges, 1981; Pustejovsky, 2019). The standard error of this estimate is calculated as
$$SE_{d_A} = \left(1 - \frac{3}{4m - 5}\right)\sqrt{\frac{1}{m} + \frac{s_B^2}{n s_A^2} + \frac{d_A^2}{2(m - 1)}}.$$
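Here is a minimal Python sketch of $d_A$ and its standard error. It uses sample SDs with $n - 1$ denominators from the standard `statistics` module, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def smd_baseline_sd(a, b):
    """Within-case SMD scaled by the phase A SD, with small-sample correction and SE."""
    m, n = len(a), len(b)
    s_a, s_b = stdev(a), stdev(b)   # sample SDs (n - 1 denominators)
    J = 1 - 3 / (4 * m - 5)         # small-sample bias correction
    d_a = J * (mean(b) - mean(a)) / s_a
    se = J * math.sqrt(1 / m + s_b ** 2 / (n * s_a ** 2) + d_a ** 2 / (2 * (m - 1)))
    return d_a, se


print(smd_baseline_sd([2, 4, 6], [8, 10, 12]))
```

With only $m = 3$ baseline points, the correction factor is $4/7$. This shrinks the raw ratio of 3 to about 1.71, which shows how strongly very short baselines are penalized.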
The log-response ratio (LRR) is an effect size index that quantifies the change from phase A to phase B in proportionate terms. Pustejovsky (2015) proposed to use it as an effect size index for single-case designs (see also Pustejovsky, 2018). The LRR is appropriate for use with outcomes on a ratio scale, that is, where zero indicates the total absence of the outcome. The LRR parameter is defined as
$$\psi = \ln\left(\frac{\mu_B}{\mu_A}\right) = \ln\left(\mu_B\right) - \ln\left(\mu_A\right).$$
The logarithm is used so that the range of the index is less restricted.
There are two variants of the LRR (Pustejovsky, 2018), corresponding to whether therapeutic improvements correspond to negative values of the index (LRR-decreasing or LRRd) or positive values of the index (LRR-increasing or LRRi). For outcomes measured as frequency counts or rates, LRRd and LRRi are identical in magnitude but have opposite sign. However, for outcomes measured as proportions (ranging from 0 to 1) or percentages (ranging from 0% to 100%), LRRd and LRRi will differ in both sign and magnitude because the outcomes are first transformed to be consistent with the selected direction of therapeutic improvement.
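To account for the possibility that the sample means may be equal to zero, even if the mean levels are strictly greater than zero, the LRR is calculated using truncated sample means, given by $\tilde{y}_A = \max \left\{ \bar{y}_A, D / m\right\}$ and $\tilde{y}_B = \max \left\{ \bar{y}_B, D / n\right\}$. Here $D$ is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018). To ensure that the standard error of LRR is strictly positive, it is calculated using truncated sample variances, given by $\tilde{s}_A^2 = \max \left\{ s_A^2, D^2 / m^3\right\}$ and $\tilde{s}_B^2 = \max \left\{ s_B^2, D^2 / n^3\right\}$.
A basic estimator of the LRR is then given by
$$R_1 = \ln\left(\tilde{y}_B\right) - \ln\left(\tilde{y}_A\right).$$
However, $R_1$ will be biased when one or both phases include only a small number of observations. A bias-corrected estimator is given by
$$R_2 = \ln\left(\tilde{y}_B\right) + \frac{\tilde{s}_B^2}{2 n \tilde{y}_B^2} - \ln\left(\tilde{y}_A\right) - \frac{\tilde{s}_A^2}{2 m \tilde{y}_A^2}.$$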
The bias-corrected estimator is the default option in SingleCaseES.
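The truncation and bias correction are easy to get wrong by hand. The Python sketch below uses hypothetical names and returns the increasing form $\ln(\tilde{y}_B / \tilde{y}_A)$. It takes the truncation constant `D` as an input, because its value depends on the measurement procedure. The standard error line uses the usual delta-method formula $\sqrt{\tilde{s}_A^2 / (m \tilde{y}_A^2) + \tilde{s}_B^2 / (n \tilde{y}_B^2)}$. This is an assumption of the sketch, because the text above does not spell out the LRR standard error.

```python
import math
from statistics import mean, variance


def lrr(a, b, D, bias_correct=True):
    """Log response ratio ln(B / A) with truncated means and variances."""
    m, n = len(a), len(b)
    ya = max(mean(a), D / m)              # truncated sample means
    yb = max(mean(b), D / n)
    va = max(variance(a), D ** 2 / m ** 3)  # truncated sample variances
    vb = max(variance(b), D ** 2 / n ** 3)
    est = math.log(yb) - math.log(ya)     # R_1
    if bias_correct:                      # R_2
        est += vb / (2 * n * yb ** 2) - va / (2 * m * ya ** 2)
    # Delta-method SE (an assumption of this sketch; not spelled out in the text)
    se = math.sqrt(va / (m * ya ** 2) + vb / (n * yb ** 2))
    return est, se


print(lrr([0, 0, 0, 0], [2, 3, 4], D=0.5))
```

With an all-zero baseline, the truncated baseline mean is $0.5 / 4 = 0.125$, so the estimate stays finite. The value `D=0.5` is purely illustrative.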
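The log-odds ratio is an effect size index that quantifies the change from phase A to phase B in terms of proportionate change in the odds that a behavior is occurring (Pustejovsky, 2015). It is appropriate for use with outcomes on a percentage or proportion scale. The LOR parameter is defined as
$$\psi = \ln\left(\frac{\mu_B / (1 - \mu_B)}{\mu_A / (1 - \mu_A)}\right) = \ln\left(\frac{\mu_B}{1 - \mu_B}\right) - \ln\left(\frac{\mu_A}{1 - \mu_A}\right),$$
where the outcomes are measured in proportions. The log odds ratio ranges from $-\infty$ to $\infty$, with a value of zero corresponding to no change in mean levels.
To account for the possibility that the sample means may be equal to zero or one, even if the mean levels are strictly between zero and one, the LOR is calculated using truncated sample means, given by
$$\tilde{y}_A = \max \left\{ \min\left[\bar{y}_A, 1 - \frac{D}{m}\right], \frac{D}{m}\right\}$$
and
$$\tilde{y}_B = \max \left\{ \min\left[\bar{y}_B, 1 - \frac{D}{n}\right], \frac{D}{n}\right\},$$
where $D$ is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018). To ensure that the corresponding standard error is strictly positive, it is calculated using truncated sample variances, given by
$$\tilde{s}_A^2 = \max \left\{ s_A^2, \frac{D^2}{m^3}\right\} \quad \text{and} \quad \tilde{s}_B^2 = \max \left\{ s_B^2, \frac{D^2}{n^3}\right\}.$$
A basic estimator of the LOR is given by
$$L_1 = \ln\left(\tilde{y}_B\right) - \ln\left(1 - \tilde{y}_B\right) - \ln\left(\tilde{y}_A\right) + \ln\left(1 - \tilde{y}_A\right).$$
However, like the LRR, this estimator will be biased when one or both phases include only a small number of observations. A bias-corrected estimator of the LOR is given by
$$L_2 = \ln\left(\tilde{y}_B\right) - \ln\left(1 - \tilde{y}_B\right) - \frac{\tilde{s}_B^2 (2 \tilde{y}_B - 1)}{2 n \tilde{y}_B^2 (1 - \tilde{y}_B)^2} - \ln\left(\tilde{y}_A\right) + \ln\left(1 - \tilde{y}_A\right) + \frac{\tilde{s}_A^2 (2 \tilde{y}_A - 1)}{2 m \tilde{y}_A^2 (1 - \tilde{y}_A)^2}.$$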
This estimator uses a small-sample correction to reduce bias when one or both phases include only a small number of observations.
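Here is a Python sketch of the basic and bias-corrected LOR estimators for proportion data; the function name is hypothetical. The truncation bounds `D / m` and `1 - D / m` mirror the LRR truncation. Treat them, and any particular value passed for `D`, as assumptions of this sketch.

```python
import math
from statistics import mean, variance


def lor(a, b, D, bias_correct=True):
    """Log odds ratio for proportions, with truncated means/variances and optional bias correction."""
    m, n = len(a), len(b)
    pa = max(min(mean(a), 1 - D / m), D / m)  # keep means away from 0 and 1
    pb = max(min(mean(b), 1 - D / n), D / n)
    va = max(variance(a), D ** 2 / m ** 3)
    vb = max(variance(b), D ** 2 / n ** 3)

    def logit(p):
        return math.log(p) - math.log(1 - p)

    est = logit(pb) - logit(pa)  # L_1
    if bias_correct:             # L_2: second-order (delta-method) bias terms
        est -= vb * (2 * pb - 1) / (2 * n * pb ** 2 * (1 - pb) ** 2)
        est += va * (2 * pa - 1) / (2 * m * pa ** 2 * (1 - pa) ** 2)
    return est


print(lor([0.2, 0.2], [0.8, 0.8], D=0.05))
```

The correction terms vanish when a phase mean equals 0.5, because the factor $2\tilde{y} - 1$ is then zero.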
Bonett & Price (2020b) described the log ratio of medians (LRM) effect size, which can be used to quantify the change in medians from phase A to phase B. The LRM is the natural logarithm of the ratio of medians. This effect size is appropriate for outcomes that are skewed or right-censored (Bonett & Price, 2020b). For an outcome where increase is desirable, the LRM parameter is defined as
$$\text{LRM} = \ln\left(\frac{\eta_B}{\eta_A}\right) = \ln\left(\eta_B\right) - \ln\left(\eta_A\right),$$
where $\eta_B$ and $\eta_A$ are the population medians for phase B and phase A, respectively. For an outcome where decrease is desirable, the LRM parameter has the opposite sign:
$$\text{LRM} = \ln\left(\eta_A\right) - \ln\left(\eta_B\right).$$
A natural estimator of the LRM is given by
$$\widehat{\text{LRM}} = \ln\left(\hat\eta_B\right) - \ln\left(\hat\eta_A\right),$$
where $\hat\eta_B$ and $\hat\eta_A$ are the sample medians for phase B and phase A, respectively (with the sign reversed for an outcome where decrease is desirable). Note that the sample median might be zero for either phase B or phase A in some single-case data series, resulting in an infinite LRM estimate.
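The following short Python sketch (hypothetical name `lrm`) shows the point estimate and the zero-median case noted above.

```python
import math
from statistics import median


def _log(x):
    # Treat ln(0) as -infinity so that a zero median gives an infinite LRM
    # (two zero medians give nan); negative values raise ValueError
    return -math.inf if x == 0 else math.log(x)


def lrm(a, b, improvement="increase"):
    """Log ratio of the phase B and phase A sample medians."""
    est = _log(median(b)) - _log(median(a))
    return est if improvement == "increase" else -est


print(lrm([2, 4, 6], [8, 12, 16]))  # ln(12 / 4) = ln(3)
print(lrm([0, 0, 1], [1, 2, 3]))    # phase A median is 0, so the estimate is inf
```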
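Standard errors and confidence intervals for LRM can be obtained under the assumption that the outcome data within each phase are mutually independent and follow a common distribution. The logarithm of the median is the same as, or close to, the median of the log-transformed outcomes. Using this fact, the standard error of the LRM estimate can be calculated from the order statistics within each phase (Bonett & Price, 2020a). Let $l_A = \max\left\{1, \text{round}\left(m / 2 - \sqrt{m}\right)\right\}$, $u_A = m - l_A + 1$, $l_B = \max\left\{1, \text{round}\left(n / 2 - \sqrt{n}\right)\right\}$, and $u_B = n - l_B + 1$. Then find $z_A = \Phi^{-1}\left(1 - \text{Pr}(X_A \leq l_A - 1)\right)$ and $z_B = \Phi^{-1}\left(1 - \text{Pr}(X_B \leq l_B - 1)\right)$, where $\Phi^{-1}$ is the standard normal quantile function, $X_A \sim \text{Binomial}(m, 1/2)$, and $X_B \sim \text{Binomial}(n, 1/2)$. The standard error of LRM is then (Bonett & Price, 2020a)
$$SE_{\text{LRM}} = \sqrt{\left(\frac{\ln\left(y^A_{(u_A)}\right) - \ln\left(y^A_{(l_A)}\right)}{2 z_A}\right)^2 + \left(\frac{\ln\left(y^B_{(u_B)}\right) - \ln\left(y^B_{(l_B)}\right)}{2 z_B}\right)^2},$$
where $y^A_{(l_A)}$ and $y^A_{(u_A)}$ are the $l_A^{th}$ and $u_A^{th}$ order statistics of the phase A outcomes and $y^B_{(l_B)}$ and $y^B_{(u_B)}$ are the $l_B^{th}$ and $u_B^{th}$ order statistics of the phase B outcomes.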
Ferron, Goldstein, Olszewski, & Rohrer (2020) proposed a percent of goal obtained (PoGO) effect size metric for use in single-case designs. Let $\gamma$ denote the goal level of behavior, which must be specified by the analyst or researcher. Percent of goal obtained quantifies the change in the mean level of behavior relative to the goal. The PoGO parameter is defined as:
$$\text{PoGO} = \frac{\mu_B - \mu_A}{\gamma - \mu_A} \times 100.$$
Approaches for estimation of PoGO depend on one’s assumption about the stability of the observations in phases A and B. Under the assumption that the observations are temporally stable, a natural estimator of PoGO is
$$\widehat{\text{PoGO}} = \frac{\bar{y}_B - \bar{y}_A}{\gamma - \bar{y}_A} \times 100.$$
Patrona, Ferron, Olszewski, Kelley, & Goldstein (2022) proposed a method for calculating a standard error for the PoGO estimator under the assumption that the observations within each phase are mutually independent. The standard error uses an approximation due to Dunlap & Silver (1986) for the standard error of the ratio of two independent, normally distributed random variables. It is calculated as
$$SE_{\text{PoGO}} = \frac{100}{\left|\gamma - \bar{y}_A\right|}\sqrt{\frac{s_A^2}{m} + \frac{s_B^2}{n} + \left(\frac{\bar{y}_B - \bar{y}_A}{\gamma - \bar{y}_A}\right)^2 \frac{s_A^2}{m}}.$$
Patrona et al. (2022) also provided a more general approximation, which can be applied when PoGO is estimated using regressions that control for time trends or auto-correlation. However, these methods are not implemented in SingleCaseES.
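An approximate confidence interval for $\text{PoGO}$ is given by
$$\left[\widehat{\text{PoGO}} - z_{\alpha / 2} \times SE_{\text{PoGO}}, \quad \widehat{\text{PoGO}} + z_{\alpha / 2} \times SE_{\text{PoGO}}\right],$$
where $z_{\alpha / 2}$ is the $1 - \alpha / 2$ critical value from a standard normal distribution (Patrona et al., 2022).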
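Here is a Python sketch of the PoGO estimate, its standard error, and the Wald-type interval; the function name is hypothetical. The SE line follows the Dunlap & Silver style of approximation for a ratio of independent normal variables, applied to $\bar{y}_B - \bar{y}_A$ and $\gamma - \bar{y}_A$. Both terms involve $\bar{y}_A$, so treating them as independent is itself an approximation. The exact form coded here is an assumption of the sketch.

```python
import math
from statistics import NormalDist, mean, stdev


def pogo(a, b, goal, level=0.95):
    """PoGO estimate (percent scale), approximate SE, and Wald-type CI."""
    m, n = len(a), len(b)
    ya, yb = mean(a), mean(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    ratio = (yb - ya) / (goal - ya)
    est = 100 * ratio
    # Ratio-of-independent-normals approximation: numerator variance va/m + vb/n,
    # denominator variance va/m
    se = 100 / abs(goal - ya) * math.sqrt(va / m + vb / n + ratio ** 2 * va / m)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return est, se, (est - z * se, est + z * se)


print(pogo([2, 4, 6], [8, 10, 12], goal=12))  # mean moves 6 of the 8 points toward the goal
```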