`Effect-size-definitions.Rmd`

The SingleCaseES package provides R functions for calculating basic, within-case effect size indices for single-case designs, including several non-overlap measures and parametric effect size measures, and for estimating the gradual effects model developed by Swan & Pustejovsky (2018). Estimation procedures for standard errors and confidence intervals are provided for the subset of effect sizes indices with known sampling distributions. This vignette covers the mathematical definitions of the basic non-overlap and parametric effect size measures, along with some details about how they are estimated. Parker, Vannest, & Davis (2011) provides a review of the non-overlap measures, including worked examples of the calculations. Pustejovsky (2019) provides a critical review of non-overlap measures and parametric effect sizes. However, neither of these reviews include details about standard error calculations.

All of the within-case effect size measures are defined in terms of a comparison of observations between two phases (call them phase A and phase B) within a single-case design. Let \(m\) and \(n\) denote the number of observations in phase A and phase B, respectively. Let \(y^A_1,...,y^A_m\) denote the observations from phase A and \(y^B_1,...,y^B_n\) denote the observations from phase B.

The non-overlap effect size measures are all defined in terms of ordinal comparisons between data points from phase A and data points from phase B. It will therefore be helpful to have notation for the data points from each phase, sorted in rank order. Thus, let \(y^A_{(1)},y^A_{(2)},...,y^A_{(m)}\) denote the values of the baseline phase data, sorted in increasing order, and let \(y^B_{(1)},y^B_{(2)},...,y^B_{(n)}\) denote the values of the sorted treatment phase data.

The parametric effect size measures are all defined under a simple model for the data-generating process, in which observations in phase A are sampled from a distribution with constant mean \(\mu_A\) and standard deviation \(\sigma_A\), while observations in phase B are sampled from a distribution with constant mean \(\mu_B\) and standard deviation \(\sigma_B\). Let \(\bar{y}_A\) and \(\bar{y}_B\) denote the sample means for phase A and phase B, respectively. Let \(s_A\) and \(s_B\) denote the sample standard deviations for phase A and phase B, respectively. Let \(z_{\alpha / 2}\) denote the \(1 - \alpha / 2\) critical value from a standard normal distribution. Finally, we use \(\ln()\) to denote the natural logarithm function.

Parker & Vannest (2009) proposed non-overlap of all pairs (NAP) as an effect size index for use in single-case research. NAP is defined in terms of all pair-wise comparisons between the data points in two different phases for a given case (i.e., a treatment phase versus a baseline phase). For an outcome that is desirable to increase, NAP is the proportion of all such pair-wise comparisons where the treatment phase observation exceeds the baseline phase observation, with pairs that are exactly tied getting a weight of 1/2. NAP is exactly equivalent to the modified Common Language Effect Size (Vargha & Delaney, 2000) and has been proposed as an effect size index in other contexts too (e.g., Acion, Peterson, Temple, & Arndt, 2006).

NAP can be interpreted as an estimate of the probability that a randomly selected observation from the B phase improves upon a randomly selected observation from the A phase. For an outcome where increase is desirable, the effect size parameter is

\[\theta = \text{Pr}(Y^B > Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).\]

For an outcome where decrease is desirable, the effect size parameter is

\[\theta = \text{Pr}(Y^B < Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).\]

For an outcome where increase is desirable, calculate

\[q_{ij} = I(y^B_j > y^A_i) + 0.5 I(y^B_j = y^A_i)\] for \(i = 1,...,m\) and \(j = 1,...,n\). For an outcome where decrease is desirable, one would instead use

\[q_{ij} = I(y^B_j < y^A_i) + 0.5 I(y^B_j = y^A_i).\]

The NAP effect size index is then calculated as

\[ \text{NAP} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n q_{ij}. \]

The SingleCaseES package provides several different methods for estimating the standard error of NAP. The default method is calculated based on the exactly unbiased variance estimator described by Sen (1967; cf. Mee, 1990), which assumes that the observations are mutually independent and are identically distributed within each phase. Let

\[ \begin{aligned} Q_1 &= \frac{1}{m n^2} \sum_{i=1}^m \left[\sum_{j=1}^n \left(q_{ij} - \text{NAP}\right)\right]^2, \\ Q_2 &= \frac{1}{m^2 n} \sum_{j=1}^n \left[\sum_{i=1}^m \left(q_{ij} - \text{NAP}\right)\right]^2, \qquad \text{and} \\ Q_3 &= \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n \left(q_{ij} - \text{NAP}\right)^2. \end{aligned} \]

The SE is then calculated as

\[ SE_{\text{unbiased}} = \sqrt{\frac{\text{NAP}(1 - \text{NAP}) + n Q_1 + m Q_2 - 2 Q_3}{(m - 1)(n - 1)}}. \]

Another method for estimating a standard error was introduced by Hanley & McNeil (1982). This standard error is calculated as

\[ SE_{\text{Hanley}} = \sqrt{\frac{1}{mn}\left(\text{NAP}(1 - \text{NAP}) + (n - 1)Q_1 + (m - 1)Q_2\right)}, \]

with \(Q_1\) and \(Q_2\) defined as above. This standard error is based on the same assumptions as the unbiased SE.

A limitation of \(SE_{unbiased}\) and \(SE_{Hanley}\) is that they will be equal to zero when there is complete non-overlap (i.e., when \(\text{NAP}\) is equal to zero or equal to one). In order to ensure a strictly positive standard error for NAP, the SingleCaseES package calculates \(SE_{unbiased}\) and \(SE_{Hanley}\) using a truncation of NAP. Specifically, the formulas are evaluated using

\[ \widetilde{\text{NAP}} = \text{max}\left\{\frac{1}{2 mn}, \ \text{min}\left\{\frac{2mn - 1}{2mn}, \ \text{NAP} \right\} \right\} \] in place of \(\text{NAP}\).

A final method for estimating a standard error is to work under the null hypothesis that there is no effect—i.e., that the data points from each phase are sampled from the same distribution. Under the null hypothesis, the sampling variance of \(\text{NAP}\) depends only on the number of observations in each phase:

\[ SE_{\text{null}} = \sqrt{\frac{m + n + 1}{12 m n}} \] (cf. Grissom & Kim, 2001, p. 141). If null hypothesis is not true—that is, if the observations in phase B are drawn from a different distribution than the observations in phase A—then this standard error will tend to be too large.

A confidence interval for \(\theta\) can be calculated using a method proposed by Newcombe [Newcombe (2006); Method 5], which assumes that the observations are mutually independent and are identically distributed within each phase. Using a confidence level of \(100\% \times (1 - \alpha)\), the endpoints of the confidence interval are defined as the values of \(\theta\) that satisfy the equality

\[ (\text{NAP} - \theta)^2 = \frac{z^2_{\alpha / 2} h \theta (1 - \theta)}{mn}\left[\frac{1}{h} + \frac{1 - \theta}{2 - \theta} + \frac{\theta}{1 + \theta}\right], \]

where \(h = (m + n) / 2 - 1\) and \(z_{\alpha / 2}\) is \(1 - \alpha / 2\) critical value from a standard normal distribution. This equation is a fourth-degree polynomial in \(\theta\), solved using a numerical root-finding algorithm.

Scruggs, Mastropieri, & Casto (1987) proposed the percentage of non-overlapping data (PND) as an effect size index for single-case designs. For an outcome where increase is desirable, PND is defined as the proportion of observations in the B phase that exceed the highest observation from the A phase. For an outcome where decrease is desirable, PND is the proportion of observations in the B phase that are less than the lowest observation from the A phase.

This effect size does not have a stable parameter definition because the magnitude of the maximum (or minimum) value from phase A depends on the number of observations in the phase (Allison & Gorman, 1994; Pustejovsky, 2019).

For an outcome where increase is desirable,

\[ \text{PND} = \frac{1}{n} \sum_{j=1}^n I(y^B_j > y^A_{(m)}), \]

where \(y^A_{(m)}\) is the maximum value of \(y^A_1,...,y^A_m\). For an outcome where decrease is desirable,

\[ \text{PND} = \frac{1}{n} \sum_{j=1}^n I(y^B_j < y^A_{(1)}), \]

where \(y^A_{(1)}\) is the minimum value of \(y^A_1,...,y^A_m\).

The sampling distribution of PND has not been described, and so standard errors and confidence intervals are not available.

Ma (2006) proposed the percent exceeding the median, defined as the proportion of observations in phase B that improve upon the median of phase A. Ma (2006) did not specify an effect size parameter corresponding to this index. However, it would be reasonable to define the parameter as the probability that a randomly selected observation from the B phase represents an improvement over the median of the distribution of A phase outcomes. Let \(\eta_A\) denote the median of the distribution of outcomes in phase A. For an outcome where increase is desirable, the PEM parameter would then be

\[ \xi = \text{Pr}\left(Y_B > \eta_A\right) + 0.5 \times \text{Pr}\left(Y_B = \eta_A\right). \] For an outcome where decrease is desirable, it would be \[ \xi = \text{Pr}\left(Y_B < \eta_A\right) + 0.5 \times \text{Pr}\left(Y_B = \eta_A\right). \]

For an outcome where increase is desirable,

\[ \text{PEM} = \frac{1}{n}\sum_{j=1}^n \left[ I(y^B_j > m_A) + 0.5 \times I(y^B_j = m_A) \right], \]

where \(m_A = \text{median}(y^A_1,...,y^A_m)\). For an outcome where decrease is desirable,

\[ \text{PEM} = \frac{1}{n}\sum_{j=1}^n \left[ I(y^B_j < y^A_{(1)}) + 0.5 \times I(y^B_j = m_A) \right]. \]

The sampling distribution of PEM has not been described, and so standard errors and confidence intervals are not available.

For an outcome where increase (decrease) is desirable, Parker et al. (2011) defined PAND as the proportion of observations remaining after removing the fewest possible number of observations from either phase so that the highest remaining point from the baseline phase is less than the lowest remaining point from the treatment phase (lowest remaining point from the baseline phase is larger than the highest remaining point from the treatment phase).

This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).

For an outcome where increase is desirable, PAND is calculated as

\[ \text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right)\right\}, \]

where \(y^A_{(0)} = - \infty\), \(y^B_{(n + 1)} = \infty\), and the maximum is taken over the values \(0 \leq i \leq m\) and \(0 \leq j \leq n\). For an outcome where decrease is desirable, PAND is calculated as

\[ \text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right)\right\}, \]

where \(y^A_{(m + 1)} = \infty\), \(y^B_{(0)} = -\infty\), and the maximum is taken over the values \(0 \leq i \leq m\) and \(0 \leq j \leq n\).

The sampling distribution of PAND has not been described, and so standard errors and confidence intervals are not available.

The robust improvement rate difference is defined as the robust phi coefficient corresponding to a certain \(2 \times 2\) table that is a function of the degree of overlap between the observations each phase (Parker et al., 2011). This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).

For notational convenience, let \(y^A_{(0)} = y^B_{(0)} = -\infty\) and \(y^A_{(m + 1)} = y^B_{(n + 1)} = \infty\). For an outcome where increase is desirable, let \(\tilde{i}\) and \(\tilde{j}\) denote the values that maximize the quantity

\[ \left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right) \] for \(0 \leq i \leq m\) and \(0 \leq j \leq n\). For an outcome where decrease is desirable, let \(\tilde{i}\) and \(\tilde{j}\) instead denote the values that maximize the quantity

\[ \left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right). \]

Now calculate the \(2 \times 2\) table

\[ \begin{array}{|c|c|} \hline m - \tilde{i} & \tilde{j} \\ \hline \tilde{i} & n - \tilde{j} \\ \hline \end{array} \]

Parker, Vannest, & Brown (2009) proposed the *non-robust* improvement rate difference, which is equivalent to the phi coefficient from this table. Parker et al. (2011) proposed to instead use the *robust* phi coefficient, which involves modifying the table so that the row- and column-margins are equal. Robust IRD is thus equal to

\[ \text{IRD} = \frac{n - m - \tilde{i} - \tilde{j}}{2 n} - \frac{m + n - \tilde{i} - \tilde{j}}{2 m}. \]

Robust IRD is algebraically related to PAND as

\[ \text{IRD} = 1 - \frac{(m + n)^2}{2mn}\left(1 - \text{PAND}\right). \] Just as with PAND, the sampling distribution of robust IRD has not been described, and so standard errors and confidence intervals are not available.

Tau is one of several effect sizes proposed by Parker, Vannest, Davis, & Sauber (2011) and known collectively as “Tau-U.” The basic estimator Tau does not make any adjustments for time trends. For an outcome where increase is desirable, the effect size parameter is

\[\tau = \text{Pr}(Y^B > Y^A) - \text{Pr}(Y^B < Y^A)\]

(for an outcome where decrease is desirable, the effect size parameter would have the opposite sign). This parameter is a simple linear transformation of the NAP parameter \(\theta\):

\[\tau = 2 \theta - 1.\]

For an outcome where increase is desirable, calculate

\[w_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i)\]

For an outcome where decrease is desirable, one would instead use

\[w_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i).\]

The Tau effect size index is then calculated as

\[ \text{Tau} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n w_{ij} = 2 \times \text{NAP} - 1. \]

Standard errors and confidence intervals for Tau are calculated using transformations of the corresponding SEs and CIs for NAP. All of the methods assume that the observations are mutually independent and are identically distributed within each phase.

Standard errors for Tau are calculated as \(SE_{\text{Tau}} = 2 SE_{\text{NAP}}\), where \(SE_{\text{NAP}}\) is the standard error for NAP calculated based on one of the available methods (unbiased, Hanley, or null).

The CI for \(\tau\) is calculated as

\[ [L_{\tau}, U_{\tau}] = [2 L_{\theta} - 1, 2 U_{\theta} - 1], \]

where \(L_{\theta}\) and \(U_{\theta}\) are the lower and upper bounds of the CI for \(\theta\), calculated using a method proposed by Newcombe (2006).

Tau-U is one of several effect sizes proposed by Parker et al. (2011). The Tau-U variant is similar to Tau, but includes an adjustment term that is a function of the baseline time trend. For an outcome where increase is desirable, the index is calculated as Kendall’s \(S\) statistic for the comparison between the phase B data and the phase A data, plus Kendall’s \(S\) statistic for the A phase observations, scaled by the product of the number of observations in each phase.

This effect size does not have a stable parameter definition and its feasible range depends on the number of observations in each phase (Tarlow, 2017).

For an outcome where increase is desirable, calculate

\[w^{AB}_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i)\]

and

\[w^{AA}_{ij} = I(y^A_j > y^A_i) - I(y^A_j < y^A_i)\]

For an outcome where decrease is desirable, one would instead use

\[w^{AB}_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i)\]

and

\[w^{AA}_{ij} = I(y^A_j < y^A_i) - I(y^A_j > y^A_i).\]

The Tau-U effect size index is then calculated as

\[ \text{Tau-U} = \frac{1}{m n} \left(\sum_{i=1}^m \sum_{j=1}^n w^{AB}_{ij} - \sum_{i=1}^{m - 1} \sum_{j=i + 1}^m w^{AA}_{ij}\right). \]

The sampling distribution of Tau-U has not been described, and so standard errors and confidence intervals are not available.

Tarlow (2017) proposed to modify the Tau effect size index by first adjusting the observations for a linear trend in the A phase. The index can be calculated with or without conducting a pre-test for significance of the A phase time trend. We provide two approaches to calculate Tau no matter whether the baseline trend is significant or not. The first approach is using Kendall’s rank correlation (with adjustment for ties), as used in Tarlow (2017). The second one is using Tau (non-overlap) index (without adjustment for ties).

If the pre-test for A phase time trend is used, then slope of the baseline trend is first tested using Kendall’s rank correlation. If the baseline slope is significantly different from zero, the outcomes are adjusted for baseline trend using Theil-Sen regression, and the residuals from Theil-Sen regression are used to calculate the Kendall’s rank correlation or Tau (non-overlap) index. If the baseline slope is not significantly different from zero, then no baseline trend adjustment is made, and the Tau-BC effect size is calculated using Kendall’s rank correlation or Tau (non-overlap) index.

If the pre-test for A phase time trend is not used, then the outcomes are adjusted for baseline trend using Theil-Sen regression, regardless of whether the slope is significantly different from zero. The residuals from Theil-Sen regression are then used to calculate the Kendall’s rank correlation or Tau (non-overlap) index.

The formal definition of Tau-BC require positing a model for the time trend in the data series. Thus, suppose that the outcomes can be expressed in terms of a linear time trend and an error term:

\[ \begin{aligned} y_i^A &= \alpha + \beta (i) + \epsilon_i^A, \quad \text{for} \quad i = 1,...,m \\ y_j^B &= \alpha + \beta (m + j) + \epsilon_j^B \quad \text{for} \quad j = 1,...,n. \end{aligned} \] Within each phase, assume that the error terms are independent and share a common distribution. The Tau-BC parameter can then be defined as the Tau parameter for the distribution of the error terms, or

\[ \tau_{BC} = \text{Pr}(\epsilon^B > \epsilon^A) - \text{Pr}(\epsilon^B < \epsilon^A). \] An equivalent definition in terms of the outcome distributions is

\[ \tau_{BC} = \text{Pr}\left[Y_j^B - \beta (m + j - i) > Y_i^A \right] - \text{Pr}\left[Y_j^B - \beta (m + j - i) < Y_i^A\right] \] for \(i=1,...,m\) and \(j = 1,...,n\).

Estimation of \(\tau_{BC}\) entails correcting the data series for the baseline slope \(\beta\). If using the baseline trend pre-test, the null hypothesis of \(H_0: \beta = 0\) is first tested using Kendall’s rank correlation. If the test is not significant, then set \(\hat\beta = 0\) and \(\hat\alpha = 0\). If the test is significant or if the pre-test for baseline time trend is not used, then the slope is estimated by Theil-Sen regression. Specifically, we calculate the slope between every pair of observations in the A phase: \[ s_{hi} = \frac{y_i^A - y_h^A}{i - h} \] for \(i = 1,...,m - 1\) and \(h = i+1,...,m\). The overall slope estimate is taken to be the median over all \(m(m - 1) / 2\) slope pairs:

\[ \hat\beta = \text{median}\left\{s_{12},...,s_{(m-1)m}\right\}. \] The intercept term is estimated by taking the median observation in the A phase after correcting for the estimated linear time trend:

\[ \hat\alpha = \text{median}\left\{y_1^A - \hat\beta \times 1, \ y_2^A - \hat\beta \times 2, ..., \ y_m^A - \hat\beta \times m\right\}. \] However, the intercept estimate is irrelevant for purposes of estimating Tau-BC because the Tau estimator is a function of ranks and is invariant to a linear shift of the data series.

After estimating the phase A time trend, \(\tau_{BC}\) is estimated by de-trending the full data series and calculating Kendall’s rank correlation or Tau (non-overlap) on the de-trended observations. Specifically, set \(\hat\epsilon_i^A = y_i^A - \hat\beta (i) - \hat\alpha\) for \(i = 1,...,m\) and \(\hat\epsilon_j^B = y_j^B - \hat\beta (m + j) - \hat\alpha\). For an outcome where increase is desirable, calculate \[w^\epsilon_{ij} = I\left(\hat\epsilon^B_j > \hat\epsilon^A_i\right) - I\left(\hat\epsilon^B_j < \hat\epsilon^A_i\right)\]

or, for an outcome where decrease is desirable, calculate \[w^\epsilon_{ij} = I\left(\hat\epsilon^B_j < \hat\epsilon^A_i\right) - I\left(\hat\epsilon^B_j > \hat\epsilon^A_i\right).\]

Tau-BC (non-overlap) is then estimated by \[ \text{Tau}_{BC} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n w^\epsilon_{ij}. \]

If calculated with Kendall’s rank correlation, Tau-BC is estimated as the rank correlation between \(\left\{\hat\epsilon^A_1, \dots, \hat\epsilon^A_m, \hat\epsilon^B_1, \dots, \hat\epsilon^B_n \right\}\) and a dummy coded variable \(\left\{0_1,\dots,0_m, 1_1,\dots,1_n \right\}\), with an adjustment for ties (Kendall, 1970, p. 35). Specifically,

\[ \text{Tau}_{BC}^* = \frac{1}{D} \sum_{i=1}^m \sum_{j=1}^n w^\epsilon_{ij}, \] where

\[ D = \sqrt{m \times n \times \left(\frac{(m+n)(m+n-1)}{2}-U\right)} \] and \(U\) is the number of ties between all possible pairs of observations (including pairs within phase A, pairs within phase B, and pairs of one phase A and one phase B data point). \(U\) can be computed as

\[ U = \sum_{i=1}^{m - 1} \sum_{j = i+1}^m I\left(\hat\epsilon^A_i = \hat\epsilon^A_j\right) + \sum_{i=1}^{n - 1} \sum_{j = i+1}^n I\left(\hat\epsilon^B_i = \hat\epsilon^B_j\right) + \sum_{i=1}^m \sum_{j=1}^n I\left(\hat\epsilon^A_i = \hat\epsilon^B_j\right). \]

We prefer and recommend to use the Tau-AB form, which divides by \(m \times n\) rather than by \(D\), because it leads to a simpler interpretation. Furthermore, using \(D\) means that \(\text{Tau}_{BC}^*\) may be sensitive to variation in phase lengths. To see this sensitivity, consider a scenario where there are no tied values and so every value \(\left\{\hat\epsilon^A_1, \dots, \hat\epsilon^A_m, \hat\epsilon^B_1, \dots, \hat\epsilon^B_n \right\}\) is unique. In this case, \(U = 0\) and \[ D = \sqrt{\frac{1}{2} m n (m + n)(m + n - 1)} = m n \sqrt{1 + \frac{m - 1}{2n} + \frac{n - 1}{2m}}. \] Thus, the denominator will always be larger than \(m n\), meaning that \(\text{Tau}_{BC}^*\) will always be smaller than \(\text{Tau}_{BC}\). Further, the largest and smallest possible values of \(\text{Tau}_{BC}^*\) will be \(\pm m n / D\), or about \(1 / \sqrt{2}\) when \(m\) and \(n\) are close to equal. In contrast, the largest and smallest possible values of \(\text{Tau}_{BC}\) are always -1 and 1, respectively.

The exact sampling distribution of \(\text{Tau}_{BC}^*\) (Kendall, adjusted for ties) has not been described. Tarlow (2017) proposed to approximate its sampling variance using \[ SE_{Kendall} = \sqrt{\frac{2 (1 - \text{Tau}_{BC}^2)}{m + n}}, \] arguing that this would generally be conservative (in the sense of over-estimating the true sampling error). When Tau-BC is calculated using Kendall’s rank correlation, the SingleCaseES package reports a standard error based on this approximation.

When calculated without adjustment for ties, the SingleCaseES package takes a different approach for estimating the standard error for \(\text{Tau}_{BC}\) (non-overlap), reporting approximate standard errors and confidence intervals for \(\text{Tau}_{BC}\) based on the methods described above for \(\text{Tau}\) (non-overlap, without baseline trend correction). An important limitation of this approach is that it does not account for the uncertainty introduced by estimating the phase A time trend (i.e., the uncertainty in \(\hat\beta\)).

Gingerich (1984) and Busk & Serlin (1992) proposed a within-case standardized mean difference for use in single-case designs (within-case because it is based on the data for a single individual, rather than across individuals). The standardized mean difference parameter \(\delta\) is defined as the difference in means between phase B and phase A, scaled by the standard deviation of the phase A outcome distribution:

\[ \delta = \frac{\mu_B - \mu_A}{\sigma_A}. \]

Note that \(\sigma_A\) represents *within-individual* variability only. In contrast, the SMD applied to a between-groups design involves scaling by a measure of between- and within-individual variability. Thus, the scale of the within-case SMD is *not* comparable to the scale of the SMD from a between-groups design.

The SMD \(\delta\) can be estimated under the assumption that the observations are mutually independent and have constant variance within each phase. There are two ways that the SMD, depending on whether it is reasonable to assume that the standard deviation of the outcome is constant across phases (i.e., \(\sigma_A = \sigma_B\)).

Gingerich (1984) and Busk & Serlin (1992) originally suggested scaling by the SD from phase A only, due to the possibility of non-constant variance across phases. Without assuming constant SDs, an estimate of the standardized mean difference is

\[ d_A = \left(1 - \frac{3}{4m - 5}\right) \frac{\bar{y}_B - \bar{y}_A}{s_A}. \]

The term in parentheses is a small-sample bias correction term (cf. Hedges, 1981; Pustejovsky, 2019). The standard error of this estimate is calculated as

\[ SE_{d_A} = \left(1 - \frac{3}{4m - 5}\right)\sqrt{\frac{1}{m} + \frac{s_B^2}{n s_A^2} + \frac{d_A^2}{2(m - 1)}}. \]

If it is reasonable to assume that the SDs are constant across phases, then one can use the pooled sample SD, defined as

\[ s_p = \sqrt{\frac{(m - 1)s_A^2 + (n - 1) s_B^2}{m + n - 2}}. \]

The SMD can then be estimated as

\[ d_p = \left(1 - \frac{3}{4(m + n) - 9}\right) \frac{\bar{y}_B - \bar{y}_A}{s_p}, \]

with approximate standard error

\[ SE_{d_A} = \left(1 - \frac{3}{4(m + n) - 9}\right)\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{d_p^2}{2(m + n - 2)}}. \]

The log-response ratio (LRR) is an effect size index that quantifies the change from phase A to phase B in proportionate terms. Pustejovsky (2015) proposed to use it as an effect size index for single-case designs (see also Pustejovsky, 2018). The LRR is appropriate for use with outcomes on a ratio scale—that is, where zero indicates the total absence of the outcome. The LRR parameter is defined as

\[ \psi = \ln\left(\mu_B / \mu_A\right), \]

The logarithm is used so that the range of the index is less restricted.

There are two variants of the LRR (Pustejovsky, 2018), corresponding to whether therapeutic improvements correspond to negative values of the index (LRR-decreasing or LRRd) or positive values of the index (LRR-increasing or LRRi). For outcomes measured as frequency counts or rates, LRRd and LRRi are identical in magnitude but have opposite sign. However, for outcomes measured as proportions (ranging from 0 to 1) or percentages (ranging from 0% to 100%), LRRd and LRRi will differ in both sign and magnitude because the outcomes are first transformed to be consistent with the selected direction of therapeutic improvement.

To account for the possibility that the sample means may be equal to zero, even if the mean levels are strictly greater than zero, the LRR is calculated using *truncated* sample means, given by \[
\tilde{y}_A = \text{max} \left\{ \bar{y}_A, \ \frac{1}{2 D m}\right\} \qquad \text{and} \qquad \tilde{y}_B = \text{max} \left\{ \bar{y}_B, \ \frac{1}{2 D n}\right\},
\] where \(D\) is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018). To ensure that the standard error of LRR is strictly positive, it is calculated using truncated sample variances, given by \[
\tilde{s}_A^2 = \text{max}\left\{s_A^2, \ \frac{1}{D^2 m^3}\right\} \qquad \text{and} \qquad \tilde{s}_B^2 = \text{max} \left\{ s_B^2, \ \frac{1}{D^2 n^3}\right\}.
\]

A basic estimator of the LRR is then given by

\[ R_1 = \ln\left(\tilde{y}_B\right) - \ln\left(\tilde{y}_A\right). \]

However, \(R_1\) will be biased when one or both phases include only a small number of observations. A bias-corrected estimator is given by

\[ R_2 = \ln\left(\tilde{y}_B\right) + \frac{\tilde{s}_B^2}{2 n \tilde{y}_B^2} - \ln\left(\tilde{y}_A\right) - \frac{\tilde{s}_A^2}{2 m \tilde{y}_A^2}. \] The bias-corrected estimator is the default option in SingleCaseES.

The log-odds ratio is an effect size index that quantifies the change from phase A to phase B in terms of proportionate change in the odds that a behavior is occurring (Pustejovsky, 2015). It is appropriate for use with outcomes on a percentage or proportion scale. The LOR parameter is defined as

\[ \psi = \ln\left(\frac{\mu_B/(1-\mu_B)}{\mu_A/(1-\mu_A)}\right), \]

where the outcomes are measured in proportions. The log odds ratio ranges from \(-\infty\) to \(\infty\), with a value of zero corresponding to no change in mean levels.

To account for the possibility that the sample means may be equal to zero or one, even if the mean levels are strictly between zero and one, the LOR is calculated using *truncated* sample means, given by

\[ \tilde{y}_A = \text{max} \left\{ \text{min}\left[\bar{y}_A, 1 - \frac{1}{2 D m}\right], \frac{1}{2 D m}\right\} \] and

\[ \tilde{y}_B = \text{max} \left\{ \text{min}\left[\bar{y}_B, 1 - \frac{1}{2 D n}\right], \frac{1}{2 D n}\right\}, \]

where \(D\) is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018).To ensure that the corresponding standard error is strictly positive, it is calculated using truncated sample variances, given by

\[ \tilde{s}_A^2 = \text{max}\left\{s_A^2, \ \frac{1}{D^2 m^3}\right\} \qquad \text{and} \qquad \tilde{s}_B^2 = \text{max} \left\{ s_B^2, \ \frac{1}{D^2 n^3}\right\}. \]

A basic estimator of the LOR is given by

\[ LOR_1 = \ln\left(\tilde{y}_B\right) - \ln\left(1-\tilde{y}_B\right) - \ln\left(\tilde{y}_A\right) + \ln\left(1-\tilde{y}_A\right). \]

However, like the LRR, this estimator will be biased when the one or both phases include only a small number of observations. A bias-corrected estimator of the LOR is given by

\[ LOR_2 = \ln\left(\tilde{y}_B\right) - \ln\left(1-\tilde{y}_B\right) - \frac{\tilde{s}_B^2(2 \tilde{y}_B - 1)}{2 n_B (\tilde{y}_B)^2(1-\tilde{y}_B)^2} - \ln\left(\tilde{y}_A\right) + \ln\left(1-\tilde{y}_A\right) + \frac{\tilde{s}_A^2(2 \tilde{y}_A - 1)}{2 n_A (\tilde{y}_A)^2(1-\tilde{y}_A)^2}. \] This estimator uses a small-sample correction to reduce bias when one or both phases include only a small number of observations.

Bonett & Price (2020b) described the log ratio of medians (LRM) effect size, which can be used to quantify the change in medians from phase A to phase B. The LRM is the natural logarithm of the ratio of medians. This effect size is appropriate for outcomes that are skewed or right-censored (Bonett & Price, 2020b). For an outcome where increase is desirable, the LRM parameter is defined as

\[ \lambda = \ln\left(\eta_B / \eta_A\right) = \ln(\eta_B) - \ln(\eta_A), \] where \(\eta_B\) and \(\eta_A\) are the population medians for phase B and phase A, respectively. For an outcome where decrease is desirable, the LRM parameter has the opposite sign:

\[ \lambda = \ln\left(\eta_A / \eta_B\right) = \ln(\eta_A) - \ln(\eta_B). \]

A natural estimator of the \(\lambda\) is given by

\[ LRM = \ln\left(m_B\right) - \ln\left(m_A\right), \] where \(m_B\) and \(m_A\) are the sample medians for phase B and phase A, respectively. Note that the sample median might be zero for either phase B and phase A in some single-case design data, resulting in infinite LRM.

Standard errors and confidence intervals for LRM can be obtained under the assumption that the outcome data within each phase are mutually independent and follow a common distribution. Using the fact that the logarithm of the median is the same or close to the median of the log-transformed outcomes, the standard error for \(LRM\) can be calculated using the order statistics within each phase (Bonett & Price, 2020a). Let \[ \begin{aligned} l_A &= \text{max}\left\{1, \ \frac{m}{2} - \sqrt{m}\right\}, \quad &u_A &= m - l_A + 1, \\ l_B &= \text{max}\left\{1, \ \frac{n}{2} - \sqrt{n}\right\}, \quad &u_B &= n - l_B + 1, \end{aligned} \] and find \[ q_A = \Phi^{-1}\left(\frac{1}{2^m}\sum_{i=0}^{l_A - 1} \frac{m!}{i!(m - i)!}\right) \quad \text{and} \quad q_B = \Phi^{-1}\left(\frac{1}{2^n}\sum_{j=0}^{l_B - 1} \frac{n!}{j!(n - j)!}\right). \] The standard error of LRM is then \[ SE_{LRM} = \sqrt{\left(\frac{\ln\left(y^B_{(u_B)}\right)-\ln\left(y^B_{(l_B)}\right)}{2\ q_B}\right)^2 + \left(\frac{\ln\left(y^A_{(u_A)}\right)-\ln\left(y^A_{(l_A)}\right)}{2\ q_A}\right)^2}, \] (Bonett & Price, 2020a) where \(y^A_{(l_A)}, y^A_{(u_A)}\) are the \(l_A\) and \(u_A\) order statistics of the phase A outcomes and \(y^B_{(l_B)}, y^B_{(u_B)}\) are the \(l_B\) and \(u_B\) order statistics of the phase B outcomes.

Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. *Statistics in Medicine*, *25*(4), 591–602. https://doi.org/10.1002/sim.2256

Allison, D. B., & Gorman, B. S. (1994). “Make things as simple as possible, but no simpler.” A rejoinder to scruggs and mastropieri. *Behaviour Research and Therapy*, *32*(8), 885–890.

Bonett, D. G., & Price, R. M. (2020a). Confidence intervals for ratios of means and medians. *Journal of Educational and Behavioral Statistics*, *45*(6), 750–770.

Bonett, D. G., & Price, R. M. (2020b). Interval estimation for linear functions of medians in within-subjects and mixed designs. *British Journal of Mathematical and Statistical Psychology*, *73*(2), 333–346.

Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In *Single-case research design and analysis (psychology revivals)* (pp. 199–224). Routledge.

Gingerich, W. J. (1984). Meta-analysis of applied time-series data. *The Journal of Applied Behavioral Science*, *20*(1), 71–79.

Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. *Psychological Methods*, *6*(2), 135–146. https://doi.org/10.1037/1082-989X.6.2.135

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*, *143*, 29–36. https://doi.org/10.1148/radiology.143.1.7063747

Hedges, L. V. (1981). Distribution theory for glass’s estimator of effect size and related estimators. *Journal of Educational Statistics*, *6*(2), 107–128.

Kendall, M. G. (1970). *Rank correlation methods 4th edition*. Griffin.

Ma, H.-H. (2006). An alternative method for quantitative synthesis of single-subject researches: Percentage of data points exceeding the median. *Behavior Modification*, *30*(5), 598–617.

Mee, R. W. (1990). Confidence intervals for probabilities and tolerance regions based on a generalization of the mann-whitney statistic. *Journal of the American Statistical Association*, *85*(411), 793–800.

Newcombe, R. G. (2006). Confidence intervals for an effect size measure based on the mann–whitney statistic. Part 2: Asymptotic methods and evaluation. *Statistics in Medicine*, *25*(4), 559–573.

Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. *Behavior Therapy*, *40*(4), 357–367.

Parker, R. I., Vannest, K. J., & Brown, L. (2009). The improvement rate difference for single-case research. *Exceptional Children*, *75*(2), 135–150.

Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. *Behavior Modification*, *35*(4), 303–322.

Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-u. *Behavior Therapy*, *42*(2), 284–299.

Pustejovsky, J. E. (2015). Measurement-comparable effect sizes for single-case studies of free-operant behavior. *Psychological Methods*, *20*(3), 342–359.

Pustejovsky, J. E. (2018). Using response ratios for meta-analyzing single-case designs with behavioral outcomes. *Journal of School Psychology*, *68*, 99–112.

Pustejovsky, J. E. (2019). Procedural sensitivities of effect sizes for single-case designs with dbehavioral outcome measures. *Psychological Methods*, *24*(2), 217–235.

Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The quantitative synthesis of single-subject research: Methodology and validation. *Remedial and Special Education*, *8*(2), 24–33.

Sen, P. K. (1967). A note on asymptotically distribution-free confidence bounds for p \(\{\)x< y\(\}\), based on two independent samples. *Sankhyā: The Indian Journal of Statistics, Series A*, 95–102.

Swan, D. M., & Pustejovsky, J. E. (2018). A Gradual Effects Model for Single-Case Designs. *Multivariate Behavioral Research*. https://doi.org/10.1080/00273171.2018.1466681

Tarlow, K. R. (2017). An improved rank correlation effect size statistic for single-case designs: Baseline corrected tau. *Behavior Modification*, *41*(4), 427–467.

Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the "CL" common language effect size statistics of McGraw and Wong. *Journal of Educational and Behavioral Statistics*, *25*(2), 101–132. https://doi.org/10.2307/1165329