The SingleCaseES package provides R functions for calculating basic, within-case effect size indices for single-case designs, including several non-overlap measures and parametric effect size measures, and for estimating the gradual effects model developed by Swan & Pustejovsky (2018). Estimation procedures for standard errors and confidence intervals are provided for the subset of effect sizes indices with known sampling distributions. This vignette covers the mathematical definitions of the basic non-overlap and parametric effect size measures, along with some details about how they are estimated. Parker, Vannest, & Davis (2011) provides a review of the non-overlap measures, including worked examples of the calculations. Pustejovsky (2019) provides a critical review of non-overlap measures and parametric effect sizes. However, neither of these reviews include details about standard error calculations.

Notation

All of the within-case effect size measures are defined in terms of a comparison of observations between two phases (call them phase A and phase B) within a single-case design. Let mm and nn denote the number of observations in phase A and phase B, respectively. Let y1A,...,ymAy^A_1,...,y^A_m denote the observations from phase A and y1B,...,ynBy^B_1,...,y^B_n denote the observations from phase B.

The non-overlap effect size measures are all defined in terms of ordinal comparisons between data points from phase A and data points from phase B. It will therefore be helpful to have notation for the data points from each phase, sorted in rank order. Thus, let y(1)A,y(2)A,...,y(m)Ay^A_{(1)},y^A_{(2)},...,y^A_{(m)} denote the values of the baseline phase data, sorted in increasing order, and let y(1)B,y(2)B,...,y(n)By^B_{(1)},y^B_{(2)},...,y^B_{(n)} denote the values of the sorted treatment phase data.

The parametric effect size measures are all defined under a simple model for the data-generating process, in which observations in phase A are sampled from a distribution with constant mean μA\mu_A and standard deviation σA\sigma_A, while observations in phase B are sampled from a distribution with constant mean μB\mu_B and standard deviation σB\sigma_B. Let yA\bar{y}_A and yB\bar{y}_B denote the sample means for phase A and phase B, respectively. Let sAs_A and sBs_B denote the sample standard deviations for phase A and phase B, respectively. Let zα/2z_{\alpha / 2} denote the 1α/21 - \alpha / 2 critical value from a standard normal distribution. Finally, we use ln()\ln() to denote the natural logarithm function.

Non-overlap measures

NAP

Parker & Vannest (2009) proposed non-overlap of all pairs (NAP) as an effect size index for use in single-case research. NAP is defined in terms of all pair-wise comparisons between the data points in two different phases for a given case (i.e., a treatment phase versus a baseline phase). For an outcome that is desirable to increase, NAP is the proportion of all such pair-wise comparisons where the treatment phase observation exceeds the baseline phase observation, with pairs that are exactly tied getting a weight of 1/2. NAP is exactly equivalent to the modified Common Language Effect Size (Vargha & Delaney, 2000) and has been proposed as an effect size index in other contexts too (e.g., Acion, Peterson, Temple, & Arndt, 2006).

NAP can be interpreted as an estimate of the probability that a randomly selected observation from the B phase improves upon a randomly selected observation from the A phase. For an outcome where increase is desirable, the effect size parameter is

θ=Pr(YB>YA)+0.5×Pr(YB=YA).\theta = \text{Pr}(Y^B > Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).

For an outcome where decrease is desirable, the effect size parameter is

θ=Pr(YB<YA)+0.5×Pr(YB=YA).\theta = \text{Pr}(Y^B < Y^A) + 0.5 \times \text{Pr}(Y^B = Y^A).

Estimation

For an outcome where increase is desirable, calculate

qij=I(yjB>yiA)+0.5I(yjB=yiA)q_{ij} = I(y^B_j > y^A_i) + 0.5 I(y^B_j = y^A_i) for i=1,...,mi = 1,...,m and j=1,...,nj = 1,...,n. For an outcome where decrease is desirable, one would instead use

qij=I(yjB<yiA)+0.5I(yjB=yiA).q_{ij} = I(y^B_j < y^A_i) + 0.5 I(y^B_j = y^A_i).

The NAP effect size index is then calculated as

NAP=1mni=1mj=1nqij. \text{NAP} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n q_{ij}.

Standard errors

The SingleCaseES package provides several different methods for estimating the standard error of NAP. The default method is calculated based on the exactly unbiased variance estimator described by Sen (1967; cf. Mee, 1990), which assumes that the observations are mutually independent and are identically distributed within each phase. Let

Q1=1mn2i=1m[j=1n(qijNAP)]2,Q2=1m2nj=1n[i=1m(qijNAP)]2,andQ3=1mni=1mj=1n(qijNAP)2. \begin{aligned} Q_1 &= \frac{1}{m n^2} \sum_{i=1}^m \left[\sum_{j=1}^n \left(q_{ij} - \text{NAP}\right)\right]^2, \\ Q_2 &= \frac{1}{m^2 n} \sum_{j=1}^n \left[\sum_{i=1}^m \left(q_{ij} - \text{NAP}\right)\right]^2, \qquad \text{and} \\ Q_3 &= \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n \left(q_{ij} - \text{NAP}\right)^2. \end{aligned}

The SE is then calculated as

SEunbiased=NAP(1NAP)+nQ1+mQ22Q3(m1)(n1). SE_{\text{unbiased}} = \sqrt{\frac{\text{NAP}(1 - \text{NAP}) + n Q_1 + m Q_2 - 2 Q_3}{(m - 1)(n - 1)}}.

Another method for estimating a standard error was introduced by Hanley & McNeil (1982). This standard error is calculated as

SEHanley=1mn(NAP(1NAP)+(n1)Q1+(m1)Q2), SE_{\text{Hanley}} = \sqrt{\frac{1}{mn}\left(\text{NAP}(1 - \text{NAP}) + (n - 1)Q_1 + (m - 1)Q_2\right)},

with Q1Q_1 and Q2Q_2 defined as above. This standard error is based on the same assumptions as the unbiased SE.

A limitation of SEunbiasedSE_{unbiased} and SEHanleySE_{Hanley} is that they will be equal to zero when there is complete non-overlap (i.e., when NAP\text{NAP} is equal to zero or equal to one). In order to ensure a strictly positive standard error for NAP, the SingleCaseES package calculates SEunbiasedSE_{unbiased} and SEHanleySE_{Hanley} using a truncation of NAP. Specifically, the formulas are evaluated using

NAP̃=max{12mn,min{2mn12mn,NAP}} \widetilde{\text{NAP}} = \text{max}\left\{\frac{1}{2 mn}, \ \text{min}\left\{\frac{2mn - 1}{2mn}, \ \text{NAP} \right\} \right\} in place of NAP\text{NAP}.

A final method for estimating a standard error is to work under the null hypothesis that there is no effect—i.e., that the data points from each phase are sampled from the same distribution. Under the null hypothesis, the sampling variance of NAP\text{NAP} depends only on the number of observations in each phase:

SEnull=m+n+112mn SE_{\text{null}} = \sqrt{\frac{m + n + 1}{12 m n}} (cf. Grissom & Kim, 2001, p. 141). If null hypothesis is not true—that is, if the observations in phase B are drawn from a different distribution than the observations in phase A—then this standard error will tend to be too large.

Confidence interval

A confidence interval for θ\theta can be calculated using a method proposed by Newcombe [Newcombe (2006); Method 5], which assumes that the observations are mutually independent and are identically distributed within each phase. Using a confidence level of 100%×(1α)100\% \times (1 - \alpha), the endpoints of the confidence interval are defined as the values of θ\theta that satisfy the equality

(NAPθ)2=zα/22hθ(1θ)mn[1h+1θ2θ+θ1+θ], (\text{NAP} - \theta)^2 = \frac{z^2_{\alpha / 2} h \theta (1 - \theta)}{mn}\left[\frac{1}{h} + \frac{1 - \theta}{2 - \theta} + \frac{\theta}{1 + \theta}\right],

where h=(m+n)/21h = (m + n) / 2 - 1 and zα/2z_{\alpha / 2} is 1α/21 - \alpha / 2 critical value from a standard normal distribution. This equation is a fourth-degree polynomial in θ\theta, solved using a numerical root-finding algorithm.

PND

Scruggs, Mastropieri, & Casto (1987) proposed the percentage of non-overlapping data (PND) as an effect size index for single-case designs. For an outcome where increase is desirable, PND is defined as the proportion of observations in the B phase that exceed the highest observation from the A phase. For an outcome where decrease is desirable, PND is the proportion of observations in the B phase that are less than the lowest observation from the A phase.

This effect size does not have a stable parameter definition because the magnitude of the maximum (or minimum) value from phase A depends on the number of observations in the phase (Allison & Gorman, 1994; Pustejovsky, 2019).

Estimation

For an outcome where increase is desirable,

PND=1nj=1nI(yjB>y(m)A), \text{PND} = \frac{1}{n} \sum_{j=1}^n I(y^B_j > y^A_{(m)}),

where y(m)Ay^A_{(m)} is the maximum value of y1A,...,ymAy^A_1,...,y^A_m. For an outcome where decrease is desirable,

PND=1nj=1nI(yjB<y(1)A), \text{PND} = \frac{1}{n} \sum_{j=1}^n I(y^B_j < y^A_{(1)}),

where y(1)Ay^A_{(1)} is the minimum value of y1A,...,ymAy^A_1,...,y^A_m.

The sampling distribution of PND has not been described, and so standard errors and confidence intervals are not available.

PEM

Ma (2006) proposed the percent exceeding the median, defined as the proportion of observations in phase B that improve upon the median of phase A. Ma (2006) did not specify an effect size parameter corresponding to this index. However, it would be reasonable to define the parameter as the probability that a randomly selected observation from the B phase represents an improvement over the median of the distribution of A phase outcomes. Let ηA\eta_A denote the median of the distribution of outcomes in phase A. For an outcome where increase is desirable, the PEM parameter would then be

ξ=Pr(YB>ηA)+0.5×Pr(YB=ηA). \xi = \text{Pr}\left(Y_B > \eta_A\right) + 0.5 \times \text{Pr}\left(Y_B = \eta_A\right). For an outcome where decrease is desirable, it would be ξ=Pr(YB<ηA)+0.5×Pr(YB=ηA). \xi = \text{Pr}\left(Y_B < \eta_A\right) + 0.5 \times \text{Pr}\left(Y_B = \eta_A\right).

Estimation

For an outcome where increase is desirable,

PEM=1nj=1n[I(yjB>mA)+0.5×I(yjB=mA)], \text{PEM} = \frac{1}{n}\sum_{j=1}^n \left[ I(y^B_j > m_A) + 0.5 \times I(y^B_j = m_A) \right],

where mA=median(y1A,...,ymA)m_A = \text{median}(y^A_1,...,y^A_m). For an outcome where decrease is desirable,

PEM=1nj=1n[I(yjB<y(1)A)+0.5×I(yjB=mA)]. \text{PEM} = \frac{1}{n}\sum_{j=1}^n \left[ I(y^B_j < y^A_{(1)}) + 0.5 \times I(y^B_j = m_A) \right].

The sampling distribution of PEM has not been described, and so standard errors and confidence intervals are not available.

PAND

For an outcome where increase (decrease) is desirable, Parker, Vannest, & Davis (2011) defined PAND as the proportion of observations remaining after removing the fewest possible number of observations from either phase so that the highest remaining point from the baseline phase is less than the lowest remaining point from the treatment phase (lowest remaining point from the baseline phase is larger than the highest remaining point from the treatment phase).

This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).

Estimation

For an outcome where increase is desirable, PAND is calculated as

PAND=1m+nmax{(i+j)I(y(i)A<y(n+1j)B)}, \text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right)\right\},

where y(0)A=y^A_{(0)} = - \infty, y(n+1)B=y^B_{(n + 1)} = \infty, and the maximum is taken over the values 0im0 \leq i \leq m and 0jn0 \leq j \leq n. For an outcome where decrease is desirable, PAND is calculated as

PAND=1m+nmax{(i+j)I(y(m+1i)A>y(j)B)}, \text{PAND} = \frac{1}{m + n} \max \left\{\left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right)\right\},

where y(m+1)A=y^A_{(m + 1)} = \infty, y(0)B=y^B_{(0)} = -\infty, and the maximum is taken over the values 0im0 \leq i \leq m and 0jn0 \leq j \leq n.

The sampling distribution of PAND has not been described, and so standard errors and confidence intervals are not available.

IRD

The robust improvement rate difference is defined as the robust phi coefficient corresponding to a certain 2×22 \times 2 table that is a function of the degree of overlap between the observations each phase (Parker, Vannest, & Davis, 2011). This effect size does not have a stable parameter definition because its magnitude depends on the number of observations in each phase (Pustejovsky, 2019).

Estimation

For notational convenience, let y(0)A=y(0)B=y^A_{(0)} = y^B_{(0)} = -\infty and y(m+1)A=y(n+1)B=y^A_{(m + 1)} = y^B_{(n + 1)} = \infty. For an outcome where increase is desirable, let ĩ\tilde{i} and j̃\tilde{j} denote the values that maximize the quantity

(i+j)I(y(i)A<y(n+1j)B) \left(i + j\right) I\left(y^A_{(i)} < y^B_{(n + 1 - j)}\right) for 0im0 \leq i \leq m and 0jn0 \leq j \leq n. For an outcome where decrease is desirable, let ĩ\tilde{i} and j̃\tilde{j} instead denote the values that maximize the quantity

(i+j)I(y(m+1i)A>y(j)B). \left(i + j\right) I\left(y^A_{(m + 1 - i)} > y^B_{(j)}\right).

Now calculate the 2×22 \times 2 table

$$ \begin{array}{|c|c|} \hline m - \tilde{i} & \tilde{j} \\ \hline \tilde{i} & n - \tilde{j} \\ \hline \end{array} $$

Parker, Vannest, & Brown (2009) proposed the non-robust improvement rate difference, which is equivalent to the phi coefficient from this table. Parker, Vannest, & Davis (2011) proposed to instead use the robust phi coefficient, which involves modifying the table so that the row- and column-margins are equal. Robust IRD is thus equal to

IRD=nmĩj̃2nm+nĩj̃2m. \text{IRD} = \frac{n - m - \tilde{i} - \tilde{j}}{2 n} - \frac{m + n - \tilde{i} - \tilde{j}}{2 m}.

Robust IRD is algebraically related to PAND as

IRD=1(m+n)22mn(1PAND). \text{IRD} = 1 - \frac{(m + n)^2}{2mn}\left(1 - \text{PAND}\right). Just as with PAND, the sampling distribution of robust IRD has not been described, and so standard errors and confidence intervals are not available.

Tau

Tau is one of several effect sizes proposed by Parker, Vannest, Davis, & Sauber (2011) and known collectively as “Tau-U.” The basic estimator Tau does not make any adjustments for time trends. For an outcome where increase is desirable, the effect size parameter is

τ=Pr(YB>YA)Pr(YB<YA)\tau = \text{Pr}(Y^B > Y^A) - \text{Pr}(Y^B < Y^A)

(for an outcome where decrease is desirable, the effect size parameter would have the opposite sign). This parameter is a simple linear transformation of the NAP parameter θ\theta:

τ=2θ1.\tau = 2 \theta - 1.

Estimation

For an outcome where increase is desirable, calculate

wij=I(yjB>yiA)I(yjB<yiA)w_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i)

For an outcome where decrease is desirable, one would instead use

wij=I(yjB<yiA)I(yjB>yiA).w_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i).

The Tau effect size index is then calculated as

Tau=1mni=1mj=1nwij=2×NAP1. \text{Tau} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n w_{ij} = 2 \times \text{NAP} - 1.

Standard errors

Standard errors and confidence intervals for Tau are calculated using transformations of the corresponding SEs and CIs for NAP. All of the methods assume that the observations are mutually independent and are identically distributed within each phase.

Standard errors for Tau are calculated as SETau=2SENAPSE_{\text{Tau}} = 2 SE_{\text{NAP}}, where SENAPSE_{\text{NAP}} is the standard error for NAP calculated based on one of the available methods (unbiased, Hanley, or null).

Confidence intervals

The CI for τ\tau is calculated as

[Lτ,Uτ]=[2Lθ1,2Uθ1], [L_{\tau}, U_{\tau}] = [2 L_{\theta} - 1, 2 U_{\theta} - 1],

where LθL_{\theta} and UθU_{\theta} are the lower and upper bounds of the CI for θ\theta, calculated using a method proposed by Newcombe (2006).

Tau-U

Tau-U is one of several effect sizes proposed by Parker, Vannest, Davis, & Sauber (2011). The Tau-U variant is similar to Tau, but includes an adjustment term that is a function of the baseline time trend. For an outcome where increase is desirable, the index is calculated as Kendall’s SS statistic for the comparison between the phase B data and the phase A data, plus Kendall’s SS statistic for the A phase observations, scaled by the product of the number of observations in each phase.

This effect size does not have a stable parameter definition and its feasible range depends on the number of observations in each phase (Tarlow, 2017).

Estimation

For an outcome where increase is desirable, calculate

wijAB=I(yjB>yiA)I(yjB<yiA)w^{AB}_{ij} = I(y^B_j > y^A_i) - I(y^B_j < y^A_i)

and

wijAA=I(yjA>yiA)I(yjA<yiA)w^{AA}_{ij} = I(y^A_j > y^A_i) - I(y^A_j < y^A_i)

For an outcome where decrease is desirable, one would instead use

wijAB=I(yjB<yiA)I(yjB>yiA)w^{AB}_{ij} = I(y^B_j < y^A_i) - I(y^B_j > y^A_i)

and

wijAA=I(yjA<yiA)I(yjA>yiA).w^{AA}_{ij} = I(y^A_j < y^A_i) - I(y^A_j > y^A_i).

The Tau-U effect size index is then calculated as

Tau-U=1mn(i=1mj=1nwijABi=1m1j=i+1mwijAA). \text{Tau-U} = \frac{1}{m n} \left(\sum_{i=1}^m \sum_{j=1}^n w^{AB}_{ij} - \sum_{i=1}^{m - 1} \sum_{j=i + 1}^m w^{AA}_{ij}\right).

The sampling distribution of Tau-U has not been described, and so standard errors and confidence intervals are not available.

Tau-BC

Tarlow (2017) proposed to modify the Tau effect size index by first adjusting the observations for a linear trend in the A phase. The index can be calculated with or without conducting a pre-test for significance of the A phase time trend. We provide two approaches to calculate Tau no matter whether the baseline trend is significant or not. The first approach is using Kendall’s rank correlation (with adjustment for ties), as used in Tarlow (2017). The second one is using Tau (non-overlap) index (without adjustment for ties).

If the pre-test for A phase time trend is used, then slope of the baseline trend is first tested using Kendall’s rank correlation. If the baseline slope is significantly different from zero, the outcomes are adjusted for baseline trend using Theil-Sen regression, and the residuals from Theil-Sen regression are used to calculate the Kendall’s rank correlation or Tau (non-overlap) index. If the baseline slope is not significantly different from zero, then no baseline trend adjustment is made, and the Tau-BC effect size is calculated using Kendall’s rank correlation or Tau (non-overlap) index.

If the pre-test for A phase time trend is not used, then the outcomes are adjusted for baseline trend using Theil-Sen regression, regardless of whether the slope is significantly different from zero. The residuals from Theil-Sen regression are then used to calculate the Kendall’s rank correlation or Tau (non-overlap) index.

The formal definition of Tau-BC require positing a model for the time trend in the data series. Thus, suppose that the outcomes can be expressed in terms of a linear time trend and an error term:

yiA=α+β(i)+ϵiA,fori=1,...,myjB=α+β(m+j)+ϵjBforj=1,...,n. \begin{aligned} y_i^A &= \alpha + \beta (i) + \epsilon_i^A, \quad \text{for} \quad i = 1,...,m \\ y_j^B &= \alpha + \beta (m + j) + \epsilon_j^B \quad \text{for} \quad j = 1,...,n. \end{aligned} Within each phase, assume that the error terms are independent and share a common distribution. The Tau-BC parameter can then be defined as the Tau parameter for the distribution of the error terms, or

τBC=Pr(ϵB>ϵA)Pr(ϵB<ϵA). \tau_{BC} = \text{Pr}(\epsilon^B > \epsilon^A) - \text{Pr}(\epsilon^B < \epsilon^A). An equivalent definition in terms of the outcome distributions is

τBC=Pr[YjBβ(m+ji)>YiA]Pr[YjBβ(m+ji)<YiA] \tau_{BC} = \text{Pr}\left[Y_j^B - \beta (m + j - i) > Y_i^A \right] - \text{Pr}\left[Y_j^B - \beta (m + j - i) < Y_i^A\right] for i=1,...,mi=1,...,m and j=1,...,nj = 1,...,n.

Estimation

Estimation of τBC\tau_{BC} entails correcting the data series for the baseline slope β\beta. If using the baseline trend pre-test, the null hypothesis of H0:β=0H_0: \beta = 0 is first tested using Kendall’s rank correlation. If the test is not significant, then set β̂=0\hat\beta = 0 and α̂=0\hat\alpha = 0. If the test is significant or if the pre-test for baseline time trend is not used, then the slope is estimated by Theil-Sen regression. Specifically, we calculate the slope between every pair of observations in the A phase: shi=yiAyhAih s_{hi} = \frac{y_i^A - y_h^A}{i - h} for i=1,...,m1i = 1,...,m - 1 and h=i+1,...,mh = i+1,...,m. The overall slope estimate is taken to be the median over all m(m1)/2m(m - 1) / 2 slope pairs:

β̂=median{s12,...,s(m1)m}. \hat\beta = \text{median}\left\{s_{12},...,s_{(m-1)m}\right\}. The intercept term is estimated by taking the median observation in the A phase after correcting for the estimated linear time trend:

α̂=median{y1Aβ̂×1,y2Aβ̂×2,...,ymAβ̂×m}. \hat\alpha = \text{median}\left\{y_1^A - \hat\beta \times 1, \ y_2^A - \hat\beta \times 2, ..., \ y_m^A - \hat\beta \times m\right\}. However, the intercept estimate is irrelevant for purposes of estimating Tau-BC because the Tau estimator is a function of ranks and is invariant to a linear shift of the data series.

After estimating the phase A time trend, τBC\tau_{BC} is estimated by de-trending the full data series and calculating Kendall’s rank correlation or Tau (non-overlap) on the de-trended observations. Specifically, set ϵ̂iA=yiAβ̂(i)α̂\hat\epsilon_i^A = y_i^A - \hat\beta (i) - \hat\alpha for i=1,...,mi = 1,...,m and ϵ̂jB=yjBβ̂(m+j)α̂\hat\epsilon_j^B = y_j^B - \hat\beta (m + j) - \hat\alpha. For an outcome where increase is desirable, calculate wijϵ=I(ϵ̂jB>ϵ̂iA)I(ϵ̂jB<ϵ̂iA)w^\epsilon_{ij} = I\left(\hat\epsilon^B_j > \hat\epsilon^A_i\right) - I\left(\hat\epsilon^B_j < \hat\epsilon^A_i\right)

or, for an outcome where decrease is desirable, calculate wijϵ=I(ϵ̂jB<ϵ̂iA)I(ϵ̂jB>ϵ̂iA).w^\epsilon_{ij} = I\left(\hat\epsilon^B_j < \hat\epsilon^A_i\right) - I\left(\hat\epsilon^B_j > \hat\epsilon^A_i\right).

Tau-BC (non-overlap) is then estimated by TauBC=1mni=1mj=1nwijϵ. \text{Tau}_{BC} = \frac{1}{m n} \sum_{i=1}^m \sum_{j=1}^n w^\epsilon_{ij}.

If calculated with Kendall’s rank correlation, Tau-BC is estimated as the rank correlation between {ϵ̂1A,,ϵ̂mA,ϵ̂1B,,ϵ̂nB}\left\{\hat\epsilon^A_1, \dots, \hat\epsilon^A_m, \hat\epsilon^B_1, \dots, \hat\epsilon^B_n \right\} and a dummy coded variable {01,,0m,11,,1n}\left\{0_1,\dots,0_m, 1_1,\dots,1_n \right\}, with an adjustment for ties (Kendall, 1970, p. 35). Specifically,

TauBC*=1Di=1mj=1nwijϵ, \text{Tau}_{BC}^* = \frac{1}{D} \sum_{i=1}^m \sum_{j=1}^n w^\epsilon_{ij}, where

D=m×n×((m+n)(m+n1)2U) D = \sqrt{m \times n \times \left(\frac{(m+n)(m+n-1)}{2}-U\right)} and UU is the number of ties between all possible pairs of observations (including pairs within phase A, pairs within phase B, and pairs of one phase A and one phase B data point). UU can be computed as

U=i=1m1j=i+1mI(ϵ̂iA=ϵ̂jA)+i=1n1j=i+1nI(ϵ̂iB=ϵ̂jB)+i=1mj=1nI(ϵ̂iA=ϵ̂jB). U = \sum_{i=1}^{m - 1} \sum_{j = i+1}^m I\left(\hat\epsilon^A_i = \hat\epsilon^A_j\right) + \sum_{i=1}^{n - 1} \sum_{j = i+1}^n I\left(\hat\epsilon^B_i = \hat\epsilon^B_j\right) + \sum_{i=1}^m \sum_{j=1}^n I\left(\hat\epsilon^A_i = \hat\epsilon^B_j\right).

We prefer and recommend to use the Tau-AB form, which divides by m×nm \times n rather than by DD, because it leads to a simpler interpretation. Furthermore, using DD means that TauBC*\text{Tau}_{BC}^* may be sensitive to variation in phase lengths. To see this sensitivity, consider a scenario where there are no tied values and so every value {ϵ̂1A,,ϵ̂mA,ϵ̂1B,,ϵ̂nB}\left\{\hat\epsilon^A_1, \dots, \hat\epsilon^A_m, \hat\epsilon^B_1, \dots, \hat\epsilon^B_n \right\} is unique. In this case, U=0U = 0 and D=12mn(m+n)(m+n1)=mn1+m12n+n12m. D = \sqrt{\frac{1}{2} m n (m + n)(m + n - 1)} = m n \sqrt{1 + \frac{m - 1}{2n} + \frac{n - 1}{2m}}. Thus, the denominator will always be larger than mnm n, meaning that TauBC*\text{Tau}_{BC}^* will always be smaller than TauBC\text{Tau}_{BC}. Further, the largest and smallest possible values of TauBC*\text{Tau}_{BC}^* will be ±mn/D\pm m n / D, or about 1/21 / \sqrt{2} when mm and nn are close to equal. In contrast, the largest and smallest possible values of TauBC\text{Tau}_{BC} are always -1 and 1, respectively.

Standard errors and confidence intervals

The exact sampling distribution of TauBC*\text{Tau}_{BC}^* (Kendall, adjusted for ties) has not been described. Tarlow (2017) proposed to approximate its sampling variance using SEKendall=2(1TauBC2)m+n, SE_{Kendall} = \sqrt{\frac{2 (1 - \text{Tau}_{BC}^2)}{m + n}}, arguing that this would generally be conservative (in the sense of over-estimating the true sampling error). When Tau-BC is calculated using Kendall’s rank correlation, the SingleCaseES package reports a standard error based on this approximation.

When calculated without adjustment for ties, the SingleCaseES package takes a different approach for estimating the standard error for TauBC\text{Tau}_{BC} (non-overlap), reporting approximate standard errors and confidence intervals for TauBC\text{Tau}_{BC} based on the methods described above for Tau\text{Tau} (non-overlap, without baseline trend correction). An important limitation of this approach is that it does not account for the uncertainty introduced by estimating the phase A time trend (i.e., the uncertainty in β̂\hat\beta).

Parametric effect sizes

SMD

Gingerich (1984) and Busk & Serlin (1992) proposed a within-case standardized mean difference for use in single-case designs (within-case because it is based on the data for a single individual, rather than across individuals). The standardized mean difference parameter δ\delta is defined as the difference in means between phase B and phase A, scaled by the standard deviation of the phase A outcome distribution:

δ=μBμAσA. \delta = \frac{\mu_B - \mu_A}{\sigma_A}.

Note that σA\sigma_A represents within-individual variability only. In contrast, the SMD applied to a between-groups design involves scaling by a measure of between- and within-individual variability. Thus, the scale of the within-case SMD is not comparable to the scale of the SMD from a between-groups design.

The SMD δ\delta can be estimated under the assumption that the observations are mutually independent and have constant variance within each phase. There are two ways that the SMD, depending on whether it is reasonable to assume that the standard deviation of the outcome is constant across phases (i.e., σA=σB\sigma_A = \sigma_B).

Baseline SD

Gingerich (1984) and Busk & Serlin (1992) originally suggested scaling by the SD from phase A only, due to the possibility of non-constant variance across phases. Without assuming constant SDs, an estimate of the standardized mean difference is

dA=(134m5)yByAsA. d_A = \left(1 - \frac{3}{4m - 5}\right) \frac{\bar{y}_B - \bar{y}_A}{s_A}.

The term in parentheses is a small-sample bias correction term (cf. Hedges, 1981; Pustejovsky, 2019). The standard error of this estimate is calculated as

SEdA=(134m5)1m+sB2nsA2+dA22(m1). SE_{d_A} = \left(1 - \frac{3}{4m - 5}\right)\sqrt{\frac{1}{m} + \frac{s_B^2}{n s_A^2} + \frac{d_A^2}{2(m - 1)}}.

Pooled SD

If it is reasonable to assume that the SDs are constant across phases, then one can use the pooled sample SD, defined as

sp=(m1)sA2+(n1)sB2m+n2. s_p = \sqrt{\frac{(m - 1)s_A^2 + (n - 1) s_B^2}{m + n - 2}}.

The SMD can then be estimated as

dp=(134(m+n)9)yByAsp, d_p = \left(1 - \frac{3}{4(m + n) - 9}\right) \frac{\bar{y}_B - \bar{y}_A}{s_p},

with approximate standard error

SEdA=(134(m+n)9)1m+1n+dp22(m+n2). SE_{d_A} = \left(1 - \frac{3}{4(m + n) - 9}\right)\sqrt{\frac{1}{m} + \frac{1}{n} + \frac{d_p^2}{2(m + n - 2)}}.

Confidence intervals

Whether the estimator is based on the baseline or pooled standard deviation, an approximate confidence interval for δ\delta is given by

[dzα/2×SEd,d+zα/2×SEd]. [d - z_{\alpha/2} \times SE_d,\quad d + z_{\alpha/2} \times SE_d].

LRR

The log-response ratio (LRR) is an effect size index that quantifies the change from phase A to phase B in proportionate terms. Pustejovsky (2015) proposed to use it as an effect size index for single-case designs (see also Pustejovsky, 2018). The LRR is appropriate for use with outcomes on a ratio scale—that is, where zero indicates the total absence of the outcome. The LRR parameter is defined as

ψ=ln(μB/μA), \psi = \ln\left(\mu_B / \mu_A\right),

The logarithm is used so that the range of the index is less restricted.

LRR-decreasing and LRR-increasing

There are two variants of the LRR (Pustejovsky, 2018), corresponding to whether therapeutic improvements correspond to negative values of the index (LRR-decreasing or LRRd) or positive values of the index (LRR-increasing or LRRi). For outcomes measured as frequency counts or rates, LRRd and LRRi are identical in magnitude but have opposite sign. However, for outcomes measured as proportions (ranging from 0 to 1) or percentages (ranging from 0% to 100%), LRRd and LRRi will differ in both sign and magnitude because the outcomes are first transformed to be consistent with the selected direction of therapeutic improvement.

Estimation

To account for the possibility that the sample means may be equal to zero, even if the mean levels are strictly greater than zero, the LRR is calculated using truncated sample means, given by ỹA=max{yA,12Dm}andỹB=max{yB,12Dn}, \tilde{y}_A = \text{max} \left\{ \bar{y}_A, \ \frac{1}{2 D m}\right\} \qquad \text{and} \qquad \tilde{y}_B = \text{max} \left\{ \bar{y}_B, \ \frac{1}{2 D n}\right\}, where DD is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018). To ensure that the standard error of LRR is strictly positive, it is calculated using truncated sample variances, given by s̃A2=max{sA2,1D2m3}ands̃B2=max{sB2,1D2n3}. \tilde{s}_A^2 = \text{max}\left\{s_A^2, \ \frac{1}{D^2 m^3}\right\} \qquad \text{and} \qquad \tilde{s}_B^2 = \text{max} \left\{ s_B^2, \ \frac{1}{D^2 n^3}\right\}.

A basic estimator of the LRR is then given by

R1=ln(ỹB)ln(ỹA). R_1 = \ln\left(\tilde{y}_B\right) - \ln\left(\tilde{y}_A\right).

However, R1R_1 will be biased when one or both phases include only a small number of observations. A bias-corrected estimator is given by

R2=ln(ỹB)+s̃B22nỹB2ln(ỹA)s̃A22mỹA2. R_2 = \ln\left(\tilde{y}_B\right) + \frac{\tilde{s}_B^2}{2 n \tilde{y}_B^2} - \ln\left(\tilde{y}_A\right) - \frac{\tilde{s}_A^2}{2 m \tilde{y}_A^2}. The bias-corrected estimator is the default option in SingleCaseES.

Standard errors

Under the assumption that the outcomes in each phase are mutually independent, an approximate standard error for R1R_1 or R2R_2 is given by

SER=s̃A2mỹA2+s̃B2nỹB2. SE_R = \sqrt{\frac{\tilde{s}_A^2}{m \tilde{y}_A^2} + \frac{\tilde{s}_B^2}{n \tilde{y}_B^2}}.

Confidence intervals

Under the same assumptions, an approximate confidence interval for ψ\psi is

[Rzα/2×SER,R+zα/2×SER]. [R - z_{\alpha / 2} \times SE_R,\quad R + z_{\alpha / 2} \times SE_R].

LOR

The log-odds ratio is an effect size index that quantifies the change from phase A to phase B in terms of proportionate change in the odds that a behavior is occurring (Pustejovsky, 2015). It is appropriate for use with outcomes on a percentage or proportion scale. The LOR parameter is defined as

ψ=ln(μB/(1μB)μA/(1μA)), \psi = \ln\left(\frac{\mu_B/(1-\mu_B)}{\mu_A/(1-\mu_A)}\right),

where the outcomes are measured in proportions. The log odds ratio ranges from -\infty to \infty, with a value of zero corresponding to no change in mean levels.

Estimation

To account for the possibility that the sample means may be equal to zero or one, even if the mean levels are strictly between zero and one, the LOR is calculated using truncated sample means, given by

ỹA=max{min[yA,112Dm],12Dm} \tilde{y}_A = \text{max} \left\{ \text{min}\left[\bar{y}_A, 1 - \frac{1}{2 D m}\right], \frac{1}{2 D m}\right\} and

ỹB=max{min[yB,112Dn],12Dn}, \tilde{y}_B = \text{max} \left\{ \text{min}\left[\bar{y}_B, 1 - \frac{1}{2 D n}\right], \frac{1}{2 D n}\right\},

where DD is a constant that depends on the scale and recording procedure used to measure the outcomes (Pustejovsky, 2018).To ensure that the corresponding standard error is strictly positive, it is calculated using truncated sample variances, given by

s̃A2=max{sA2,1D2m3}ands̃B2=max{sB2,1D2n3}. \tilde{s}_A^2 = \text{max}\left\{s_A^2, \ \frac{1}{D^2 m^3}\right\} \qquad \text{and} \qquad \tilde{s}_B^2 = \text{max} \left\{ s_B^2, \ \frac{1}{D^2 n^3}\right\}.

A basic estimator of the LOR is given by

LOR1=ln(ỹB)ln(1ỹB)ln(ỹA)+ln(1ỹA). LOR_1 = \ln\left(\tilde{y}_B\right) - \ln\left(1-\tilde{y}_B\right) - \ln\left(\tilde{y}_A\right) + \ln\left(1-\tilde{y}_A\right).

However, like the LRR, this estimator will be biased when the one or both phases include only a small number of observations. A bias-corrected estimator of the LOR is given by

LOR2=ln(ỹB)ln(1ỹB)s̃B2(2ỹB1)2nB(ỹB)2(1ỹB)2ln(ỹA)+ln(1ỹA)+s̃A2(2ỹA1)2nA(ỹA)2(1ỹA)2. LOR_2 = \ln\left(\tilde{y}_B\right) - \ln\left(1-\tilde{y}_B\right) - \frac{\tilde{s}_B^2(2 \tilde{y}_B - 1)}{2 n_B (\tilde{y}_B)^2(1-\tilde{y}_B)^2} - \ln\left(\tilde{y}_A\right) + \ln\left(1-\tilde{y}_A\right) + \frac{\tilde{s}_A^2(2 \tilde{y}_A - 1)}{2 n_A (\tilde{y}_A)^2(1-\tilde{y}_A)^2}. This estimator uses a small-sample correction to reduce bias when one or both phases include only a small number of observations.

Standard errors

Under the assumption that the outcomes in each phase are mutually independent, an approximate standard error for LORLOR is given by

SELOR=s̃A2nAỹA2(1ỹA)2+s̃B2nBỹB2(1ỹB)2. SE_{LOR} = \sqrt{\frac{\tilde{s}^2_A}{n_A \tilde{y}_A^2 (1 - \tilde{y}_A)^2} + \frac{\tilde{s}^2_B}{n_B \tilde{y}_B^2 (1 - \tilde{y}_B)^2}}.

Confidence intervals

Under the same assumption, an approximate confidence interval for ψ\psi is

[LORzα/2×SELOR,LOR+zα/2×SELOR], [LOR - z_{\alpha / 2} \times SE_{LOR},\quad LOR + z_{\alpha / 2} \times SE_{LOR}],

LRM

Bonett & Price (2020b) described the log ratio of medians (LRM) effect size, which can be used to quantify the change in medians from phase A to phase B. The LRM is the natural logarithm of the ratio of medians. This effect size is appropriate for outcomes that are skewed or right-censored (Bonett & Price, 2020b). For an outcome where increase is desirable, the LRM parameter is defined as

λ=ln(ηB/ηA)=ln(ηB)ln(ηA), \lambda = \ln\left(\eta_B / \eta_A\right) = \ln(\eta_B) - \ln(\eta_A), where ηB\eta_B and ηA\eta_A are the population medians for phase B and phase A, respectively. For an outcome where decrease is desirable, the LRM parameter has the opposite sign:

λ=ln(ηA/ηB)=ln(ηA)ln(ηB). \lambda = \ln\left(\eta_A / \eta_B\right) = \ln(\eta_A) - \ln(\eta_B).

Estimation

A natural estimator of the λ\lambda is given by

LRM=ln(mB)ln(mA), LRM = \ln\left(m_B\right) - \ln\left(m_A\right), where mBm_B and mAm_A are the sample medians for phase B and phase A, respectively. Note that the sample median might be zero for either phase B and phase A in some single-case design data, resulting in infinite LRM.

Standard errors

Standard errors and confidence intervals for LRM can be obtained under the assumption that the outcome data within each phase are mutually independent and follow a common distribution. Using the fact that the logarithm of the median is the same or close to the median of the log-transformed outcomes, the standard error for LRMLRM can be calculated using the order statistics within each phase (Bonett & Price, 2020a). Let lA=max{1,m2m},uA=mlA+1,lB=max{1,n2n},uB=nlB+1, \begin{aligned} l_A &= \text{max}\left\{1, \ \frac{m}{2} - \sqrt{m}\right\}, \quad &u_A &= m - l_A + 1, \\ l_B &= \text{max}\left\{1, \ \frac{n}{2} - \sqrt{n}\right\}, \quad &u_B &= n - l_B + 1, \end{aligned} and find qA=Φ1(12mi=0lA1m!i!(mi)!)andqB=Φ1(12nj=0lB1n!j!(nj)!). q_A = \Phi^{-1}\left(\frac{1}{2^m}\sum_{i=0}^{l_A - 1} \frac{m!}{i!(m - i)!}\right) \quad \text{and} \quad q_B = \Phi^{-1}\left(\frac{1}{2^n}\sum_{j=0}^{l_B - 1} \frac{n!}{j!(n - j)!}\right). The standard error of LRM is then SELRM=(ln(y(uB)B)ln(y(lB)B)2qB)2+(ln(y(uA)A)ln(y(lA)A)2qA)2, SE_{LRM} = \sqrt{\left(\frac{\ln\left(y^B_{(u_B)}\right)-\ln\left(y^B_{(l_B)}\right)}{2\ q_B}\right)^2 + \left(\frac{\ln\left(y^A_{(u_A)}\right)-\ln\left(y^A_{(l_A)}\right)}{2\ q_A}\right)^2}, (Bonett & Price, 2020a) where y(lA)A,y(uA)Ay^A_{(l_A)}, y^A_{(u_A)} are the lAl_A and uAu_A order statistics of the phase A outcomes and y(lB)B,y(uB)By^B_{(l_B)}, y^B_{(u_B)} are the lBl_B and uBu_B order statistics of the phase B outcomes.

Confidence intervals

An approximate confidence interval for λ\lambda is [LRMzα/2×SELRM,LRM+zα/2×SELRM], \left[LRM - z_{\alpha/2} \times SE_{LRM},\quad LRM + z_{\alpha/2} \times SE_{LRM}\right], where zα/2z_{\alpha/2} is 1α/21 - \alpha/2 critical value from a standard normal distribution.

PoGO

Ferron, Goldstein, Olszewski, & Rohrer (2020) proposed a percent of goal obtained (PoGO) effect size metric for use in single-case designs. Let γ\gamma denote the goal level of behavior, which must be specified by the analyst or researcher. Percent of goal obtained quantifies the change in the mean level of behavior relative to the goal. The PoGO parameter θ\theta is defined as: θ=μBμAγμA×100%. \theta = \frac{\mu_B - \mu_A}{\gamma - \mu_A} \times 100\%.

Estimation

Approaches for estimation of PoGO depend on one’s assumption about the stability of the observations in phases A and B. Under the assumption that the observations are temporally stable, a natural estimator of PoGO is PoGO=yByAγyA×100%. PoGO = \frac{\bar{y}_B - \bar{y}_A}{\gamma - \bar{y}_A} \times 100\%.

Standard errors

Patrona, Ferron, Olszewski, Kelley, & Goldstein (2022) proposed a method for calculating a standard error for the PoGO estimator under the assumptions that the observations within each phase are mutually independent. The standard error uses an approximation for the standard error of two independent, normally distributed random variables due to Dunlap & Silver (1986). It is calculated as SEPoGO=1γyAsA2nA+sB2nB+(yByAγyA)2sA2nA. SE_{PoGO} = \frac{1}{\gamma - \bar{y}_A} \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B} + \left(\frac{\bar{y}_B - \bar{y}_A}{\gamma - \bar{y}_A}\right)^2 \frac{s_A^2}{n_A}}. Patrona et al. (2022) also provided a more general approximation, which can be applied when PoGO is estimated using regressions that control for time trends or auto-correlation. However, these methods are not implemented in SingleCaseES.

Confidence intervals

An approximate confidence interval for PoGOPoGO is given by [PoGOzα/2×SEPoGO,PoGO+zα/2×SEPoGO], [PoGO - z_{\alpha / 2} \times SE_{PoGO},\quad PoGO + z_{\alpha / 2} \times SE_{PoGO}], where zα/2z_{\alpha / 2} is the 1α/21 - \alpha / 2 critical value from a standard normal distribution (Patrona et al., 2022).

References

Acion, L., Peterson, J. J., Temple, S., & Arndt, S. (2006). Probabilistic index: An intuitive non-parametric approach to measuring the size of treatment effects. Statistics in Medicine, 25(4), 591–602. https://doi.org/10.1002/sim.2256
Allison, D. B., & Gorman, B. S. (1994). “Make things as simple as possible, but no simpler.” A rejoinder to scruggs and mastropieri. Behaviour Research and Therapy, 32(8), 885–890.
Bonett, D. G., & Price, R. M. (2020a). Confidence intervals for ratios of means and medians. Journal of Educational and Behavioral Statistics, 45(6), 750–770.
Bonett, D. G., & Price, R. M. (2020b). Interval estimation for linear functions of medians in within-subjects and mixed designs. British Journal of Mathematical and Statistical Psychology, 73(2), 333–346.
Busk, P. L., & Serlin, R. C. (1992). Meta-analysis for single-case research. In Single-case research design and analysis (psychology revivals) (pp. 199–224). Routledge.
Dunlap, W. P., & Silver, N. C. (1986). Confidence intervals and standard errors for ratios of normal variables. Behavior Research Methods, Instruments, & Computers, 18(5), 469–471. https://doi.org/10.3758/BF03201412
Ferron, J., Goldstein, H., Olszewski, A., & Rohrer, L. (2020). Indexing effects in single-case experimental designs by estimating the percent of goal obtained. Evidence-Based Communication Assessment and Intervention, 14(1), 6–27. https://doi.org/10.1080/17489539.2020.1732024
Gingerich, W. J. (1984). Meta-analysis of applied time-series data. The Journal of Applied Behavioral Science, 20(1), 71–79.
Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6(2), 135–146. https://doi.org/10.1037/1082-989X.6.2.135
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36. https://doi.org/10.1148/radiology.143.1.7063747
Hedges, L. V. (1981). Distribution theory for glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128.
Kendall, M. G. (1970). Rank correlation methods 4th edition. Griffin.
Ma, H.-H. (2006). An alternative method for quantitative synthesis of single-subject researches: Percentage of data points exceeding the median. Behavior Modification, 30(5), 598–617.
Mee, R. W. (1990). Confidence intervals for probabilities and tolerance regions based on a generalization of the mann-whitney statistic. Journal of the American Statistical Association, 85(411), 793–800.
Newcombe, R. G. (2006). Confidence intervals for an effect size measure based on the mann–whitney statistic. Part 2: Asymptotic methods and evaluation. Statistics in Medicine, 25(4), 559–573.
Parker, R. I., & Vannest, K. (2009). An improved effect size for single-case research: Nonoverlap of all pairs. Behavior Therapy, 40(4), 357–367.
Parker, R. I., Vannest, K. J., & Brown, L. (2009). The improvement rate difference for single-case research. Exceptional Children, 75(2), 135–150.
Parker, R. I., Vannest, K. J., & Davis, J. L. (2011). Effect size in single-case research: A review of nine nonoverlap techniques. Behavior Modification, 35(4), 303–322.
Parker, R. I., Vannest, K. J., Davis, J. L., & Sauber, S. B. (2011). Combining nonoverlap and trend for single-case research: Tau-u. Behavior Therapy, 42(2), 284–299.
Patrona, E., Ferron, J., Olszewski, A., Kelley, E., & Goldstein, H. (2022). Effects of explicit vocabulary interventions for preschoolers: An exploratory application of the Percent of Goal Obtained (PoGO) effect size metric. Journal of Speech, Language, and Hearing Research, 65(12), 4821–4836. https://doi.org/10.1044/2022_JSLHR-22-00217
Pustejovsky, J. E. (2015). Measurement-comparable effect sizes for single-case studies of free-operant behavior. Psychological Methods, 20(3), 342–359.
Pustejovsky, J. E. (2018). Using response ratios for meta-analyzing single-case designs with behavioral outcomes. Journal of School Psychology, 68, 99–112.
Pustejovsky, J. E. (2019). Procedural sensitivities of effect sizes for single-case designs with dbehavioral outcome measures. Psychological Methods, 24(2), 217–235.
Scruggs, T. E., Mastropieri, M. A., & Casto, G. (1987). The quantitative synthesis of single-subject research: Methodology and validation. Remedial and Special Education, 8(2), 24–33.
Sen, P. K. (1967). A note on asymptotically distribution-free confidence bounds for p {\{x< y}\}, based on two independent samples. Sankhyā: The Indian Journal of Statistics, Series A, 95–102.
Swan, D. M., & Pustejovsky, J. E. (2018). A Gradual Effects Model for Single-Case Designs. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2018.1466681
Tarlow, K. R. (2017). An improved rank correlation effect size statistic for single-case designs: Baseline corrected tau. Behavior Modification, 41(4), 427–467.
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the "CL" common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329