The `smd` package provides the `smd` method to compute standardized mean differences between two groups for continuous values (`numeric` and `integer` data types) and categorical values (`factor`, `character`, and `logical`). The method also works on `matrix`, `list`, and `data.frame` data types by applying `smd()` over the columns of the `matrix` or `data.frame` and each item of the `list`. The package is based on Yang and Dalton (2012).
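The column-wise behaviour described above can be pictured with a small base R sketch. It does not use the package: `toy_smd` and `apply_over` are hypothetical names, and `toy_smd` is a numeric-only stand-in that assumes the pooled variance is the average of the two group variances.

```r
# Hypothetical stand-in for smd(): numeric x, two-level g, first level is the reference.
toy_smd <- function(x, g) {
  lv <- sort(unique(g))
  xr <- x[g == lv[1]]
  xk <- x[g == lv[2]]
  (mean(xr) - mean(xk)) / sqrt((var(xr) + var(xk)) / 2)
}

# Apply a per-vector function over the columns of a matrix or data.frame,
# or over the items of a list.
apply_over <- function(x, g, fun = toy_smd) {
  if (is.matrix(x)) x <- as.data.frame(x)  # columns become list items
  vapply(x, fun, numeric(1), g = g)
}

g  <- rep(c("A", "B"), each = 4)
df <- data.frame(u = c(1, 2, 3, 4, 5, 6, 7, 8),
                 v = c(2, 1, 4, 3, 2, 1, 4, 3))

apply_over(df, g)             # data.frame: one value per column
apply_over(as.matrix(df), g)  # matrix: same values
apply_over(as.list(df), g)    # list: one value per item
```

All three calls return one named value per column or item, which is the shape the paragraph describes.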
The `smd` function computes the standardized mean difference for each level $k$ of a grouping variable compared to a reference level $r$:
$$ d_k = \sqrt{(\bar{x}_r - \bar{x}_{k})^{\intercal}S_{rk}^{-1}(\bar{x}_r - \bar{x}_{k})} $$
where $\bar{x}_r$ and $\bar{x}_k$ are the sample means of reference group $r$ and group $k$, and $S_{rk}$ is the sample covariance for the two groups. When $x$ is categorical, $\bar{x}$ is the vector of proportions of each category level within a group, and $S_{rk}$ is the multinomial covariance matrix.
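To make the definition concrete, here is a minimal base R sketch of both cases for a two-level grouping variable. It follows the formulation in Yang and Dalton (2012) and is not the package's source code: the function names are hypothetical, $S_{rk}$ is assumed to be the average of the two group covariances, and the first category level is dropped so that the matrix can be inverted with `solve()`.

```r
# Continuous case: with a single variable the quadratic form reduces to a
# (signed) difference in means divided by the pooled standard deviation.
smd_numeric_sketch <- function(x, g, ref = sort(unique(g))[1]) {
  xr <- x[g == ref]
  xk <- x[g != ref]
  (mean(xr) - mean(xk)) / sqrt((var(xr) + var(xk)) / 2)
}

# Categorical case: x-bar is the vector of level proportions and S is the
# average of the two multinomial covariance matrices, diag(p) - p p^T.
smd_categorical_sketch <- function(x, g, ref = sort(unique(g))[1]) {
  x  <- factor(x)
  pr <- as.numeric(prop.table(table(x[g == ref])))[-1]  # drop first level
  pk <- as.numeric(prop.table(table(x[g != ref])))[-1]
  Sr <- diag(pr, nrow = length(pr)) - tcrossprod(pr)
  Sk <- diag(pk, nrow = length(pk)) - tcrossprod(pk)
  D  <- pr - pk
  sqrt(drop(t(D) %*% solve((Sr + Sk) / 2, D)))
}

g <- rep(c("A", "B"), each = 4)
smd_numeric_sketch(c(1, 2, 3, 4, 5, 6, 7, 8), g)
# about -3.098
smd_categorical_sketch(c("a", "a", "b", "b", "a", "b", "b", "b"), g)
# about 0.5345
```

For a binary variable the categorical sketch reduces to $|p_r - p_k| / \sqrt{(p_r(1 - p_r) + p_k(1 - p_k))/2}$. Note that `solve()` fails if a retained level is empty in both groups.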
Standard errors are computed using the formula described in Hedges and Olkin (1985):
$$ \sqrt{ \frac{n_r + n_k}{n_rn_k} + \frac{d_k^2}{2(n_r + n_k)} } $$
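The standard error formula is easy to evaluate by hand. The sketch below uses a hypothetical helper, `se_smd`, that is not part of the package, and plugs in the estimates and group sizes from this document's own examples (45 per group for the two-level grouping, 30 per group for the three-level grouping).

```r
# Standard error of a standardized mean difference d, given the sizes of the
# reference group (n_r) and the comparison group (n_k).
se_smd <- function(d, n_r, n_k) {
  sqrt((n_r + n_k) / (n_r * n_k) + d^2 / (2 * (n_r + n_k)))
}

se_smd(0.03413269, n_r = 45, n_k = 45)
# about 0.2108339
se_smd(c(-0.25169577, -0.07846864), n_r = 30, n_k = 30)
# about 0.2592192 and 0.2582982
```

These values agree with the `std.error` column reported in the examples in this document.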
library(smd)
set.seed(123)
xn <- rnorm(90)
gg2 <- rep(LETTERS[1:2], each = 45)
gg3 <- rep(LETTERS[1:3], each = 30)
smd(x = xn, g = gg2)
#>   term   estimate
#> 1    B 0.03413269
smd(x = xn, g = gg3)
#>   term    estimate
#> 1    B -0.25169577
#> 2    C -0.07846864
smd(x = xn, g = gg2, std.error = TRUE)
#>   term   estimate std.error
#> 1    B 0.03413269 0.2108339
smd(x = xn, g = gg3, std.error = TRUE)
#>   term    estimate std.error
#> 1    B -0.25169577 0.2592192
#> 2    C -0.07846864 0.2582982
`smd` with dplyr
library(dplyr, verbose = FALSE)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# `df` is assumed to be a data.frame holding the columns xn, xi, xc, xf, and xl
df$g <- gg2
df %>%
  summarize_at(
    .vars = vars(dplyr::matches("^x")),
    .funs = list(smd = ~ smd(., g = g)$estimate)
  )
#>       xn_smd    xi_smd    xc_smd    xf_smd xl_smd
#> 1 0.03413269 0.1687339 0.1946887 0.1946887      0