---
title: "Using smd"
author: "Bradley Saul"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using smd}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
references:
- id: yang2012unified
  title: A unified approach to measuring the effect size between two groups using SAS
  author:
  - family: Yang
    given: Dongsheng
  - family: Dalton
    given: Jarrod E
  volume: 335
  URL: 'https://support.sas.com/resources/papers/proceedings12/335-2012.pdf'
  booktitle: SAS Global Forum
  page: 1--6
  type: article-journal
  issued:
    year: 2012
- id: hedges1985
  title: Statistical Methods for Meta-Analysis
  author:
  - family: Hedges
    given: LV
  - family: Olkin
    given: I
  type: book
  issued:
    year: 1985
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

The `smd` package provides the `smd` method to compute standardized mean differences between two groups for continuous values (`numeric` and `integer` data types) and categorical values (`factor`, `character`, and `logical`). The method also works on `matrix`, `list`, and `data.frame` data types by applying `smd()` over the columns of the `matrix` or `data.frame` and over each item of the `list`. The package is based on @yang2012unified.

The `smd` function computes the standardized mean difference for each level $k$ of a grouping variable compared to a reference level $r$:

\[
d_k = \sqrt{(\bar{x}_r - \bar{x}_{k})^{\intercal}S_{rk}^{-1}(\bar{x}_r - \bar{x}_{k})}
\]

where $\bar{x}_r$ and $\bar{x}_k$ are the sample means of reference group $r$ and group $k$, and $S_{rk}$ is the sample covariance matrix for the two groups. When $x$ is categorical, $\bar{x}$ is the vector of proportions of each category level within a group, and $S_{rk}$ is the multinomial covariance matrix.

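
For a single numeric variable the quadratic form above collapses to a scalar. The chunk below is a minimal base-R sketch of that case, not the package's implementation: it assumes, following @yang2012unified, that $S_{rk}$ is the average of the two group variances. The data and the names `x`, `g`, `xbar`, `s2`, `S_rk`, and `d` are hypothetical, and `smd()` itself is not called.

```{r smd-by-hand}
# Hypothetical two-group data; base R only
x <- c(5.1, 4.8, 6.0, 5.5, 4.9, 6.2, 7.0, 6.8, 6.5, 7.1)
g <- rep(c("A", "B"), each = 5)

xbar <- tapply(x, g, mean)  # group means
s2   <- tapply(x, g, var)   # group sample variances

# Assumption: S_rk is the average of the two group variances
S_rk <- (s2[["A"]] + s2[["B"]]) / 2

# With scalar x, sqrt(diff' S^{-1} diff) reduces to |diff| / sqrt(S_rk)
d <- sqrt((xbar[["A"]] - xbar[["B"]])^2 / S_rk)
d
```

Because of the square root, this hand computation is always non-negative; a value returned by `smd()` for the same data may differ in sign or in how the covariance is pooled.
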
Standard errors are computed using the formula described in @hedges1985:

\[
\sqrt{
\frac{n_r + n_k}{n_rn_k} + \frac{d_k^2}{2(n_r + n_k)}
}
\]

where $n_r$ and $n_k$ are the sample sizes of reference group $r$ and group $k$.

# Examples

```{r}
library(smd)
```

## Numeric

```{r numeric}
set.seed(123)
xn <- rnorm(90)
# Grouping variables with two and three levels
gg2 <- rep(LETTERS[1:2], each = 45)
gg3 <- rep(LETTERS[1:3], each = 30)

smd(x = xn, g = gg2)
smd(x = xn, g = gg3)
# Request standard errors alongside the estimates
smd(x = xn, g = gg2, std.error = TRUE)
smd(x = xn, g = gg3, std.error = TRUE)
```

## Integers

```{r integer}
xi <- sample(1:20, 90, replace = TRUE)
smd(x = xi, g = gg2)
```

## Character

```{r character}
xc <- unlist(replicate(2, sort(sample(letters[1:3], 45, replace = TRUE)), simplify = FALSE))
smd(x = xc, g = gg2)
```

## Factors

```{r factor}
xf <- factor(xc)
smd(x = xf, g = gg2)
```

## Logical

```{r logical}
xl <- as.logical(rbinom(90, 1, prob = 0.5))
smd(x = xl, g = gg2)
```

## Matrices

```{r matrix}
mm <- cbind(xl, xl, xl, xl)
smd(x = mm, g = gg3, std.error = FALSE)
```

## Lists

```{r list}
ll <- list(xn = xn, xi = xi, xf = xf, xl = xl)
smd(x = ll, g = gg3)
```

## data.frames

```{r data.frame}
df <- data.frame(xn, xi, xc, xf, xl)
smd(x = df, g = gg3)
```

## Using `smd` with `dplyr`

```{r dplyr}
library(dplyr, verbose = FALSE)
df$g <- gg2
# Apply smd() to every column whose name starts with "x", keeping only the estimate
df %>%
  summarize_at(
    .vars = vars(dplyr::matches("^x")),
    .funs = list(smd = ~ smd(., g = g)$estimate)
  )
```

# Other packages

See:

* [tableone](https://CRAN.R-project.org/package=tableone)
* [stddiff](https://cran.r-project.org/package=stddiff)

# References