Additional calculus of M-estimation examples using geex

Preliminaries

This vignette implements examples from and is meant to read with Stefanski and Boos (2002) (SB). Examples 4-9 use the geexex data set. For information on how the data are generated, run help('geexex').

library(geex)

The Basic Approach

Example 4: Instrumental variables

Example 4 calculates an instumental variable estimator. The variables (Y3, W1, Z1) are observed according to:

Y3i = α + βX1i + σϵϵ1, i

W1i = βX1i + σUϵ2, i

and

Z1i = γ + δX1i + στϵ3, i.

Y3 is a linear function of a latent variable X1, whose coefficient β is of interest. W1 is a measurement of the unobserved X1, and Z1 is an instrumental variable for X1. The instrumental variable estimator is β̂IV is the ratio of least squares regression slopes of Y3 on Z1 and Y3 and W1. The ψ function below includes this estimator as θ̂4 = β̂IV. In the example data, 100 observations are generated where X1 ∼ Γ(shape = 5, rate = 1), α = 2, β = 3, γ = 2, δ = 1.5, σϵ = στ = 1, σU = .25, and ϵ⋅, i ∼ N(0, 1).

$$ \psi(Y_{3i}, Z_{1i}, W_{1i}, \mathbf{\theta}) = \begin{pmatrix} \theta_1 - Z_{1i} \\ \theta_2 - W_{1i} \\ (Y_{3i} - \theta_3 W_{1i})(\theta_2 - W_{1i}) \\ (Y_{3i} - \theta_4 W_{1i})(\theta_1 - Z_{1i}) \\ \end{pmatrix} $$

SB4_estFUN <- function(data){
  Z1 <- data$Z1; W1 <- data$W1; Y3 <- data$Y3
  function(theta){
      c(theta[1] - Z1,
        theta[2] - W1,
        (Y3 - (theta[3] * W1)) * (theta[2] - W1),
        (Y3 - (theta[4] * W1)) * (theta[1] - Z1)
    )
  }
}
estimates <- m_estimate(
  estFUN = SB4_estFUN, 
  data  = geexex, 
  root_control = setup_root_control(start = c(1, 1, 1, 1)))

The R packages AER and ivpack can compute the IV estimator and sandwich variance estimators respectively, which is done below for comparison.

ivfit <- AER::ivreg(Y3 ~ W1 | Z1, data = geexex)
iv_se <- ivpack::cluster.robust.se(ivfit, clusterid = 1:nrow(geexex))
coef(ivfit)[2] 
coef(estimates)[4]
iv_se[2, 'Std. Error'] 
sqrt(vcov(estimates)[4, 4])

Connections to the Influence Curve

Example 5: Hodges-Lehmann estimator

This example shows the link between the influence curves and the Hodges-Lehman estimator.

$$ \psi(Y_{2i}, \theta) = \begin{pmatrix} IC_{\hat{\theta}_{HL}}(Y_2; \theta_0) - (\theta_1 - \theta_0) \\ \end{pmatrix} $$

The functions used in SB5_estFUN are defined below:

F0 <- function(y, theta0, distrFUN = pnorm){
  distrFUN(y - theta0, mean = 0)
}

f0 <- function(y, densFUN){
  densFUN(y, mean = 0)
}

integrand <- function(y, densFUN = dnorm){
  f0(y, densFUN = densFUN)^2
}

IC_denom <- integrate(integrand, lower = -Inf, upper = Inf)$value
SB5_estFUN <- function(data){
  Yi <- data[['Y2']]
  function(theta){

     (1/IC_denom) * (F0(Yi, theta[1]) - 0.5)
  }
}
estimates <- m_estimate(
  estFUN = SB5_estFUN, 
  data  = geexex,
  root_control = setup_root_control(start = 2))

The hc.loc of the ICSNP package computes the Hodges-Lehman estimator and SB gave closed form estimators for the variance.

theta_cls <- ICSNP::hl.loc(geexex$Y2)
Sigma_cls <- 1/(12 * IC_denom^2) / nrow(geexex)
## $geex
## $geex$parameters
## [1] 2.026376
## 
## $geex$vcov
##            [,1]
## [1,] 0.01040586
## 
## 
## $cls
## $cls$parameters
## [1] 2.024129
## 
## $cls$vcov
## [1] 0.01047198

Non-smooth ψ functions

Example 6: Huber center of symmetry estimator

This example illustrates estimation with nonsmooth ψ function for computing the Huber (1964) estimator of the center of symmetric distributions.

$$ \psi_k(Y_{1i}, \theta) = \begin{cases} (Y_{1i} - \theta) & \text{when } |(Y_{1i} - \theta)| \leq k \\ k * sgn(Y_{1i} - \theta) & \text{when } |(Y_{1i} - \theta)| > k\\ \end{cases} $$

SB6_estFUN <- function(data, k = 1.5){
  Y1 <- data$Y1
  function(theta){
    x <- Y1 - theta[1]
    if(abs(x) <= k) x else sign(x) * k
  }
}
estimates <- m_estimate(
  estFUN = SB6_estFUN, 
  data  = geexex,
  root_control = setup_root_control(start = 3))

The point estimate from geex is compared to the estimate obtained from the huber function in the MASS package. The variance estimate is compared to an estimator suggested by SB: Am = ∑i[−ψ′(Yi − θ̂)]/m and Bm = ∑iψk2(Yi − θ̂)/m, where ψk is the derivative of ψ where is exists.

theta_cls <- MASS::huber(geexex$Y1, k = 1.5, tol = 1e-10)$mu

psi_k <- function(x, k = 1.5){
  if(abs(x) <= k) x else sign(x) * k
}

A <- mean(unlist(lapply(geexex$Y1, function(y){
  x <- y - theta_cls
  -numDeriv::grad(psi_k, x = x)
})))

B <- mean(unlist(lapply(geexex$Y1, function(y){
  x <- y - theta_cls
  psi_k(x = x)^2
})))

## closed form covariance
Sigma_cls <- matrix(1/A * B * 1/A / nrow(geexex))
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)), 
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] 4.82061
## 
## $geex$vcov
##            [,1]
## [1,] 0.08356179
## 
## 
## $cls
## $cls$parameters
## [1] 4.999386
## 
## $cls$vcov
##           [,1]
## [1,] 0.0928935

Example 7: Sample quantiles (approximation of ψ)

Approximation of ψ with geex is EXPERIMENTAL.

Example 7 illustrates calculation of sample quantiles using M-estimation and approximation of the ψ function.

$$ \psi(Y_{1i}, \theta) = \begin{pmatrix} 0.5 - I(Y_{1i} \leq \theta_1) \\ \end{pmatrix} $$

SB7_estFUN <- function(data){
  Y1 <- data$Y1
  function(theta){
    0.5  - (Y1 <= theta[1])
  }
}

Note that ψ is not differentiable; however, geex includes the ability to approximate the ψ function via an approx_control object. The FUN in an approx_control must be a function that takes in the ψ function, modifies it, and returns a function of theta. For this example, I approximate ψ with a spline function. The eval_theta argument is used to modify the basis of the spline.

spline_approx <- function(psi, eval_theta){
  y <- Vectorize(psi)(eval_theta)
  f <- splinefun(x = eval_theta, y = y)
  function(theta) f(theta)
}
estimates <- m_estimate(
  estFUN = SB7_estFUN, 
  data  = geexex, 
  root_control = setup_root_control(start = 4.7),
  approx_control  = setup_approx_control(FUN = spline_approx,
                                      eval_theta = seq(3, 6, by = .05)))

A comparison of the variance is not obvious, so no comparison is made.

## $geex
## $geex$parameters
## [1] 4.7
## 
## $geex$vcov
##           [,1]
## [1,] 0.1773569
## 
## 
## $cls
## $cls$parameters
## [1] 4.708489
## 
## $cls$vcov
## [1] NA

Regression M-estimators

Example 8: Robust regression

Example 8 demonstrates robust regression for estimating β from 100 observations generated from Y4 = 0.1 + 0.1X1i + 0.5X2i + ϵi, where ϵi ∼ N(0, 1), X1 is defined as above, and the first half of the observation have X2i = 1 and the others have X2i = 0.

$$ \psi_k(Y_{4i}, \theta) = \begin{pmatrix} \psi_k(Y_{4i} - \mathbf{x}_i^T \beta) \mathbf{x}_i \end{pmatrix} $$

psi_k <- function(x, k = 1.345){
  if(abs(x) <= k) x else sign(x) * k
}

SB8_estFUN <- function(data){
  Yi <- data$Y4
  xi <- model.matrix(Y4 ~ X1 + X2, data = data)
  function(theta){
    r <- Yi - xi %*% theta
    c(psi_k(r) %*% xi)
  }
}
estimates <- m_estimate(
  estFUN = SB8_estFUN, 
  data  = geexex,
  root_control = setup_root_control(start = c(0, 0, 0)))
m <- MASS::rlm(Y4 ~ X1 + X2, data = geexex, method = 'M')
theta_cls <- coef(m)
Sigma_cls <- vcov(m)
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)), 
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] -0.04050369  0.14530196  0.30181589
## 
## $geex$vcov
##             [,1]          [,2]          [,3]
## [1,]  0.05871932 -0.0101399730 -0.0133230841
## [2,] -0.01013997  0.0021854268  0.0003386202
## [3,] -0.01332308  0.0003386202  0.0447117633
## 
## 
## $cls
## $cls$parameters
## (Intercept)          X1          X2 
##  -0.0377206   0.1441181   0.2988842 
## 
## $cls$vcov
##             (Intercept)            X1            X2
## (Intercept)  0.07309914 -0.0103060747 -0.0241724792
## X1          -0.01030607  0.0020579145  0.0005364106
## X2          -0.02417248  0.0005364106  0.0431120686

Example 9: Generalized linear models

Example 9 illustrates estimation of a generalized linear model.

$$ \psi(Y_i, \theta) = \begin{pmatrix} D_i(\beta)\frac{Y_i - \mu_i(\beta)}{V_i(\beta) \tau} \end{pmatrix} $$

SB9_estFUN <- function(data){
  Y <- data$Y5
  X <- model.matrix(Y5 ~ X1 + X2, data = data, drop = FALSE)
  function(theta){
    lp <- X %*% theta
    mu <- plogis(lp)
    D  <- t(X) %*% dlogis(lp)
    V  <- mu * (1 - mu)
    D %*% solve(V) %*% (Y - mu)
  }
}
estimates <- m_estimate(
  estFUN = SB9_estFUN, 
  data  = geexex, 
  root_control = setup_root_control(start = c(.1, .1, .5)))

Compare point estimates to glm coefficients and covariance matrix to sandwich.

m9 <- glm(Y5 ~ X1 + X2, data = geexex, family = binomial(link = 'logit'))
theta_cls <- coef(m9)
Sigma_cls <- sandwich::sandwich(m9)
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),  
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] -1.1256071  0.3410619 -0.1148368
## 
## $geex$vcov
##             [,1]         [,2]         [,3]
## [1,]  0.35202094 -0.058906883 -0.101528787
## [2,] -0.05890688  0.012842435  0.004357355
## [3,] -0.10152879  0.004357355  0.185455144
## 
## 
## $cls
## $cls$parameters
## (Intercept)          X1          X2 
##  -1.1256070   0.3410619  -0.1148368 
## 
## $cls$vcov
##             (Intercept)           X1           X2
## (Intercept)  0.35201039 -0.058903546 -0.101534539
## X1          -0.05890355  0.012841392  0.004358926
## X2          -0.10153454  0.004358926  0.185456314

Application to test statistics

Example 10: Testing equality of success probabilities

Example 10 illustrates testing equality of success probablities.

$$ \psi(Y_i, n_i, \theta) = \begin{pmatrix} \frac{(Y_i - n_i \theta_2)^2}{n_i \theta_2( 1 - \theta_2 )} - \theta_1 \\ Y_i - n_i \theta_2 \end{pmatrix} $$

SB10_estFUN <- function(data){
  Y <- data$ft_made
  n <- data$ft_attp
  function(theta){
    p <- theta[2]
    c(((Y - (n * p))^2)/(n * p * (1 - p))  - theta[1], 
      Y - n * p)
  }
}
estimates <- m_estimate(
  estFUN = SB10_estFUN, 
  data  = shaq, 
  units = 'game', 
  root_control = setup_root_control(start = c(.5, .5)))
V11 <- function(p) {
  k    <- nrow(shaq)
  sumn <- sum(shaq$ft_attp)
  sumn_inv <- sum(1/shaq$ft_attp)
  term2_n  <- 1 - (6 * p) + (6 * p^2)
  term2_d <- p * (1 - p) 
  term2  <- term2_n/term2_d
  term3  <- ((1 - (2 * p))^2) / ((sumn/k) * p * (1 - p))
  2 + (term2 * (1/k) * sumn_inv) - term3
}

p_tilde <- sum(shaq$ft_made)/sum(shaq$ft_attp)
V11_hat <- V11(p_tilde)/23

# Compare variance estimates
V11_hat
## [1] 0.0783097
vcov(estimates)[1, 1]
## [1] 0.1929791
# Note the differences in the p-values
pnorm(35.51/23, mean  = 1, sd = sqrt(V11_hat), lower.tail = FALSE)
## [1] 0.02596785
pnorm(coef(estimates)[1], 
      mean = 1, 
      sd = sqrt(vcov(estimates)[1, 1]),
      lower.tail = FALSE)
## [1] 0.1078138

This example shows that the empircal sandwich variance estimator may be different from other sandwich variance estimators that make assumptions about the structure of the A and B matrices.

References

Huber, Peter J. 1964. Robust Estimation of a Location Parameter 35.
Stefanski, Len A., and Dennis D. Boos. 2002. The Calculus of M-Estimation 56. https://doi.org/10.1198/000313002753631330.