geex
In the basic set-up, M-estimation applies to estimators of the p × 1 parameter θ = (θ1, θ2, …, θp)⊺ which can be obtained as solutions to an equation of the form
where O1, …, Om are m independent and identically distributed (iid) random variables, and the function ψ returns a vector of length p and does not depend on i or m (Stefanski and Boos 2002, hereafter SB). See SB for the case where the Oi are independent but not necessarily identically distributed. The roots of (1) are referred to as M-estimators and denoted by θ̂. M-estimators can be solved for analytically in some cases or computed numerically in general. Under certain regularity conditions, the asymptotic properties of θ̂ can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let θ0 be the true parameter value defined by ∫ψ(o, θ0)dF(o) = 0, where F is the distribution function of O. Let ψ̇(o, θ) = ∂ψ(Oi, θ)/∂θ⊺, A(θ0) = E[−ψ̇(O1, θ0)], and B(θ0) = E[ψ(O1, θ0)ψ(O1, θ0)⊺]. Then under suitable regularity assumptions, θ̂ is consistent and asymptotically Normal, i.e.,
where V(θ0) = A(θ0)−1B(θ0){A(θ0)−1}⊺. The sandwich form of V(θ0) suggests several possible large sample variance estimators. For some problems, the analytic form of V(θ0) can be derived and estimators of θ0 and other unknowns simply plugged into V(θ0). Alternatively, V(θ0) can be consistently estimated by the empirical sandwich variance estimator, where the expectations in A(θ) and B(θ) are replaced with their empirical counterparts. Let Ai = −ψ̇(Oi, θ)|θ = θ̂, $A_m = m^{-1} \sum_{i = 1}^m A_i$, Bi = ψ(Oi, θ̂)ψ(Oi, θ̂)⊺, and $B_m = m^{-1} \sum_{i = 1}^m B_i$. The empirical sandwich estimator of the variance of θ̂ is:
The geex
package provides an application programming
interface (API) for carrying out M-estimation. The analyst provides a
function, called estFUN
, corresponding to ψ(Oi, θ)
that maps data Oi to a function
of θ. Numerical derivatives
approximate ψ̇ so that
evaluating Σ̂ is entirely a
computational exercise. No analytic derivations are required from the
analyst.
Consider estimating the population mean θ = E[Yi] using the sample mean $\hat{\theta} = m^{-1} \sum_{i=1}^m Y_i$ of m iid random variables Y1, …, Ym. The estimator θ̂ can be expressed as the solution to the following estimating equation:
$$ \sum_{i = 1}^m (Y_i - \theta) = 0. $$
which is equivalent to solving where Oi = Yi
and ψ(Oi, θ) = Yi − θ.
An estFUN
is a translation of ψ into a function whose first
argument is data
and returns a function whose first
argument is theta
. An estFUN
corresponding to
the estimating equation for the sample mean of Y is:
meanFUN <- function(data){ function(theta){ data$Y - theta } } .
If an estimator fits into the above framework, then the user need
only specify estFUN
. No other programming is required to
obtain point and variance estimates. The remaining sections provide
examples of translating ψ into
an estFUN
.
The three examples of M-estimation from SB are presented here for
demonstration. For these examples, the data are Oi = {Y1i, Y2i}
where Y1 ∼ N(5, 16)
and Y2 ∼ N(2, 1) for
m = 100 and are included in
the geexex
dataset.
The first example estimates the population mean (θ1) and variance (θ2) of Y1. The solution to the estimating equations below are the sample mean $\hat{\theta}_1 = m^{-1} \sum_{i = 1}^m Y_{1i}$ and sample variance $\hat{\theta}_2 = m^{-1} \sum_{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2$.
$$ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix} $$
SB1_estfun <- function(data){
Y1 <- data$Y1
function(theta){
c(Y1 - theta[1],
(Y1 - theta[1])^2 - theta[2])
}
}
The primary geex
function is m_estimate
,
which requires two inputs: estFUN
(the ψ function), data
(the
data frame containing Oi for i = 1, …, m). The package
defaults to rootSolve::multiroot
or estimating roots of ,
though the solver algorithm can be specified in the
root_control
argument. Starting values for
rootSolve::multiroot
are passed via the
root_control
argument; e.g.,
library(geex)
results <- m_estimate(
estFUN = SB1_estfun,
data = geexex,
root_control = setup_root_control(start = c(1,1)))
n <- nrow(geexex)
A <- diag(1, nrow = 2)
B <- with(geexex, {
Ybar <- mean(Y1)
B11 <- mean( (Y1 - Ybar)^2 )
B12 <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) )
B22 <- mean( ((Y1 - Ybar)^2 - B11)^2 )
matrix(
c(B11, B12,
B12, B22), nrow = 2
)
})
# closed form roots
theta_cls <- c(mean(geexex$Y1),
# since var() divides by n - 1, not n
var(geexex$Y1) * (n - 1)/ n )
# closed form sigma
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))
The m_estimate
function returns an object of the
S4
class geex
, which contains an
estimates
slot and vcov
slot for θ̂ and Σ̂, respectively. These slots can be
accessed by the functions coef
(or roots
) and
vcov
. The point estimates obtained for θ1 and θ2 are analogous to the
base R functions mean
and var
(after
multiplying by m − 1/m for the latter). SB
gave a closed form for A(θ0) (an
identity matrix) and B(θ0) (not shown
here) and suggest plugging in sample moments to compute B(θ̂). The point and
variance estimates using both geex
and the analytic
solutions are shown below}. The maximum absolute difference between
either the point or variance estimates is 4e-11, thus demonstrating
excellent agreement between the numerical results obtained from
geex
and the closed form solutions for this set of
estimating equations and data.
## $geex
## $geex$estimates
## [1] 5.044563 10.041239
##
## $geex$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239
##
## $cls$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of Y1 and Y2, labelled θ1 and θ2, as well as the estimand θ3 = θ1/θ2.
$$ \psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ Y_{2i} - \theta_2 \\ \theta_1 - \theta_3\theta_2 \end{pmatrix} $$
The solution to for this ψ function yields θ̂3 = Ȳ1/Ȳ2, where Ȳj denotes the sample mean of Yj1, …, Yjm for j = 1, 2.
SB2_estfun <- function(data){
Y1 <- data$Y1; Y2 <- data$Y2
function(theta){
c(Y1 - theta[1],
Y2 - theta[2],
theta[1] - (theta[3] * theta[2])
)
}
}
results <- m_estimate(
estFUN = SB2_estfun,
data = geexex,
root_control = setup_root_control(start = c(1, 1, 1)))
# Comparison to an analytically derived sanwich estimator
A <- with(geexex, {
matrix(
c(1 , 0, 0,
0 , 1, 0,
-1, mean(Y1)/mean(Y2), mean(Y2)),
byrow = TRUE, nrow = 3)
})
B <- with(geexex, {
matrix(
c(var(Y1) , cov(Y1, Y2), 0,
cov(Y1, Y2), var(Y2) , 0,
0, 0, 0),
byrow = TRUE, nrow = 3)
})
## closed form roots
theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2))
theta_cls[3] <- theta_cls[1]/theta_cls[2]
## closed form covariance
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex)
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB gave closed form expressions for A(θ0) and B(θ0), into
which we plug in appropriate estimates for the matrix components and
compare to the results from geex
. The point estimates again
show excellent agreement (maximum absolute difference 4.4e-16), while
the covariance estimates differ by the third decimal (maximum absolute
difference 1e-03).
## $geex
## $geex$estimates
## [1] 5.044563 2.012793 2.506250
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.100412389 -0.008718712 0.06074328
## [2,] -0.008718712 0.010693092 -0.01764626
## [3,] 0.060743283 -0.017646263 0.05215103
##
##
## $cls
## $cls$estimates
## [1] 5.044563 2.012793 2.506250
##
## $cls$vcov
## [,1] [,2] [,3]
## [1,] 0.10142666 -0.00880678 0.06135685
## [2,] -0.00880678 0.01080110 -0.01782451
## [3,] 0.06135685 -0.01782451 0.05267781
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean (θ1) and variance (θ2) of Y1, but also the standard deviation (θ3) and the log of the variance (θ4) of Y1.
$$ \psi(Y_{1i}, \mathbf{\theta}) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \\ \sqrt{\theta_2} - \theta_3 \\ \log(\theta_2) - \theta_4 \end{pmatrix} $$
SB3_estfun <- function(data){
Y1 <- data$Y1
function(theta){
c(Y1 - theta[1],
(Y1 - theta[1])^2 - theta[2],
sqrt(theta[2]) - theta[3],
log(theta[2]) - theta[4])
}
}
## closed form roots
theta_cls <- numeric(4)
theta_cls[1] <- mean(geexex$Y1)
theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex)
theta_cls[3] <- sqrt(theta_cls[2])
theta_cls[4] <- log(theta_cls[2])
## Compare to closed form ##
theta2 <- theta_cls[2]
mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE)
mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE)
## closed form covariance
Sigma_cls <- matrix(
c(theta2, mu3, mu3/(2*sqrt(theta2)), mu3/theta2,
mu3, mu4 - theta2^2, (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/theta2,
mu3/(2 * sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/(4*theta2), (mu4 - theta2^2)/(2*theta2^(3/2)),
mu3/theta2, (mu4 - theta2^2)/theta2, (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1) ,
nrow = 4, byrow = TRUE) / nrow(geexex)
## closed form covariance
# Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n
comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
cls = list(estimates = theta_cls, vcov = Sigma_cls))
SB again provided a closed form for A(θ0) and B(θ0), which we
compare to the geex
results. The maximum absolute
difference between geex
and the closed form estimates for
both the parameters and the covariance is 3.8e-11.
## $geex
## $geex$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $geex$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $cls$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678