mbes.Rd
mbes performs model-based estimation of population means using auxiliary variables. Difference, ratio and regression estimates are available.
mbes(formula, data, aux, N = Inf, method = 'all', level = 0.95, ...)
Argument | Description |
---|---|
formula | object of class formula, e.g. y~x with y as primary and x as secondary information |
data | data frame containing variables in the model |
aux | known mean of auxiliary variable, which provides secondary information |
N | positive integer for population size. Default is N = Inf |
method | estimation method. Options are 'simple', 'diff', 'ratio', 'regr' and 'all'. Default is 'all' |
level | coverage probability for confidence intervals. Default is level = 0.95 |
... | further options for linear regression model |
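As a rough illustration of how N and level could enter the calculations, the sketch below assumes a finite population correction of the form 1 - n/N (which reduces to 1 for N = Inf) and confidence intervals based on normal quantiles. Both are assumptions that agree with the output printed in the examples on this page; they are not a statement about the package internals. The helper names fpc and ci_normal are hypothetical.

```r
# Hypothetical helpers for illustration; they are not part of the package.

# Finite population correction (assumed form); equals 1 when N = Inf.
fpc <- function(n, N = Inf) 1 - n / N

# Two-sided normal confidence interval for a given coverage probability.
ci_normal <- function(mean, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = mean - z * se, upper = mean + z * se)
}

fpc(3, 5)              # sample of 3 out of 5 units
fpc(20)                # infinite population, no correction
ci_normal(14, 0.7303)  # roughly 12.5686 and 15.4314
```

With N left at its default, the correction factor is 1, so the standard errors are those of sampling from an infinite population.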
The option method='simple' calculates the simple sample estimate without using the auxiliary variable. The option method='diff' calculates the difference estimate, method='ratio' the ratio estimate, and method='regr' the regression estimate, which is based on the selected model. The option method='all' calculates the simple and all model-based estimates.
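The estimators named above can be sketched by hand in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the three (x, y) pairs are illustrative values chosen to be consistent with the output of the first example on this page, and the variance formulas are the usual textbook forms for simple random sampling with a finite population correction of 1 - n/N.

```r
# Assumed sample of n = 3 units from a population of N = 5 (illustrative values).
x <- c(21, 21, 11)   # secondary information
y <- c(22, 18, 10)   # primary information
X.mean <- 15         # known population mean of x
N <- 5
n <- length(y)
fpc <- 1 - n / N     # finite population correction (assumed form)

# Simple estimate: ignores the auxiliary variable.
simple.mean <- mean(y)
simple.se   <- sqrt(var(y) / n * fpc)

# Difference estimate: shift the sample mean of y by the known error in x.
diff.mean <- mean(y) + (X.mean - mean(x))
diff.se   <- sqrt(var(y - x) / n * fpc)

# Ratio estimate: scale the known mean of x by the sample ratio.
r <- mean(y) / mean(x)
ratio.mean <- r * X.mean
ratio.se   <- sqrt(sum((y - r * x)^2) / (n - 1) / n * fpc)

# 14, 0.7303, 14.1509 and 0.74 after rounding
round(c(diff.mean, diff.se, ratio.mean, ratio.se), 4)
```

The difference and ratio results agree with the values printed by mbes in the first example; the simple estimate is shown only for comparison.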
For methods 'diff', 'ratio' and 'all', the formula has to be y~x, with y the primary and x the secondary information.
For method 'regr', the formula is the symbolic description of the linear regression model. In this case, more than one auxiliary variable can be used. aux then has to be a vector with one entry per auxiliary variable, in the order specified in the formula.
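To make the role of aux in the regression case concrete, here is a minimal base R sketch. It assumes the regression estimate is the sample mean of y corrected by the fitted slopes times the gap between known and sample means of the auxiliary variables, with a residual variance on n - p - 1 degrees of freedom. The data values and the names sample.data, aux, regr.mean and regr.se are illustrative. The single-predictor result agrees with the first example on this page; the extension to several predictors is an assumption.

```r
# Assumed sample (illustrative values) and known population mean of x.
sample.data <- data.frame(x = c(21, 21, 11), y = c(22, 18, 10))
aux <- c(x = 15)   # one entry per auxiliary variable, in formula order
N <- 5

fit <- lm(y ~ x, data = sample.data)
n <- nrow(sample.data)
p <- length(coef(fit)) - 1   # number of auxiliary variables

# Sample means of the auxiliary variables, in the order of the formula.
x.means <- colMeans(model.matrix(fit)[, -1, drop = FALSE])

# Regression estimate: correct the sample mean of y using the fitted slopes.
regr.mean <- mean(sample.data$y) + sum(coef(fit)[-1] * (aux - x.means))

# Standard error from the residual variance, with correction 1 - n/N.
regr.se <- sqrt(sum(residuals(fit)^2) / (n - p - 1) / n * (1 - n / N))

round(c(regr.mean, regr.se), 4)   # 14 and 1.0328 after rounding
```

Because the slopes are matched to aux by position, the order of the entries in aux must follow the order of the terms in the formula.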
The function mbes returns an object, which is a list consisting of the components
call: a list of call components: formula (formula), data (data frame), aux (given value for mean of auxiliary variable), N (population size), type (type of model based estimation) and level (coverage probability for confidence intervals)
info: a list of further information components: N (population size), n (sample size), p (number of auxiliary variables), aux (true mean of auxiliary variables in population) and x.mean (sample means of auxiliary variables)
simple: a list of result components, if method='simple' or method='all' is selected: mean (mean estimate of population mean for primary information), se (standard error of the mean estimate) and ci (vector of confidence interval boundaries)
diff: a list of result components, if method='diff' or method='all' is selected: mean (mean estimate of population mean for primary information), se (standard error of the mean estimate) and ci (vector of confidence interval boundaries)
ratio: a list of result components, if method='ratio' or method='all' is selected: mean (mean estimate of population mean for primary information), se (standard error of the mean estimate) and ci (vector of confidence interval boundaries)
regr: a list of result components, if method='regr' or method='all' is selected: mean (mean estimate of population mean for primary information), se (standard error of the mean estimate), ci (vector of confidence interval boundaries) and model (underlying linear regression model)
Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.
Juliane Manitz
## 1) simple suppositious example
data(pop)
# Draw a random sample of size=3
set.seed(802016)
data <- pop[sample(1:5, size=3),]
names(data) <- c('id','x','y')
# difference estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='diff', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Difference Estimate
#>
#> Mean estimate: 14
#> Standard error: 0.7303
#>
#> 95% confidence interval [12.5686,15.4314]
#>
# ratio estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='ratio', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 14.1509
#> Standard error: 0.74
#>
#> 95% confidence interval [12.7006,15.6013]
#>
# regression estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='regr', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 14
#> Standard error: 1.0328
#>
#> 95% confidence interval [11.9758,16.0242]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>          5          4          2
#>  2.000e+00 -2.000e+00  6.661e-16
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -1.0000     6.3340  -0.158    0.900
#> x             1.0000     0.3464   2.887    0.212
#>
#> Residual standard error: 2.828 on 1 degrees of freedom
#> Multiple R-squared: 0.8929, Adjusted R-squared: 0.7857
#> F-statistic: 8.333 on 1 and 1 DF, p-value: 0.2123
#>

## 2) Bundestag election
data(election)
# draw sample of size n = 20
N <- nrow(election)
set.seed(67396)
sample <- election[sort(sample(1:N, size=20)),]
# secondary information SPD in 2002
X.mean <- mean(election$SPD_02)
# forecast proportion of SPD in election of 2005
mbes(SPD_05 ~ SPD_02, data=sample, aux=X.mean, N=N, method='all')
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 299, sample size n = 20
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.3861, x.mean.1 = 0.3515
#> ----------------------------------------------------------------
#> Simple Estimate
#>
#> Mean estimate: 0.3009
#> Standard error: 0.0119
#>
#> 95% confidence interval [0.2775,0.3242]
#>
#> ----------------------------------------------------------------
#> Difference Estimate
#>
#> Mean estimate: 0.3355
#> Standard error: 0.0088
#>
#> 95% confidence interval [0.3183,0.3526]
#>
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 0.3305
#> Standard error: 0.0072
#>
#> 95% confidence interval [0.3163,0.3447]
#>
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 0.3223
#> Standard error: 0.0063
#>
#> 95% confidence interval [0.31,0.3346]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.054727 -0.022938 -0.003066  0.027230  0.037138
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.08290    0.03137   2.643   0.0165 *
#> SPD_02       0.62004    0.08729   7.103 1.28e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.02908 on 18 degrees of freedom
#> Multiple R-squared: 0.737, Adjusted R-squared: 0.7224
#> F-statistic: 50.45 on 1 and 18 DF, p-value: 1.277e-06
#>
#> [1] 0.3426949
# Use a second predictor variable
X.mean2 <- c(mean(election$SPD_02),mean(election$GREEN_02))
# forecast proportion of SPD in election of 2005 with two predictors
mbes(SPD_05 ~ SPD_02+GREEN_02, data=sample, aux=X.mean2, N=N, method= 'regr')
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 299, sample size n = 20
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.3861, x.mean.1 = 0.3515
#> X.mean.2 = 0.0848, x.mean.2 = 0.07
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 0.3291
#> Standard error: 0.0051
#>
#> 95% confidence interval [0.3191,0.3391]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.037753 -0.016922 -0.004229  0.016320  0.048000
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.04326    0.02843   1.521  0.14652
#> SPD_02       0.66001    0.07223   9.138 5.71e-08 ***
#> GREEN_02     0.36537    0.11489   3.180  0.00547 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.0237 on 17 degrees of freedom
#> Multiple R-squared: 0.8351, Adjusted R-squared: 0.8157
#> F-statistic: 43.06 on 2 and 17 DF, p-value: 2.217e-07
#>

## 3) money sample
data(money)
mu.X <- mean(money$X)
x <- money$X[which(!is.na(money$y))]
y <- na.omit(money$y)
# estimation
mbes(y~x, aux=mu.X, N=13, method='all')
#> Error in mbes(y ~ x, aux = mu.X, N = 13, method = "all"): Wrong input: Missing data or wrong input of data

## 4) model based two-phase sampling with mbes()
id <- 1:1000
x <- rep(c(1,0,1,0),times=c(10,90,70,830))
y <- rep(c(1,0,NA),times=c(15,85,900))
phase <- rep(c(2,1), times=c(100,900))
data <- data.frame(id,x,y,phase)
# mean of x out of first phase
mean.x <- mean(data$x)
mean.x
#> [1] 0.08
N1 <- length(data$x)
# calculation of estimation for y
est.y <- mbes(y~x, data=data, aux=mean.x, N=N1, method='ratio')
est.y
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 1000, sample size n = 100
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.08, x.mean.1 = 0.1
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 0.12
#> Standard error: 0.0261
#>
#> 95% confidence interval [0.06882,0.1712]
#>
# correction of standard error with uncertainty in first phase
v.y <- var(data$y, na.rm=TRUE)
se.y <- sqrt(est.y$ratio$se^2 + v.y/N1)
se.y
#> [1] 0.02847114
# corrected confidence interval
lower <- est.y$ratio$mean - qnorm(0.975)*se.y
upper <- est.y$ratio$mean + qnorm(0.975)*se.y
c(lower, upper)
#> [1] 0.06419758 0.17580242