mbes is used for model based estimation of population means using auxiliary variables. Difference, ratio and regression estimates are available.

mbes(formula, data, aux, N = Inf, method = 'all', level = 0.95, ...)

Arguments

formula

object of class formula (or one that can be coerced to that class): symbolic description of the relationship between primary and secondary information

data

data frame containing variables in the model

aux

known mean of auxiliary variable, which provides secondary information

N

positive integer for population size. Default is N=Inf, which means that calculations are carried out without finite population correction.

method

estimation method. Options are 'simple','diff','ratio','regr','all'. Default is method='all'.

level

coverage probability for confidence intervals. Default is level=0.95.

...

further options for linear regression model
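
To illustrate how N and level enter the calculation, the following base R sketch computes a simple mean estimate with and without finite population correction. It is not taken from the mbes source: the helper name simple_estimate and the toy vector y are assumptions made for illustration, and a normal-approximation interval is assumed.

```r
# Hypothetical helper (not part of samplingbook): simple mean estimate
# with finite population correction and a normal-approximation interval.
simple_estimate <- function(y, N = Inf, level = 0.95) {
  n   <- length(y)
  fpc <- 1 - n / N                  # equals 1 when N = Inf (no correction)
  se  <- sqrt(fpc * var(y) / n)     # standard error of the sample mean
  q   <- qnorm(1 - (1 - level) / 2) # two-sided normal quantile for 'level'
  list(mean = mean(y), se = se, ci = mean(y) + c(-1, 1) * q * se)
}

y <- c(22, 18, 10)                    # toy values, assumed for illustration
res_inf <- simple_estimate(y)         # N = Inf: no correction
res_fin <- simple_estimate(y, N = 5)  # finite population of size 5
res_fin$se < res_inf$se               # the correction shrinks the standard error
```

With N = Inf the term n / N is zero, so the default leaves the standard error uncorrected.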

Details

The option method='simple' calculates the simple sample estimate without using the auxiliary variable. The option method='diff' calculates the difference estimate, method='ratio' the ratio estimate, and method='regr' the regression estimate, which is based on the selected model. The option method='all' calculates the simple estimate and all model based estimates. For methods 'diff', 'ratio' and 'all', the formula has to be y~x, with y the primary and x the secondary information. For method 'regr', the formula is the symbolic description of the linear regression model. In this case, more than one auxiliary variable can be used, and aux has to be a vector with one entry per auxiliary variable, in the order specified in the formula.
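
The three model based point estimates can be sketched by hand in base R. This is a sketch of the usual textbook formulas, not the mbes source code; the toy vectors x and y and the object names est_diff, est_ratio and est_regr are assumptions chosen for illustration.

```r
# Toy sample; values are assumptions chosen for illustration
x <- c(21, 21, 11)     # auxiliary variable observed in the sample
y <- c(22, 18, 10)     # primary variable observed in the sample
X.mean <- 15           # known population mean of the auxiliary variable

# Difference estimate: shift the sample mean by the gap in auxiliary means
est_diff <- mean(y) + (X.mean - mean(x))

# Ratio estimate: scale the known auxiliary mean by the sample ratio
est_ratio <- mean(y) / mean(x) * X.mean

# Regression estimate: predict from the fitted linear model at the known mean
fit <- lm(y ~ x)
est_regr <- unname(predict(fit, newdata = data.frame(x = X.mean)))

round(c(diff = est_diff, ratio = est_ratio, regr = est_regr), 4)
```

With several auxiliary variables, the same idea would need one column per variable in newdata, which is presumably why aux must follow the order of the terms in the formula.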

Value

The function mbes returns an object, which is a list consisting of the components

call

is a list of call components: formula (the formula), data (the data frame), aux (given value for the mean of the auxiliary variable), N (population size), type (type of model based estimation), and level (coverage probability for confidence intervals)

info

is a list of further information components: N (population size), n (sample size), p (number of auxiliary variables), aux (true mean of the auxiliary variables in the population), and x.mean (sample means of the auxiliary variables)

simple

is a list of result components, if method='simple' or method='all' is selected: mean (estimate of the population mean for the primary information), se (standard error of the mean estimate), and ci (vector of confidence interval boundaries)

diff

is a list of result components, if method='diff' or method='all' is selected: mean (estimate of the population mean for the primary information), se (standard error of the mean estimate), and ci (vector of confidence interval boundaries)

ratio

is a list of result components, if method='ratio' or method='all' is selected: mean (estimate of the population mean for the primary information), se (standard error of the mean estimate), and ci (vector of confidence interval boundaries)

regr

is a list of result components, if method='regr' or method='all' is selected: mean (estimate of the population mean for the primary information), se (standard error of the mean estimate), ci (vector of confidence interval boundaries), and model (underlying linear regression model)
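
As a rough guide to what the se and ci components contain, the sketch below recomputes standard errors for the difference, ratio and regression estimates in base R. The formulas are inferred from standard survey sampling results and are an assumption, not a copy of the mbes implementation; the toy data and all object names are hypothetical.

```r
# Toy sample and population settings; values are assumptions for illustration
x <- c(21, 21, 11)
y <- c(22, 18, 10)
X.mean <- 15
N <- 5
n <- length(y)
fpc <- 1 - n / N                       # finite population correction

# Difference estimate: based on the variance of the differences y - x
se_diff <- sqrt(fpc * var(y - x) / n)

# Ratio estimate: residuals around the line through the origin with slope r
r <- mean(y) / mean(x)
se_ratio <- sqrt(fpc * sum((y - r * x)^2) / (n - 1) / n)

# Regression estimate: residual variance of the fitted linear model
fit <- lm(y ~ x)
se_regr <- sqrt(fpc * sum(resid(fit)^2) / df.residual(fit) / n)

# Normal-approximation 95% interval for the difference estimate
est_diff <- mean(y) + (X.mean - mean(x))
ci_diff <- est_diff + c(-1, 1) * qnorm(0.975) * se_diff

round(c(diff = se_diff, ratio = se_ratio, regr = se_regr), 4)
round(ci_diff, 4)
```

In each case the standard error comes from the residual variation left after using the auxiliary information, so a well-fitting model can give a smaller se than the simple estimate.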

References

Kauermann, Goeran/Kuechenhoff, Helmut (2010): Stichproben. Methoden und praktische Umsetzung mit R. Springer.

Author

Juliane Manitz

Examples

## 1) simple hypothetical example
data(pop)
# Draw a random sample of size=3
set.seed(802016)
data <- pop[sample(1:5, size=3),]
names(data) <- c('id','x','y')
# difference estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='diff', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Difference Estimate
#>
#> Mean estimate: 14
#> Standard error: 0.7303
#>
#> 95% confidence interval [12.5686,15.4314]
#>
# ratio estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='ratio', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 14.1509
#> Standard error: 0.74
#>
#> 95% confidence interval [12.7006,15.6013]
#>
# regression estimator
mbes(formula=y~x, data=data, aux=15, N=5, method='regr', level=0.95)
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 5, sample size n = 3
#>
#> Values for auxiliary variable:
#> X.mean.1 = 15, x.mean.1 = 17.6667
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 14
#> Standard error: 1.0328
#>
#> 95% confidence interval [11.9758,16.0242]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>          5          4          2
#>  2.000e+00 -2.000e+00  6.661e-16
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  -1.0000     6.3340  -0.158    0.900
#> x             1.0000     0.3464   2.887    0.212
#>
#> Residual standard error: 2.828 on 1 degrees of freedom
#> Multiple R-squared: 0.8929, Adjusted R-squared: 0.7857
#> F-statistic: 8.333 on 1 and 1 DF, p-value: 0.2123
#>
## 2) Bundestag election
data(election)
# draw sample of size n = 20
N <- nrow(election)
set.seed(67396)
sample <- election[sort(sample(1:N, size=20)),]
# secondary information SPD in 2002
X.mean <- mean(election$SPD_02)
# forecast proportion of SPD in election of 2005
mbes(SPD_05 ~ SPD_02, data=sample, aux=X.mean, N=N, method='all')
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 299, sample size n = 20
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.3861, x.mean.1 = 0.3515
#> ----------------------------------------------------------------
#> Simple Estimate
#>
#> Mean estimate: 0.3009
#> Standard error: 0.0119
#>
#> 95% confidence interval [0.2775,0.3242]
#>
#> ----------------------------------------------------------------
#> Difference Estimate
#>
#> Mean estimate: 0.3355
#> Standard error: 0.0088
#>
#> 95% confidence interval [0.3183,0.3526]
#>
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 0.3305
#> Standard error: 0.0072
#>
#> 95% confidence interval [0.3163,0.3447]
#>
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 0.3223
#> Standard error: 0.0063
#>
#> 95% confidence interval [0.31,0.3346]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.054727 -0.022938 -0.003066  0.027230  0.037138
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.08290    0.03137   2.643   0.0165 *
#> SPD_02       0.62004    0.08729   7.103 1.28e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.02908 on 18 degrees of freedom
#> Multiple R-squared: 0.737, Adjusted R-squared: 0.7224
#> F-statistic: 50.45 on 1 and 18 DF, p-value: 1.277e-06
#>
# true value
Y.mean <- mean(election$SPD_05)
Y.mean
#> [1] 0.3426949
# Use a second predictor variable
X.mean2 <- c(mean(election$SPD_02),mean(election$GREEN_02))
# forecast proportion of SPD in election of 2005 with two predictors
mbes(SPD_05 ~ SPD_02+GREEN_02, data=sample, aux=X.mean2, N=N, method= 'regr')
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 299, sample size n = 20
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.3861, x.mean.1 = 0.3515
#> X.mean.2 = 0.0848, x.mean.2 = 0.07
#> ----------------------------------------------------------------
#> Linear Regression Estimate
#>
#> Mean estimate: 0.3291
#> Standard error: 0.0051
#>
#> 95% confidence interval [0.3191,0.3391]
#>
#> ----------------------------------------------------------------
#> Linear Regression Model:
#> Call:
#> lm(formula = formula, data = data)
#>
#> Residuals:
#>       Min        1Q    Median        3Q       Max
#> -0.037753 -0.016922 -0.004229  0.016320  0.048000
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.04326    0.02843   1.521  0.14652
#> SPD_02       0.66001    0.07223   9.138 5.71e-08 ***
#> GREEN_02     0.36537    0.11489   3.180  0.00547 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.0237 on 17 degrees of freedom
#> Multiple R-squared: 0.8351, Adjusted R-squared: 0.8157
#> F-statistic: 43.06 on 2 and 17 DF, p-value: 2.217e-07
#>
## 3) money sample
data(money)
mu.X <- mean(money$X)
x <- money$X[which(!is.na(money$y))]
y <- na.omit(money$y)
# estimation (the call passes no data argument, which is the likely cause of the error below)
mbes(y~x, aux=mu.X, N=13, method='all')
#> Error in mbes(y ~ x, aux = mu.X, N = 13, method = "all"): Wrong input: Missing data or wrong input of data
## 4) model based two-phase sampling with mbes()
id <- 1:1000
x <- rep(c(1,0,1,0),times=c(10,90,70,830))
y <- rep(c(1,0,NA),times=c(15,85,900))
phase <- rep(c(2,1), times=c(100,900))
data <- data.frame(id,x,y,phase)
# mean of x from the first phase
mean.x <- mean(data$x)
mean.x
#> [1] 0.08
N1 <- length(data$x)
# estimate for y
est.y <- mbes(y~x, data=data, aux=mean.x, N=N1, method='ratio')
est.y
#>
#> mbes object: Model Based Estimation of Population Mean
#> Population size N = 1000, sample size n = 100
#>
#> Values for auxiliary variable:
#> X.mean.1 = 0.08, x.mean.1 = 0.1
#> ----------------------------------------------------------------
#> Ratio Estimate
#>
#> Mean estimate: 0.12
#> Standard error: 0.0261
#>
#> 95% confidence interval [0.06882,0.1712]
#>
# correction of the standard error for uncertainty in the first phase
v.y <- var(data$y, na.rm=TRUE)
se.y <- sqrt(est.y$ratio$se^2 + v.y/N1)
se.y
#> [1] 0.02847114
# corrected confidence interval
lower <- est.y$ratio$mean - qnorm(0.975)*se.y
upper <- est.y$ratio$mean + qnorm(0.975)*se.y
c(lower, upper)
#> [1] 0.06419758 0.17580242