Pool estimates and variances obtained by analysing multiple synthetic datasets

This function pools estimates and variances obtained by analysing multiple synthetic imputed datasets (e.g. created using gFormulaImpute), using the method developed by Raghunathan et al. (2003).

Usage

syntheticPool(fits)

Arguments

fits: Collection of model fits produced by a call of the form with(imps, lm(y~regime)) where imps is a collection of imputed datasets of class mids.

Value

A matrix containing the pooled results.

Details

The only argument to syntheticPool is a set of model fits obtained by running an analysis on an imputed dataset collection of class mids, as created for example using the mice function in the mice package.

The function returns a table containing, for each parameter, the overall estimate, the within, between and total imputation variances, the degrees of freedom, a 95% confidence interval, and a p-value testing the null hypothesis that the parameter equals zero.

The variance estimator developed by Raghunathan et al. (2003) can be negative. In this case syntheticPool stops and advises you to re-impute using a larger number of imputations M and/or a larger nSim.
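For orientation, the pooling rules of Raghunathan et al. (2003) can be sketched as follows. The symbols are notation introduced here, not names used by the package: q_m and v_m denote the point estimate and its estimated variance from the m-th of M synthetic datasets. This summarises the published rules and is not a transcription of the package source.

```latex
\begin{align*}
\bar{q}_M &= \frac{1}{M}\sum_{m=1}^{M} q_m
  && \text{(pooled estimate)} \\
\bar{v}_M &= \frac{1}{M}\sum_{m=1}^{M} v_m
  && \text{(within-imputation variance)} \\
b_M &= \frac{1}{M-1}\sum_{m=1}^{M} \left(q_m - \bar{q}_M\right)^2
  && \text{(between-imputation variance)} \\
T_M &= \left(1 + \frac{1}{M}\right) b_M - \bar{v}_M
  && \text{(total variance)} \\
\nu_M &= (M-1)\left(1 - \frac{\bar{v}_M}{\left(1 + \frac{1}{M}\right) b_M}\right)^{2}
  && \text{(degrees of freedom)}
\end{align*}
```

Because the within-imputation variance is subtracted, the total variance is negative whenever (1 + 1/M) times the between-imputation variance is smaller than the within-imputation variance. As a check against the intercept row in the Examples output (M = 10): 1.1 x 0.001763004 - 0.0008125695 is approximately 0.001126735, which agrees with the Total column.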

The development of the gFormulaMI package was supported by a grant from the UK Medical Research Council (MR/T023953/1).

References

Raghunathan TE, Reiter JP, Rubin DB. 2003. Multiple imputation for statistical disclosure limitation. Journal of Official Statistics, 19(1), p.1-16.

Author

Jonathan Bartlett jonathan.bartlett1@lshtm.ac.uk

Examples

set.seed(7626)
#impute synthetic datasets under two regimes of interest using gFormulaImpute
imps <- gFormulaImpute(data=simDataFullyObs,M=10,
                        trtVars=c("a0","a1","a2"),
                        trtRegimes=list(c(0,0,0),c(1,1,1)))
#> [1] "Input data is a regular data frame."
#> [1] "Variables imputed using:"
#>     l0     a0     l1     a1     l2     a2      y regime 
#> "norm"     "" "norm"     "" "norm"     "" "norm"     "" 
#> [1] "Predictor matrix is set to:"
#>        l0 a0 l1 a1 l2 a2 y regime
#> l0      0  0  0  0  0  0 0      0
#> a0      1  0  0  0  0  0 0      0
#> l1      1  1  0  0  0  0 0      0
#> a1      1  1  1  0  0  0 0      0
#> l2      1  1  1  1  0  0 0      0
#> a2      1  1  1  1  1  0 0      0
#> y       1  1  1  1  1  1 0      0
#> regime  1  1  1  1  1  1 1      0
#fit linear model to final outcome with regime as covariate
fits <- with(imps, lm(y~factor(regime)))
#pool results using the Raghunathan et al. (2003) rules
syntheticPool(fits)
#>                    Estimate       Within     Between       Total       df
#> (Intercept)     -0.02071539 0.0008125695 0.001763004 0.001126735 3.038045
#> factor(regime)2  2.96593502 0.0016251389 0.004203106 0.002998278 3.784951
#>                   95% CI L   95% CI U            p
#> (Intercept)     -0.1267874 0.08535658 5.803156e-01
#> factor(regime)2  2.8104386 3.12143146 1.304839e-06