# Pool estimates and variances obtained by analysing multiple synthetic datasets


This function pools estimates and variances obtained by analysing multiple synthetic imputations (e.g. created using `gFormulaImpute`), following the method developed by Raghunathan et al. (2003).

## Arguments

- fits
Collection of model fits produced by a call of the form `with(imps, lm(y~regime))`, where `imps` is a collection of imputed datasets of class `mids`.

## Details

The only argument to `syntheticPool` is a set of model fits obtained by running an analysis on an imputed dataset collection of class `mids`, as created for example using the `mice` function in the `mice` package.

The function returns a table containing the overall parameter estimates, the within, between and total imputation variances, 95% confidence intervals, and p-values testing the null hypothesis that the corresponding parameters equal zero.
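The quantities in this table can be sketched in base R for a single parameter. This is a minimal illustration of the Raghunathan et al. (2003) combining rules, not the package's own code: the helper name `pool_one`, its arguments, and the toy inputs are hypothetical, and the t-based interval and p-value are an assumption that is consistent with the values shown in the Examples.

```r
# Hypothetical helper (not part of gFormulaMI): pool one parameter
# across M synthetic datasets.
#   est        - point estimates, one per synthetic dataset
#   var_within - estimated variances of those estimates
pool_one <- function(est, var_within, level = 0.95) {
  M <- length(est)
  stopifnot(M >= 2, length(var_within) == M)

  q_bar <- mean(est)                  # overall estimate
  w_bar <- mean(var_within)           # within-imputation variance
  b     <- var(est)                   # between-imputation variance
  total <- (1 + 1 / M) * b - w_bar    # total variance (can be negative)
  if (total < 0) {
    stop("Negative total variance; re-impute with larger M and/or nSim.")
  }

  # Degrees of freedom for the t reference distribution
  df   <- (M - 1) * (1 - w_bar / ((1 + 1 / M) * b))^2
  half <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  p    <- 2 * pt(-abs(q_bar / sqrt(total)), df)

  c(Estimate = q_bar, Within = w_bar, Between = b, Total = total,
    df = df, lower = q_bar - half, upper = q_bar + half, p = p)
}

# Toy inputs, for illustration only
pool_one(est = c(2.9, 3.1, 3.0, 2.8, 3.2), var_within = rep(0.01, 5))
```

As a check on the total variance formula, plugging in the intercept row of the Examples output with `M = 10` gives `(1 + 1/10) * 0.001763004 - 0.0008125695`, which agrees with the reported Total of 0.001126735.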

It is possible for the variance estimator developed by Raghunathan et al. (2003) to be negative. In this case `syntheticPool` stops with a message advising you to re-impute using a larger number of imputations `M` and/or a larger `nSim`.
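To see how a negative value can arise, the short sketch below assumes the total variance is computed as `(1 + 1/M) * Between - Within`, which reproduces the Total column in the Examples. The function name `total_var` and the smaller between-imputation variance are hypothetical and chosen only for illustration.

```r
M     <- 10
w_bar <- 0.0008125695   # Within variance, intercept row of the Examples output

# Assumed form of the total variance estimator
total_var <- function(b, w = w_bar, m = M) (1 + 1 / m) * b - w

# The estimator turns negative once Between drops below this threshold
b_threshold <- w_bar / (1 + 1 / M)

total_var(0.001763004)  # Between reported in the Examples: positive
total_var(0.0005)       # hypothetical smaller Between: negative
```

Because the between-imputation variance is estimated from only `M` values, it can fall below this threshold by chance when `M` is small, which is presumably why a larger `M` is recommended.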

The development of the `gFormulaMI` package was supported by a grant from the UK Medical Research Council (MR/T023953/1).

## References

Raghunathan TE, Reiter JP, Rubin DB. 2003. Multiple imputation for statistical disclosure limitation. Journal of Official Statistics, 19(1), p.1-16.

## Author

Jonathan Bartlett jonathan.bartlett1@lshtm.ac.uk

## Examples

```
library(gFormulaMI)

set.seed(7626)
# Impute synthetic datasets under two regimes of interest using gFormulaImpute
imps <- gFormulaImpute(data = simDataFullyObs, M = 10,
                       trtVars = c("a0", "a1", "a2"),
                       trtRegimes = list(c(0, 0, 0), c(1, 1, 1)))
#> [1] "Input data is a regular data frame."
#> [1] "Variables imputed using:"
#>     l0     a0     l1     a1     l2     a2      y regime
#> "norm"     "" "norm"     "" "norm"     "" "norm"     ""
#> [1] "Predictor matrix is set to:"
#>        l0 a0 l1 a1 l2 a2 y regime
#> l0      0  0  0  0  0  0 0      0
#> a0      1  0  0  0  0  0 0      0
#> l1      1  1  0  0  0  0 0      0
#> a1      1  1  1  0  0  0 0      0
#> l2      1  1  1  1  0  0 0      0
#> a2      1  1  1  1  1  0 0      0
#> y       1  1  1  1  1  1 0      0
#> regime  1  1  1  1  1  1 1      0
# Fit a linear model to the final outcome with regime as covariate
fits <- with(imps, lm(y ~ factor(regime)))
# Pool results using the Raghunathan et al. (2003) rules
syntheticPool(fits)
#>                    Estimate       Within     Between       Total       df
#> (Intercept)     -0.02071539 0.0008125695 0.001763004 0.001126735 3.038045
#> factor(regime)2  2.96593502 0.0016251389 0.004203106 0.002998278 3.784951
#>                   95% CI L   95% CI U            p
#> (Intercept)     -0.1267874 0.08535658 5.803156e-01
#> factor(regime)2  2.8104386 3.12143146 1.304839e-06
```