Predictive Mean Matching Method

If Predictive Mean Matching Multiple Imputation is selected, then an ordinary least-squares regression method is applied to the continuous, integer, and ordinal imputation variables, and discriminant multiple imputation is applied to the nominal imputation variables.

The predictive information in a user-specified set of covariates is used to impute the missing values in the variables to be imputed. First, the Predictive Model is estimated from the observed data. You can then either use the estimated model as it stands or, starting from this estimated model, draw new linear regression parameters randomly from their Bayesian posterior distribution. The randomly drawn values are used to generate the imputations, which include random deviations from the model’s predictions. Drawing the model parameters from their posterior distribution ensures that the extra uncertainty about the unknown true model is reflected in the imputations.
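The parameter-drawing option can be illustrated with a minimal sketch. It assumes a single covariate and the standard noninformative-prior posterior for a normal linear model; the manual does not state which prior the system uses, so this is an illustration rather than the system's implementation. The names `fit_ols`, `draw_posterior`, and `impute` are hypothetical.

```python
import math
import random

def fit_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def draw_posterior(x, y, rng):
    """Draw (a*, b*, sigma*) from the posterior under a flat prior (assumed).

    Needs at least 3 observed cases so the residual degrees of freedom are positive.
    """
    n = len(x)
    a, b, rss = fit_ols(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    # sigma*^2 = RSS / chi-square(n - 2); chi-square(df) is Gamma(df/2, scale=2).
    sigma2 = rss / rng.gammavariate((n - 2) / 2.0, 2.0)
    # With x centered at its mean, the fitted level and the slope are independent.
    level = rng.gauss(a + b * mx, math.sqrt(sigma2 / n))
    b_star = rng.gauss(b, math.sqrt(sigma2 / sxx))
    a_star = level - b_star * mx
    return a_star, b_star, math.sqrt(sigma2)

def impute(x_missing, params, rng):
    """Prediction from the drawn model plus a random deviation."""
    a_star, b_star, sigma_star = params
    return a_star + b_star * x_missing + rng.gauss(0.0, sigma_star)
```

Calling `draw_posterior` once per imputed data set gives each imputation its own set of regression parameters, which is how the uncertainty about the model is carried into the imputed values.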

In the system, multiple regression estimates of parameters are obtained using the method of least squares. If you have declared a variable to be nominal, then you need design variables (or dummy variables) to use this variable as a predictor variable in a multiple linear regression. The system’s multiple regression allows for this possibility and will create design variables for you.
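As an illustration of what design variables look like, the sketch below codes a nominal variable with k levels as k-1 indicator columns. Using the first level (in sorted order) as the reference category is an assumption; the manual does not say which coding scheme the system applies, and `design_variables` is a hypothetical name.

```python
def design_variables(values, reference=None):
    """Code a nominal variable as k-1 indicator (dummy) columns.

    Returns the levels that received a column and one row of 0/1
    indicators per case. The reference level gets all zeros.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # assumed choice of reference category
    kept = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in kept] for v in values]
    return kept, rows

kept, rows = design_variables(["north", "south", "west", "north"])
# kept lists the non-reference levels; each row is that case's indicators.
```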

Generation of Imputations
Let Y be the variable to be imputed, and let X be the set of covariates. Let Yobs be the observed values in Y, and Ymis the missing values in Y. Let Xobs be the covariate values for the units corresponding to Yobs. The Linear Regression Based Method regresses Yobs on Xobs to obtain a prediction equation of the form Y = a + bX. Predicted values are then computed for all cases in the dataset, whether or not their Y value is missing. These predictions are then used to create donor pools.
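The fit-then-predict step can be sketched as follows, assuming one fully observed covariate and `None` as the marker for a missing Y value. Both are simplifications for illustration, and `predict_all` is a hypothetical name.

```python
def predict_all(x, y):
    """Regress observed y on x, then predict for every case.

    x: covariate value for every case (assumed fully observed).
    y: outcome values, with None where the value is missing.
    Returns (a, b, predictions) for the equation Y = a + bX.
    """
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    b = sxy / sxx
    a = my - b * mx
    # Predictions cover respondents and nonrespondents alike.
    return a, b, [a + b * xi for xi in x]
```

Every case receives a predicted value, so respondents and nonrespondents can be placed on the same scale before the donor pools are formed.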

Defining Donor Pools Based on Predicted Values
Using the options in the Donor Pool window, the cases of the data sets can be partitioned into c donor pools of respondents according to the assigned predicted values; the default is c = 5. This is done by sorting the cases of the data sets in ascending order of their assigned predicted values.
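A minimal sketch of this sort-and-split step is shown below. How the system treats ties, or a number of cases that does not divide evenly by c, is not stated in the manual, so the splitting rule here is an assumption and `partition_into_pools` is a hypothetical name.

```python
def partition_into_pools(predicted, c=5):
    """Sort case indices by predicted value and split them into c groups.

    Returns a list of c lists of case indices, ordered from the lowest
    predicted values to the highest. Group sizes differ by at most one.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    pools = []
    for k in range(c):
        start = k * n // c
        stop = (k + 1) * n // c
        pools.append(order[start:stop])
    return pools

predicted = [0.9, 0.1, 0.5, 0.3, 0.7, 1.1, 1.9, 1.5, 1.3, 1.7]
pools = partition_into_pools(predicted)  # five pools of two cases each
```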

The Donor Pool page gives the user more control over the random draw step in the analysis. You can set the subset ranges and refine them further using another variable, known as the Refinement Variable, which is described below.

Three ways of defining the Donor Pool sub-classes are provided:

  • You can divide the sample into c equal-sized subsets; the default is 5. If the chosen value of c leaves no more than 1 case available to the selection algorithm, c is decremented by 1 until sufficient data are available. The final value of c used is included in the Imputation Report output described later in this manual.
  • You can use the subset of c cases that are closest with respect to predicted value. This option allows you to specify the number of cases to be included in the sub-class. The default for c is 10, and c cannot be set to a value less than 2. If fewer than 2 cases are available, a value of 5 will be used for c.
  • You can use the subset of d% of the cases that are closest with respect to predicted value. This is the percentage of “closest” cases in the data set to be included in the sub-class. The default for d is 10.00, and d cannot be set to a value that results in fewer than 2 cases being available. If fewer than 2 cases are available, a d value of 5 will be used.
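The three options can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the decrement rule is read as "each equal-sized pool must hold at least 2 donors", the d% count is rounded down, and the fallback values mentioned in the list are not modelled (the functions raise an error instead). All function names are hypothetical.

```python
import random

def usable_pool_count(n_donors, c=5):
    """Decrease c by 1 until each equal-sized pool would hold at least 2 donors.

    This is one reading of the decrement rule, assumed for illustration.
    """
    while c > 1 and n_donors // c < 2:
        c -= 1
    return c

def closest_c_donors(pred_missing, donors, c=10):
    """Return the c donors whose predicted values are nearest pred_missing.

    donors: list of (predicted_value, observed_y) pairs for respondents.
    """
    if c < 2:
        raise ValueError("c cannot be less than 2")
    ranked = sorted(donors, key=lambda d: abs(d[0] - pred_missing))
    return ranked[:c]

def closest_percent_donors(pred_missing, donors, d=10.0):
    """Return the nearest d% of donors (count rounded down, an assumption)."""
    k = int(len(donors) * d / 100.0)
    if k < 2:
        raise ValueError("d leaves fewer than 2 donors available")
    ranked = sorted(donors, key=lambda dn: abs(dn[0] - pred_missing))
    return ranked[:k]

def impute_from_pool(pool, rng):
    """Randomly draw one donor and return its observed value."""
    return rng.choice(pool)[1]

donors = [(1.0, 10), (2.0, 20), (3.0, 30), (4.0, 40), (5.0, 50)]
pool = closest_c_donors(2.9, donors, c=2)
value = impute_from_pool(pool, random.Random(1))
```

With the nearest-neighbour options, each case with a missing value gets its own pool centred on its predicted value, whereas the equal-sized option shares fixed pools among all cases that fall in the same range.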