
Propensity Score Multiple Imputation

This data set contains the following 11 variables measured for 50 patients in a clinical trial:

  • OBS – Observation number.
  • SYMPDUR – Duration of symptoms.
  • AGE – The patient’s age.
  • MeasA_0, MeasA_1, MeasA_2, MeasA_3 – The baseline measurement of the response variable MeasA, followed by three post-baseline measurements taken at months 1, 2, and 3.
  • MeasB_0, MeasB_1, MeasB_2, MeasB_3 – The baseline measurement of the response variable MeasB, followed by three post-baseline measurements taken at months 1, 2, and 3.

The variables OBS, SYMPDUR, AGE, MeasA_0, and MeasB_0 are fully observed, and the remaining 6 variables contain missing values. To view the missing data pattern for this data set, do the following:

1. From the datasheet window, select View and Missing Data Pattern… In the Specify Missing Data Pattern window, press the Use All button.
2. From the View menu of the Missing Data Pattern window, select View Monotone Pattern to display the window shown on the right.

Note that sorting the data into a monotone pattern preserves the time order of the longitudinal measurements: once a patient's measurement is missing at one month, it is also missing at every later month. The missing data pattern in this data set is therefore monotone over time.

3. To close the Missing Data Pattern window, select File and Close.
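The monotone-over-time property that the Missing Data Pattern window displays can also be checked programmatically. The sketch below is not part of SOLAS; it assumes each patient's post-baseline measurements are stored in time order, with `None` marking a missing value. The function name and data are hypothetical.

```python
def is_monotone_over_time(rows):
    """Return True if, in every row, a missing value (None) is never
    followed by an observed value at a later time point."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                # An observed value after a missing one breaks monotonicity.
                return False
    return True


# Hypothetical MeasA_1..MeasA_3 values for three patients.
patients = [
    [5.0, 4.0, 3.5],    # completed the study
    [5.2, None, None],  # dropped out after month 1
    [4.9, 4.1, None],   # dropped out after month 2
]
print(is_monotone_over_time(patients))  # True
```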

We will now multiply impute all of the missing values in the data set using the Propensity Score Based Method.

1. From the Analyze menu, select Multiple Imputation and Propensity Score Method.
2. The Specify Propensity Method window is displayed. This is a tabbed (paged) window that initially has two tabs: Base Setup and Advanced Options. As soon as you select a variable to be imputed, the Non-Monotone, Monotone, and Donor Pool tabs are also displayed.

Base Setup

Selecting the Base Setup tab allows you to specify which variables you want to impute, and which variables you want to use as covariates in the logistic regression used to model the missingness.

1. Drag and drop the variables MeasA_1, MeasA_2, MeasA_3, MeasB_1, MeasB_2, and MeasB_3 into the Variables to Impute field.
2. Drag and drop the variables SYMPDUR, AGE, MeasA_0, and MeasB_0 into the Fixed Covariates field.
3. As there is no Grouping variable in this data set, we can leave this field blank.

Non-Monotone

Selecting the Non-Monotone tab allows you to add or remove covariates from the logistic model used for imputing the non-monotone missing values in the data set. (These can be identified in the Missing Data Pattern window described earlier.)

Click the + or - signs to expand or collapse the list of covariates for each imputation variable.

The list of covariates for each imputation variable is made up of the variables specified as Fixed Covariates in the Base Setup tab, plus all of the other imputation variables. Variables can be added to and removed from this list by dragging and dropping them between the covariate list and the variables field. Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model.

The program first sorts the variables so that the missing data pattern is as close as possible to monotone. Then, for each missing value in the imputation variable, it determines which of the listed covariates can be used for prediction.

By default, all of the covariates are forced into the model. If you uncheck a covariate, it is no longer forced into the model but is retained as a candidate covariate in the stepwise selection. Details of the models actually used to impute the missing values are included in the Output Log, which can be selected from the View menu of the Multiply-Imputed Data Pages. These data pages are displayed after you have specified the imputation and pressed the OK button in the Specify Propensity Method window.
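As a rough illustration of how the checked and unchecked covariates might be split for one case being imputed, here is a minimal sketch. It rests on two assumptions: that a covariate is "usable" for a case only when that case has an observed value for it, and that checked covariates are forced while unchecked ones become stepwise candidates. SOLAS's exact rule is not documented here. The function name and the example values are hypothetical.

```python
def model_terms(row, covariate_checks):
    """Split covariates into forced terms and stepwise candidates for one case.

    row: dict of covariate name -> value (None = missing) for the case being imputed.
    covariate_checks: dict of covariate name -> True if checked (forced into the model).
    """
    # Assumption: a covariate can only be used if it is observed for this case.
    usable = [name for name in covariate_checks if row.get(name) is not None]
    forced = [name for name in usable if covariate_checks[name]]
    candidates = [name for name in usable if not covariate_checks[name]]
    return forced, candidates


# Hypothetical case and check-box settings.
case = {"AGE": 54, "SYMPDUR": 3.2, "MeasA_1": None, "MeasB_1": 7.0}
checks = {"AGE": True, "SYMPDUR": False, "MeasA_1": True, "MeasB_1": False}
print(model_terms(case, checks))  # (['AGE'], ['SYMPDUR', 'MeasB_1'])
```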

Monotone

Selecting the Monotone tab allows you to add or remove covariates from the logistic model used for imputing the monotone missing values in the data set. (These can be identified in the Missing Data Pattern mentioned earlier.)

As before, click the + or - signs to expand or collapse the list of covariates for each imputation variable.

The list of covariates for each imputation variable is made up of the variables specified as Fixed Covariates in the Base Setup tab, plus all of the other imputation variables. Variables can be added to and removed from this list by dragging and dropping them between the list of covariates and the variables field. Even though a variable appears in the list of covariates for a particular imputation variable, it might not be used in the final model.

The program first sorts the variables so that the missing data pattern is as close as possible to monotone, and then uses only the variables that are to the left of the imputation variable as covariates. Details of the models that were actually used to impute the missing values are included in the Output Log.
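The monotone rule "use only the variables to the left of the imputation variable" can be sketched as follows. The ordering step here is an assumption: it sorts the imputation variables by their number of missing values. For a truly monotone pattern such as this data set's, that ordering recovers the monotone order. SOLAS's actual sorting algorithm may differ. The function name and the small data set are hypothetical.

```python
def monotone_covariates(data, impute_vars, fixed_covariates):
    """Return {imputation variable: covariates} using only variables to its left.

    data: list of dicts (None = missing). Variables with fewer missing values
    are placed further left (an assumed ordering rule).
    """
    n_missing = {v: sum(row[v] is None for row in data) for v in impute_vars}
    order = sorted(impute_vars, key=lambda v: n_missing[v])  # stable sort
    return {v: list(fixed_covariates) + order[:i] for i, v in enumerate(order)}


# Hypothetical monotone data: A1 least missing, A3 most missing.
data = [
    {"A1": 1.0, "A2": 2.0, "A3": 3.0},
    {"A1": 1.5, "A2": 2.5, "A3": None},
    {"A1": 1.2, "A2": None, "A3": None},
]
print(monotone_covariates(data, ["A3", "A1", "A2"], ["AGE"]))
# {'A1': ['AGE'], 'A2': ['AGE', 'A1'], 'A3': ['AGE', 'A1', 'A2']}
```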

Donor Pool

Selecting the Donor Pool tab displays the Donor Pool page, which gives you more control over the random draw step of the analysis by letting you define propensity score sub-classes.

The following options for defining the Propensity Score sub-classes are provided:

  • Divide propensity score into c subsets. The default is 5.
  • Use c closest cases. This option allows you to specify the number of cases before and after the case being imputed that are to be included in the subset.
  • Use d% of the data set closest cases. This option allows you to specify the number of cases as a percentage.
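The three donor-pool options can be sketched in terms of the ranked propensity scores. This is an illustration under stated assumptions, not SOLAS's code. It assumes "c subsets" means groups of roughly equal size by rank (quintiles when c = 5), and that "c closest cases" means up to c cases immediately before and after the target case in propensity-score order. It also assumes that the d% option sets that per-side count as d% of the data set. In practice only cases with an observed value of the imputation variable would serve as donors. All function names are hypothetical.

```python
def propensity_subsets(scores, c=5):
    """Assign each case to one of c subsets of roughly equal size,
    ranked by propensity score (subset 0 = lowest scores)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    subset = [0] * n
    for rank, case in enumerate(order):
        subset[case] = rank * c // n
    return subset


def closest_cases(scores, target, k):
    """Cases ranked up to k positions before and after `target` by propensity score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pos = order.index(target)
    return order[max(0, pos - k):pos] + order[pos + 1:pos + 1 + k]


def closest_percent(scores, target, d):
    """As closest_cases, with k taken as d% of the data set (an assumption)."""
    k = max(1, round(len(scores) * d / 100))
    return closest_cases(scores, target, k)


scores = [0.9, 0.1, 0.5, 0.3, 0.7]
print(closest_cases(scores, target=2, k=1))  # [3, 4]
```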

You can use one Refinement Variable for each of the variables being imputed. Variables can be dragged from the Variables listbox to the Refinement Variable column. When you use a refinement variable, the program restricts the donor pool to cases whose values of the refinement variable are close to those of the case being imputed.

You can also specify the number of refinement variable cases to be used in the donor pool. For this example, we will use all of the default settings in this tab.
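Refinement can be pictured as a second, narrower filter applied to a donor pool that has already been chosen by propensity score. The minimal sketch below assumes "close" means smallest absolute difference in the refinement variable, and that the specified number of refinement cases is kept. Ties are kept in their original pool order. The function name and values are hypothetical.

```python
def refine_pool(pool, refinement_values, target, m):
    """Keep the m cases in `pool` whose refinement-variable value is
    closest (absolute difference) to that of the `target` case."""
    ref = refinement_values[target]
    # sorted() is stable, so tied cases keep their original pool order.
    return sorted(pool, key=lambda i: abs(refinement_values[i] - ref))[:m]


# Hypothetical refinement-variable values; case 4 is being imputed.
values = [10, 20, 31, 40, 30]
print(refine_pool([0, 1, 2, 3], values, target=4, m=2))  # [2, 1]
```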

Advanced Options

Selecting the Advanced Options tab displays the Advanced Options window that allows the user to control the settings for the imputation and the logistic regression.

Randomization
Main Seed Value
The Main Seed Value is used to perform the random selection within the propensity subsets. The default seed is 12345. If you leave this field blank or set it to zero, the clock time is used as the seed.
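The sketch below mirrors the documented seed rule: an explicit seed gives reproducible draws, while a blank or zero seed falls back to the clock. For the random selection within a subset, it uses an approximate Bayesian bootstrap. That is a common choice for propensity-score imputation, but it is our assumption and not a documented SOLAS detail. Function names are hypothetical.

```python
import random
import time


def make_rng(seed_field="12345"):
    """Mimic the documented Main Seed Value rule: blank or zero -> clock time."""
    text = str(seed_field).strip()
    seed = int(text) if text else 0
    if seed == 0:
        seed = time.time_ns()  # clock-based seed: runs are not reproducible
    return random.Random(seed)


def abb_impute(donor_values, n_missing, rng):
    """Approximate Bayesian bootstrap draw within one propensity subset."""
    # Step 1: resample the observed donor values with replacement.
    bootstrap = [rng.choice(donor_values) for _ in donor_values]
    # Step 2: draw each imputed value from that bootstrap sample.
    return [rng.choice(bootstrap) for _ in range(n_missing)]


donors = [5.1, 4.8, 6.0, 5.5]  # hypothetical observed values in one subset
print(abb_impute(donors, 2, make_rng()))  # same output on every run with seed 12345
```

With the default seed, repeated runs reproduce the same imputations, which is useful when you need to regenerate a set of imputed datapages exactly.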

Output Log
The Output Log is a comprehensive listing of the regression equations and other model details calculated for the imputed variable(s).

Least Squares Regression
Tolerance

The value set in the Tolerance datafield controls numerical accuracy. The tolerance limit is used in matrix inversion to guard against singularity: no independent variable is used whose R² with the other independent variables exceeds (1 - Tolerance). You can adjust the tolerance using the scrolled datafield.
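The tolerance rule can be made concrete with a small check. The sketch computes the R² of a candidate variable regressed on the variables already included, using modified Gram-Schmidt on centred columns. It then applies the documented condition R² ≤ 1 - Tolerance. This only illustrates the criterion and is not SOLAS's matrix-inversion routine. The function names are hypothetical.

```python
def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residual(vector, basis):
    # Remove the projection of `vector` onto each orthogonal basis vector.
    r = list(vector)
    for q in basis:
        coef = _dot(r, q) / _dot(q, q)
        r = [ri - coef * qi for ri, qi in zip(r, q)]
    return r


def passes_tolerance(candidate, included, tolerance):
    """True if the candidate's R^2 on the included variables is <= 1 - tolerance."""
    basis = []
    for column in included:
        r = _residual(_centered(column), basis)
        if _dot(r, r) > 1e-12:  # skip columns already collinear with the basis
            basis.append(r)
    y = _centered(candidate)
    total = _dot(y, y)
    if total == 0.0:
        return False  # a constant column carries no information
    resid = _residual(y, basis)
    r_squared = 1.0 - _dot(resid, resid) / total
    return r_squared <= 1.0 - tolerance


x = [1, 2, 3, 4]
print(passes_tolerance([2, 4, 6, 8], [x], 1e-4))   # False: exact multiple of x
print(passes_tolerance([1, -1, -1, 1], [x], 1e-4))  # True: uncorrelated with x
```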

Stepping Criteria
Here you can select F-to-Enter and F-to-Remove values from the scrolled datafields, or type in your own values. To allow more variables to enter the model, set F-to-Enter to a smaller value. F-to-Remove should be set to a value less than F-to-Enter.
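One F-based stepping decision can be sketched as follows. The sketch assumes a removal check comes before an entry check on each step, and that F values are compared strictly against the limits. Both are assumptions, since the exact sequence SOLAS uses for least squares is not described here. The limits 4.0 and 3.9 in the example are illustrative, not SOLAS defaults.

```python
def stepwise_action(f_in_model, f_candidates, f_to_enter, f_to_remove):
    """Decide one stepping action from partial F statistics.

    f_in_model: {variable: F-to-remove statistic} for variables in the model.
    f_candidates: {variable: F-to-enter statistic} for variables not in the model.
    Returns ("remove", var), ("enter", var), or None when stepping stops.
    """
    if f_to_remove >= f_to_enter:
        # This sketch enforces the recommendation that F-to-Remove < F-to-Enter.
        raise ValueError("F-to-Remove should be less than F-to-Enter")
    if f_in_model:
        weakest = min(f_in_model, key=f_in_model.get)
        if f_in_model[weakest] < f_to_remove:
            return ("remove", weakest)
    if f_candidates:
        strongest = max(f_candidates, key=f_candidates.get)
        if f_candidates[strongest] > f_to_enter:
            return ("enter", strongest)
    return None


print(stepwise_action({"SYMPDUR": 9.0}, {"MeasA_0": 6.5, "MeasB_0": 2.0}, 4.0, 3.9))
# ('enter', 'MeasA_0')
```

Keeping F-to-Remove below F-to-Enter leaves a gap between the two limits. Because of that gap, a variable that has just entered does not immediately qualify for removal, which prevents the procedure from cycling.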

Logistic Regression Options
The Logistic Regression options are as follows:

Model Tolerance
Controls the numerical accuracy. Computations are performed in double precision. Use a value that is greater than .000001 but less than 1.0. The default is .0001.
Tail area probabilities to control entry or removal of terms from the model
Specifies the limits for the tail area probabilities (p-values) of the χ² and F statistics used to control the entry and removal of terms.

Entry
During forward stepping, the candidate term with the smallest p-value below the entry limit is entered first. If no term outside the model has a p-value below this limit, the term in the model with the largest p-value above the removal limit is removed.

Removal
During backward stepping, the term with the largest p-value greater than the removal value is removed first. Then any terms with entry p-values less than the entry limit are entered. Again, for the purposes of this example, we will run the analysis with the default settings.
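The forward-stepping rule above can be sketched with p-values. For a one-degree-of-freedom χ² statistic, the tail area is P(X > x) = erfc(√(x/2)). That identity holds exactly because X is the square of a standard normal. The stepping function follows the forward-stepping description, entering first and removing only if nothing can enter. The entry and removal limits in the example are illustrative, not SOLAS defaults, and the function names are hypothetical.

```python
import math


def chi2_1df_tail(x):
    """Tail area P(X > x) for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def forward_step(p_in_model, p_candidates, p_enter, p_remove):
    """One forward-stepping decision based on p-values."""
    if p_candidates:
        best = min(p_candidates, key=p_candidates.get)
        if p_candidates[best] < p_enter:
            return ("enter", best)
    if p_in_model:
        worst = max(p_in_model, key=p_in_model.get)
        if p_in_model[worst] > p_remove:
            return ("remove", worst)
    return None


print(round(chi2_1df_tail(3.841458820694124), 4))  # 0.05
print(forward_step({"AGE": 0.30}, {"SYMPDUR": 0.01}, 0.10, 0.15))  # ('enter', 'SYMPDUR')
```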

Maximum Likelihood Criteria
Maximum iterations to convergence

Specifies the maximum number of iterations to maximize the likelihood function. The default is 10.

Likelihood function convergence criterion
Specifies the convergence criterion for the likelihood function. A relative improvement less than this value is considered no improvement. The default is .00001.

Parameter estimates convergence criterion
Specifies the convergence criterion for the parameter estimates. The default is .0001.
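The three maximum-likelihood settings can be seen at work in a Newton-Raphson fit of a one-covariate logistic regression. The sketch uses the documented defaults: at most 10 iterations, relative log-likelihood improvement 0.00001, and parameter criterion 0.0001. It makes two assumptions: the parameter criterion is applied to the largest absolute change in a coefficient, and iteration stops only when both criteria are met. It is a teaching sketch, not SOLAS's fitting routine, and the names are hypothetical.

```python
import math


def _log_likelihood(b0, b1, x, y):
    total = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        total += yi * eta - math.log1p(math.exp(eta))
    return total


def fit_logistic(x, y, max_iter=10, ll_tol=1e-5, beta_tol=1e-4):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x.
    Returns (b0, b1, iterations used, converged flag)."""
    b0 = b1 = 0.0
    prev_ll = _log_likelihood(b0, b1, x, y)
    for iteration in range(1, max_iter + 1):
        # Score vector (g) and information matrix (h) at the current estimates.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            raise ValueError("information matrix is singular")
        # Newton step: solve h * delta = g for the 2x2 system.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        ll = _log_likelihood(b0, b1, x, y)
        # A relative improvement below ll_tol counts as no improvement.
        ll_converged = (ll - prev_ll) < ll_tol * abs(prev_ll)
        beta_converged = max(abs(d0), abs(d1)) < beta_tol
        prev_ll = ll
        if ll_converged and beta_converged:
            return b0, b1, iteration, True
    return b0, b1, max_iter, False


# Hypothetical missingness indicator y against a binary covariate x.
print(fit_logistic([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 0, 1, 1, 1]))
```

In this example, the fitted probabilities are 0.25 when x = 0 and 0.75 when x = 1. So the estimates approach b0 = -ln 3 and b1 = 2 ln 3, well within the default iteration limit.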

When you are satisfied that you have specified your analysis correctly, click the OK button. The multiply-imputed datapages will be displayed, with the imputed values appearing in Red or Blue.

