Mahalanobis Distance Multiple Imputation in SOLAS 4.0


SOLAS 4.0 was developed with guidance from Prof. Donald B. Rubin, the inventor of multiple imputation.

In general statistical analysis, the Mahalanobis distance is a metric for measuring the similarity (or dissimilarity) between two vectors. In SOLAS 4.0, the Mahalanobis distance is used to identify cases whose characteristics are similar to those of a case with missing values. Each missing value is filled in by sampling from these closest cases, and the multiple imputations are independent repetitions of this draw from the set of closest cases.

For each case containing a missing value, the Mahalanobis distance D_M between that case and every other case in the dataset (or group) is calculated from the specified covariates:

$$D_M(y, x_i) = \sqrt{(y - x_i)^{\top} S^{-1} (y - x_i)}$$

where y is the vector of covariates for the case with the missing value, x_i is the covariate vector for the ith fully observed case in the dataset, and S is the covariance matrix of the set of covariates used in the calculation.
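
The distance calculation can be sketched in plain Python as follows. This is a minimal illustration, not SOLAS code: the function names (covariance_matrix, invert, mahalanobis, distances_to_donors) are hypothetical, S is assumed to be the sample covariance matrix estimated from the fully observed cases' covariates (the text does not say which cases S is estimated from), and a small Gauss-Jordan routine stands in for a linear-algebra library so the example has no dependencies.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def covariance_matrix(rows: Sequence[Vector]) -> Matrix:
    """Sample covariance matrix (n - 1 denominator) of the covariate rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(p)]
        for a in range(p)
    ]


def invert(m: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for col in range(p):
        # Use the row with the largest entry in this column as the pivot.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now m^{-1}.
    return [row[p:] for row in aug]


def mahalanobis(y: Vector, x: Vector, s_inv: Matrix) -> float:
    """D_M(y, x) = sqrt((y - x)^T S^{-1} (y - x))."""
    d = [a - b for a, b in zip(y, x)]
    k = len(d)
    q = sum(d[i] * s_inv[i][j] * d[j] for i in range(k) for j in range(k))
    # Clamp tiny negative round-off before taking the square root.
    return math.sqrt(max(q, 0.0))


def distances_to_donors(y: Vector, observed: Sequence[Vector]) -> List[float]:
    """Distance from the incomplete case y to every fully observed case."""
    s_inv = invert(covariance_matrix(observed))
    return [mahalanobis(y, x, s_inv) for x in observed]


if __name__ == "__main__":
    # Hypothetical covariates (age, income) for five fully observed cases.
    observed = [[25, 30.0], [32, 42.0], [47, 55.0], [51, 61.0], [38, 40.0]]
    y = [35, 45.0]  # covariates of the case with the missing value
    for i, dist in enumerate(distances_to_donors(y, observed)):
        print(f"case {i}: D_M = {dist:.3f}")
```

With S equal to the identity matrix the distance reduces to the ordinary Euclidean distance; a general S rescales and decorrelates the covariates, so a difference in a high-variance covariate counts for less than the same difference in a low-variance one.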

Each missing value of the imputation variable is imputed with values drawn at random from a subset of observed values, its donor pool: the cases with the shortest Mahalanobis distance to the entry being imputed. The donor pool is therefore a set of cases with observed values for that imputation variable. The Donor Pool page gives the user control over this random-draw step: you can define the subset ranges and refine them further using another variable, known as the Refinement Variable.
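
The random draw from the donor pool can be sketched as below, taking as input the Mahalanobis distances from one incomplete case to each candidate donor together with those donors' observed values. This is a simplified illustration under stated assumptions, not the SOLAS procedure: the donor pool is assumed to be the pool_size nearest donors (pool_size is a hypothetical parameter), SOLAS's subset ranges and Refinement Variable are not modeled, and each of the m imputations is an independent uniform draw from that pool.

```python
import random
from typing import List, Optional, Sequence


def donor_pool(distances: Sequence[float], values: Sequence[float],
               pool_size: int) -> List[float]:
    """Observed values of the pool_size donors closest to the incomplete case."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    # Rank donors by distance (sorted is stable, so ties keep input order).
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    return [values[i] for i in ranked[:pool_size]]


def multiply_impute(distances: Sequence[float], values: Sequence[float],
                    pool_size: int, m: int,
                    seed: Optional[int] = None) -> List[float]:
    """Return m imputations, each an independent uniform draw from the donor pool."""
    rng = random.Random(seed)  # seeded for reproducible draws
    pool = donor_pool(distances, values, pool_size)
    return [rng.choice(pool) for _ in range(m)]


if __name__ == "__main__":
    # Hypothetical distances from one incomplete case to six donors,
    # and the donors' observed values of the imputation variable.
    distances = [0.42, 1.87, 0.15, 0.96, 2.30, 0.58]
    values = [12.0, 30.5, 14.2, 18.9, 35.1, 13.4]
    print(multiply_impute(distances, values, pool_size=3, m=5, seed=2024))
```

Each of the m draws fills the same missing entry in one of m completed datasets, which is what makes the imputations multiple rather than a single best guess.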

