Optimal Caliper Width for Propensity Score Matching of Three Treatment Groups: A Monte Carlo Study

TLDR

Propensity score matching reduces bias in observational studies, but it is mostly applied to two treatment groups because key issues such as matching distance, baseline balance assessment, and optimal caliper width remain unresolved for multiple groups. This study compares propensity score matching methods using different calipers to identify the optimal caliper width for analyses involving three treatment groups. The authors evaluated caliper widths from 0.1 to 0.8 of the pooled standard deviation of the logit of the propensity score in increments of 0.1, assessed baseline balance with standardized differences, and reported matching ratios, relative bias, and mean squared error (MSE) across matched samples. Monte Carlo simulations indicate that a caliper width of 0.2 of the pooled standard deviation of the logit of the propensity score gives superior performance in estimating treatment effects, offering practical guidance for three-group propensity score matching.

Abstract

Propensity score matching is a method to reduce bias in non-randomized and observational studies. Propensity score matching is mainly applied to two treatment groups rather than multiple treatment groups, because some key issues affecting its application to multiple treatment groups remain unsolved, such as the matching distance, the assessment of balance in baseline variables, and the choice of optimal caliper width. The primary objective of this study was to compare propensity score matching methods using different calipers and to choose the optimal caliper width for use with three treatment groups. The authors used caliper widths from 0.1 to 0.8 of the pooled standard deviation of the logit of the propensity score, in increments of 0.1. The balance in baseline variables was assessed by standardized difference. The matching ratio, relative bias, and mean squared error (MSE) of the estimate between groups in different propensity score-matched samples were also reported. The results of Monte Carlo simulations indicate that matching using a caliper width of 0.2 of the pooled standard deviation of the logit of the propensity score affords superior performance in the estimation of treatment effects. This study provides practical solutions for the application of propensity score matching of three treatment groups.
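
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the pooling rule for three groups (square root of the average within-group variance of the logit) is an assumption extended from the common two-group definition, and the standardized difference is the usual pairwise form for a continuous covariate.

```python
import math
from statistics import mean, variance


def logit(p):
    """Log-odds of a propensity score p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def pooled_sd(groups):
    """Assumed pooling rule: square root of the mean within-group sample variance."""
    return math.sqrt(mean(variance(g) for g in groups))


def caliper_widths(ps_by_group, factors=None):
    """Map each caliper factor (0.1 to 0.8 by default) to a width on the logit scale.

    ps_by_group is a list of three lists of propensity scores, one per group.
    """
    if factors is None:
        factors = [round(0.1 * k, 1) for k in range(1, 9)]
    logits = [[logit(p) for p in group] for group in ps_by_group]
    sd = pooled_sd(logits)
    return {f: f * sd for f in factors}


def standardized_difference(x_a, x_b):
    """Standardized difference of a continuous covariate between two groups."""
    pooled = math.sqrt((variance(x_a) + variance(x_b)) / 2.0)
    return (mean(x_a) - mean(x_b)) / pooled


def relative_bias(estimates, true_effect):
    """Relative bias of simulated estimates, as a proportion of the true effect."""
    return (mean(estimates) - true_effect) / true_effect


def mse(estimates, true_effect):
    """Mean squared error of simulated estimates around the true effect."""
    return mean((e - true_effect) ** 2 for e in estimates)
```

With three groups, the standardized difference would be computed for each pair of groups (1 vs 2, 1 vs 3, 2 vs 3), and relative bias and MSE would be taken over the estimates from the Monte Carlo replicates at each caliper factor.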
