Abstract

Background

Core outcomes sets are increasingly used to define the research outcomes that are most important for a condition. Different consensus methods are used in the development of core outcomes sets; the most common is the Delphi process. Delphi methodology is increasingly standardised for core outcomes set development, but uncertainties remain. We aimed to empirically test how the use of different summary statistics and consensus criteria affects Delphi process results.

Methods

Results from two unrelated child health Delphi processes were analysed. Outcomes were ranked by mean, median, or rate of exceedance, and pairwise comparisons were then undertaken to analyse whether the rankings were similar. The correlation coefficient for each comparison was calculated, and Bland–Altman plots were produced. Youden’s index was used to assess how well the outcomes ranked highest by each summary statistic matched the final core outcomes sets. Consensus criteria identified in a review of published Delphi processes were applied to the results of the two child health Delphi processes. The size of the consensus sets produced by different criteria was compared, and Youden’s index was used to assess how well the outcomes that met different criteria matched the final core outcomes sets.

Results

Pairwise comparisons of different summary statistics produced similar correlation coefficients. Bland–Altman plots showed that comparisons involving ranked medians had wider variation in the ranking. No difference in Youden’s index was found between the summary statistics. Different consensus criteria produced widely different sets of consensus outcomes (range: 5–44 included outcomes). They also differed in their ability to identify core outcomes (Youden’s index range: 0.32–0.92). The choice of consensus criteria had a large impact on Delphi results.

Discussion

The use of different summary statistics is unlikely to affect how outcomes are ranked during a Delphi process: mean, median, and rate of exceedance produce similar results. Different consensus criteria have a large impact on the resultant consensus outcomes and potentially on subsequent core outcomes sets: our results confirm the importance of adhering to pre-specified consensus criteria.
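
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis code. The scores, the outcome names, the 1 to 9 scoring scale, the threshold of 7, and the definition of rate of exceedance as the proportion of scores at or above that threshold are all assumptions made for illustration. Youden's index is computed in its standard form, sensitivity + specificity - 1.

```python
from statistics import mean, median

# Hypothetical Delphi scores on an assumed 1-9 scale: one score per participant.
scores = {
    "survival": [9, 9, 8, 9, 7],
    "pain": [7, 8, 6, 9, 7],
    "length_of_stay": [5, 6, 4, 7, 5],
    "readmission": [3, 4, 5, 2, 6],
}

def rate_of_exceedance(values, threshold=7):
    """Proportion of scores at or above the threshold (assumed definition)."""
    return sum(v >= threshold for v in values) / len(values)

def rank_by(statistic):
    """Outcome names ordered from highest to lowest value of the statistic."""
    return sorted(scores, key=lambda name: statistic(scores[name]), reverse=True)

def consensus_set(min_rate, threshold=7):
    """Outcomes whose rate of exceedance reaches a hypothetical consensus criterion."""
    return {
        name for name, values in scores.items()
        if rate_of_exceedance(values, threshold) >= min_rate
    }

def youden_index(selected, core, all_outcomes):
    """Youden's J = sensitivity + specificity - 1, with the core set as reference."""
    selected, core = set(selected), set(core)
    non_core = set(all_outcomes) - core
    sensitivity = len(selected & core) / len(core)
    specificity = len(non_core - selected) / len(non_core)
    return sensitivity + specificity - 1

# Rank the same outcomes by each summary statistic.
rankings = {
    "mean": rank_by(mean),
    "median": rank_by(median),
    "exceedance": rank_by(rate_of_exceedance),
}

core_set = {"survival", "pain"}  # hypothetical final core outcomes set

# Agreement between the top-ranked outcomes and the core set.
j_top_two = youden_index(rankings["mean"][:2], core_set, scores)

# Agreement for two hypothetical consensus criteria: 70% and 90% scoring 7 or more.
j_lenient = youden_index(consensus_set(0.7), core_set, scores)
j_strict = youden_index(consensus_set(0.9), core_set, scores)
```

In this toy data the three summary statistics give the same ranking, whereas raising the consensus criterion from 70% to 90% shrinks the consensus set and lowers Youden's index. This mirrors the direction of the reported findings but says nothing about their size.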

Original publication




Journal article




Springer Science and Business Media LLC

Publication Date