Introduction
Goldilocks designs achieve everything group sequential designs do, but may provide a number of additional advantages. Goldilocks designs are widely used both in “Learn” and “Confirm” studies, the approach is accepted by health authorities, and operating characteristics can be determined through simulation (FACTS).
A Goldilocks trial design is an adaptive clinical trial methodology developed to optimize the sample size dynamically during the course of a trial. Its name references the “just right” principle from the Goldilocks fairy tale—neither too large nor too small. Goldilocks designs seek balance between flexibility and efficiency, by repeatedly updating sample-size decisions based on accumulating trial data, through predictive probability or other Bayesian statistical methods.
Goldilocks designs allow for stopping the trial early by halting accrual at an interim analysis, but always allow subjects the opportunity to collect complete follow-up before the trial’s final analysis is performed. Goldilocks designs combine the completely observed patient data with the uncertainty associated with the missing data at the time of an interim analysis, and use predictive probabilities to decide if stopping for expected success or futility is appropriate.
This is particularly relevant if the primary endpoint involves a delayed observation. If an interim analysis declares early success, and accrual stops, but more data from patients already randomized come in, and the statistic drops below the threshold, can we claim that we have demonstrated efficacy? Goldilocks designs explicitly take this possibility into account and mitigate the risk of a “flip flop” (i.e. declaring “success” early at an interim analysis, based on incomplete data, only to see this flipping into “no success”) once all data (including delayed observations) become available.
Goldilocks Clinical Trial Designs
The Goldilocks trial design, as an adaptive trial framework, draws on the strengths of group sequential designs and additionally emphasizes flexibility, efficiency, and optimal use of data. The concept of the “Goldilocks” design comes from the idea of finding an optimal balance—where the trial is neither too large (wasting resources) nor too small (lacking statistical power), but just right for the research question at hand.
Key Features of Goldilocks Trial Designs
- Adaptive and Flexible: Goldilocks designs allow for multiple adaptive changes based on interim results, such as the addition or removal of treatment arms, changes to patient inclusion/exclusion criteria, or modifications to the statistical model. This flexibility allows the trial to be tailored in real-time to the evolving landscape of treatment options, participant response, and other external factors.
- Incorporation of Multiple Interim Analyses: Goldilocks trials can conduct interim analyses at flexible intervals. This enables real-time decision-making that maximizes the use of accumulated data, including incomplete data.
Key aspects of Goldilocks Designs
Bayesian and Predictive Probability Approaches
Goldilocks designs frequently utilize Bayesian methods, commonly including multiple imputation, and predictive probability calculations at interim analyses to assess the likelihood of trial success at any given point, guiding adaptive sample size decisions. Both Goldilocks designs and group sequential designs can have a frequentist final analysis.
Analytical Calculations of Stopping Boundaries
Group sequential designs have analytical calculations that provide stopping boundaries guaranteeing exact Type I error control. It is common to simulate the Type I error of a Goldilocks design to calculate stopping boundaries.
Stopping for futility
Goldilocks designs allow futility stopping criteria to be framed in terms of the predictive probability of success: What would the probability of success be, should the trial continue to recruit up to the full sample size and all follow-up information was collected? If this probability is smaller than some predefined threshold (say for instance less than 5%) the trial can stop for futility.
Example Applications and Technical Details
A foundational paper explicitly outlining the “Goldilocks” approach was authored by Broglio, Connor, and Berry (2014) (1). In their seminal work, they introduce a Bayesian predictive probability framework to guide sample size adaptations, clearly illustrating how interim results influence sample-size decisions: In this paper, Broglio et al. provide detailed simulations and comparisons illustrating how Goldilocks designs can outperform traditional designs, particularly when initial estimates of treatment effects are uncertain. They demonstrate that these designs can substantially reduce the number of patients exposed to ineffective treatments without sacrificing statistical rigor. Tables and figures in the paper present simulations showing how predictive probabilities shift dynamically with accumulating data, and how this influences ongoing sample size decisions. Goldilocks designs represent a significant advance in clinical trial methodology, providing a mechanism for dynamically adjusting sample size to achieve optimal efficiency and ethical balance. They differ from traditional group sequential designs through continuous or periodic Bayesian predictive adaptation, making them well-suited to modern clinical trial challenges such as uncertain effect sizes, limited patient populations, or time-critical situations like pandemics. By integrating real-time data monitoring with flexible decision-making, Goldilocks designs help ensure trials are appropriately powered without unnecessary resource expenditure or patient exposure to inferior treatments.
If you have more questions, please contact Dr Nick Berry (North America; Email) or Dr Elias Laurin Meyer (Europe; Email).
Key Papers:
- Broglio KR, Connor JT, Berry SM. Not too big, not too small: a Goldilocks approach to sample size selection. J Biopharm Stat. 2014;24(3):685–705. PubMed
- Berry DA. Emerging innovations in clinical trial design. Clin Pharmacol Ther. 2016;99(1):82–91. PubMed (This paper discusses general Bayesian adaptive methodologies including Goldilocks-style adaptations.)
- Connor JT, Broglio KR, Durkalski V, et al. The Stroke Hyperglycemia Insulin Network Effort (SHINE) trial: An adaptive trial design case study. Clin Trials. 2015;12(4):367–375. PubMed (This practical case study illustrates how a Goldilocks approach was implemented in a real-world clinical trial setting.)
- Broglio K, Meurer WJ, Durkalski V, Pauls Q, Connor J, Berry D, Lewis RJ, Johnston KC, Barsan WG. Comparison of Bayesian vs Frequentist Adaptive Trial Design in the Stroke Hyperglycemia Insulin Network Effort Trial. JAMA Netw Open. 2022 May 2;5(5):e2211616. doi: 10.1001/jamanetworkopen.2022.11616. PMID: 35544137; PMCID: PMC9096598. (The project described in ref. 3 also included a prospective comparison of a Goldilocks and Group Sequential design for the same trial).
- Lewis RJ. The pragmatic clinical trial in a learning healthcare system. Clin Trials. 2016;13(5):484–492. PubMed (This discusses adaptive and pragmatic trials, including concepts that underpin Goldilocks-style adjustments.)
- Zhan T, Zhang H, Hartford A, Mukhopadhyay S. Modified Goldilocks Design with strict type I error control in confirmatory clinical trials, Journal of Biopharmaceutical Statistics. 2020;30:5, 821-833, DOI: 10.1080/10543406.2020.1744620 (This paper presents a version of Goldilocks design with an analytical approach to controlling type-1 error, rather than relying on simulations).
- Lan KKG, DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika. 1983;70(3), 659-663 (This paper laid the foundations for group sequential designs).
- Jennison C, Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. 2000, CRC Press
- Pike MC. Goldilocks trials: A new approach to adaptive designs. Statistics in Medicine. 2019;38(10), 1801-1810
Examples for published applications of Goldilocks designs (** if this led to market approval):
- ** Wilber DJ et al. Comparison of antiarrhythmic drug therapy and radiofrequency catheter ablation in patients with paroxysmal atrial fibrillation: a randomized controlled trial. JAMA. 2010;303(4):333-40. doi: 10.1001/jama.2009.2029
- ** Philpott JM et al. The ABLATE Trial: Safety and Efficacy of Cox Maze-IV Using a Bipolar Radiofrequency Ablation System. Ann Thorac Surg. 2015;100(5):1541-6; discussion 1547-8. doi: 10.1016/j.athoracsur.2015.07.006.
- Konstam MA et al. Impact of Autonomic Regulation Therapy in Patients with Heart Failure: ANTHEM-HFrEF Pivotal Study Design. Circ Heart Fail. 2019;12(11):e005879. doi: 10.1161/CIRCHEARTFAILURE.119.005879.
- Premchand RK et al. Autonomic regulation therapy via left or right cervical vagus nerve stimulation in patients with chronic heart failure: results of the ANTHEM-HF trial. J Card Fail. 2014;20(11):808-16. doi: 10.1016/j.cardfail.2014.08.009.
- White WB et al. A cardiovascular safety study of LibiGel (testosterone gel) in postmenopausal women with elevated cardiovascular risk and hypoactive sexual desire disorder. Am Heart J. 2012;163(1):27-32. doi: 10.1016/j.ahj.2011.09.021.
- Lorusso R et al. Sutureless versus Stented Bioprostheses for Aortic Valve Replacement: The Randomized PERSIST-AVR Study Design. Thorac Cardiovasc Surg. 2020;68(2):114-123. doi: 10.1055/s-0038-1675847.
- Duytschaever M et al. Paroxysmal Atrial Fibrillation Ablation Using a Novel Variable-Loop Biphasic Pulsed Field Ablation Catheter Integrated With a 3-Dimensional Mapping System: 1-Year Outcomes of the Multicenter inspIRE Study. Circ Arrhythm Electrophysiol. 2023;16(3):e011780. doi: 10.1161/CIRCEP.122.011780.
- ** Nogueira RG et al. Thrombectomy 6 to 24 Hours after Stroke with a Mismatch between Deficit and Infarct. N Engl J Med. 2018;378(1):11-21. doi: 10.1056/NEJMoa1706442. Epub 2017 Nov 11. PMID: 29129157.
- ** Swanson CJ et al. A randomized, double-blind, phase 2b proof-of-concept clinical trial in early Alzheimer’s disease with lecanemab, an anti-Aβ protofibril antibody. Alzheimers Res Ther. 2021;13(1):80. doi: 10.1186/s13195-021-00813-8.
- ** Castro M et al. Effectiveness and safety of bronchial thermoplasty in the treatment of severe asthma: a multicenter, randomized, double-blind, sham-controlled clinical trial. Am J Respir Crit Care Med. 2010;181(2):116-24. doi: 10.1164/rccm.200903-0354OC.
- ** Conway CR et al. A prospective, multi-center randomized, controlled, blinded trial of vagus nerve stimulation for difficult to treat depression: A novel design for a novel treatment. Contemp Clin Trials. 2020;95:106066. doi: 10.1016/j.cct.2020.106066.