The Good Experimental Design Toolkit

Templates and a checklist to level up your experimental design

The Good Experimental Design Toolkit with four templates and a checklist.
The Good Experimental Design Toolkit by Erin Weigel.

As Ronald Fisher learned, experiment data is only as good as the design you put into it.

This calls to mind a common mantra among data scientists and software engineers: “Garbage in, garbage out.” If the experiment has a poorly designed hypothesis, it gives you garbage evidence, even if the test is randomized and controlled. If your hypothesis is sound but the math is bad, you again get garbage data. To avoid creating garbage, follow the templates in “The Good Experimental Design Toolkit.” It contains four process templates, each with its own overarching theme to guide your approach.

Hypothesize: Design Like You’re Right

The Hypothesize template: “Design Like You’re Right”

  • Baseline State: The current state is... [description of base].
  • Research Insight: Based on... [research], [observation], or [evidence]...
  • Variant: We believe that... [description of testable idea].
  • Problem Statement: This is a [problem or opportunity] because... [assumption about the value it can create].
  • Prediction: If we [proposed change] to [independent variable(s)], then [expected impact] on [dependent variable(s)].
Figure 1: The Hypothesize template captures the thought process behind the change(s) you will test.

Validate: Test Like You’re Wrong

“Test Like You’re Wrong” flips the Hypothesize hubris on its head by reiterating the extremely skeptical attitude you need when running a null hypothesis test. The second template, Validate (Figure 2), covers the following information:

  • The Null Hypothesis Reminder.
    This reiterates your skeptical mindset: you will not accept a new belief unless convincing evidence sways you.
  • Metrics & Math.
    This outlines the exact evidence you’d need to observe to be convinced to reject the null hypothesis.
  • Test Type.
    This clarifies if you’re aiming to make things better (with a “superiority test”), or if your goal is simply to not make things worse (with a “non-inferiority test”).
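
The difference between the two test types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a one-sided z-test on conversion rates with an unpooled standard error, the function name and the counts are hypothetical, and the non-inferiority margin is an absolute difference you would have to choose yourself.

```python
import math
from statistics import NormalDist


def one_sided_p_value(conv_c, n_c, conv_t, n_t, margin=0.0):
    """P-value for the null hypothesis rate_t - rate_c <= -margin.

    margin = 0 gives a superiority test ("the change is better").
    margin > 0 gives a non-inferiority test ("the change is not worse
    than the current state by more than the margin").
    """
    rate_c = conv_c / n_c
    rate_t = conv_t / n_t
    # Unpooled standard error of the difference in conversion rates.
    se = math.sqrt(rate_c * (1 - rate_c) / n_c + rate_t * (1 - rate_t) / n_t)
    z = (rate_t - rate_c + margin) / se
    return 1 - NormalDist().cdf(z)


# Hypothetical counts: conversions and visitors for control and treatment.
p_superiority = one_sided_p_value(1000, 10_000, 1010, 10_000)
p_non_inferiority = one_sided_p_value(1000, 10_000, 1010, 10_000, margin=0.01)

print(f"superiority p-value:     {p_superiority:.3f}")
print(f"non-inferiority p-value: {p_non_inferiority:.3f}")
```

With these made-up numbers, the same data does not show that the change is better, but it does show that the change is not worse by more than one percentage point. That is why you pick the test type before you look at the results.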
The Validate template: “Test Like You’re Wrong”

The change is tested against the current state. The assumption is that the change has no effect. However, if the effect outlined in Metrics & Math is observed, we will... change our minds, reject the current state, and adopt the change as the new current state. Otherwise, we will reject the change and keep things in their current state.

  • Metrics & Math.
    This test is designed to find an impact on... [blank] goal metric at a... [blank]% minimum detectable effect at a... [blank]% significance level and a... [blank]% statistical power after... [blank] visitors, and... [blank] run time.
  • Test Type.
    Superiority or non-inferiority.
  • Decision Moment.
    Test will start on [insert date]. Test will stop on [insert date]. A decision will be made after the experiment runs for this pre-determined length of time.

Figure 2: The Validate template defines the evidence you’d need to observe that would convince you to change your mind.
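
To make the Metrics & Math blanks concrete, here is a minimal sketch of how the visitor count and run time could be estimated from the other inputs. It assumes a conversion-rate goal metric, a two-sided two-proportion z-test with the usual normal approximation, and a relative minimum detectable effect. The function names, the baseline rate, and the traffic figure are hypothetical, and your experimentation tool may use a different formula.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variant (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # rate if the effect is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_power = NormalDist().inv_cdf(power)          # statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def run_time_days(total_visitors, daily_visitors):
    """Whole days needed to collect the required number of visitors."""
    return math.ceil(total_visitors / daily_visitors)


# Hypothetical inputs: 10% baseline conversion, 5% relative MDE,
# 5% significance level, 80% power, 4,000 visitors per day, two variants.
n = visitors_per_variant(0.10, 0.05, alpha=0.05, power=0.80)
days = run_time_days(2 * n, daily_visitors=4000)

print(f"visitors per variant: {n}")
print(f"run time in days:     {days}")
```

A smaller minimum detectable effect or a higher statistical power both increase the number of visitors, and with it the run time, which is why these values are fixed before the test starts.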

Create: Make With Care

“Make With Care” reminds you that you never purely test an idea; you always test the execution of an idea. Bugs and poor design decisions can doom your idea right out of the gate, so execution quality is key. The third template, Create (Figure 3), covers the following information:

  • Assumptions.
    These are things you believe to be true that you have no evidence for. Making assumptions is a necessary part of learning because you cannot have evidence for every belief you hold.
  • Design Decisions.
    Include any relevant information about the design decisions you made here. For example, explain why you chose a specific color that may deviate from a company color palette. The information you put here acts as a form of design documentation to help others learn about the execution of the idea you chose to test and why.
  • Development Decisions.
    This block acts as your engineering documentation. Explain in this section what technology you used and why. For example, share what programming language or tech stack you used.

The Create template is the foundation for your design and development documentation, which will help you during the Analyze Phase. (Refer to Chapter 7 in Design for Impact.)

More useful resources

Control: 13 people with 5 confounds. Treatment: 7 people with 5 confounds. Warning! This is an SRM.

What you need to know about sample ratio mismatches (SRMs)

Randomization within experimentation is important. It’s how we isolate the change we aim to learn about. When randomization goes wrong, you can get an SRM.
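
One common way to check for an SRM is a chi-square goodness-of-fit test on the visitor counts. The sketch below assumes two variants, an intended 50/50 split, and a strict alert threshold of 0.01. The function name, the counts, and the threshold are illustrative assumptions, not taken from any of the resources listed here.

```python
import math


def srm_p_value(control_count, treatment_count, expected_control_share=0.5):
    """Chi-square goodness-of-fit p-value for a two-variant split."""
    total = control_count + treatment_count
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi_square = (
        (control_count - expected_control) ** 2 / expected_control
        + (treatment_count - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi_square / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical visitor counts for a test designed as a 50/50 split.
p_value = srm_p_value(10_000, 9_500)
threshold = 0.01  # assumed alert threshold

if p_value < threshold:
    print(f"Possible SRM (p = {p_value:.5f}): investigate before analyzing.")
else:
    print(f"No SRM detected (p = {p_value:.5f}).")
```

A low p-value here says that a split this uneven would be very unlikely if randomization worked as designed, so the results should not be trusted until the cause is found.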

Screenshot of Lukas Vermeer's manual SRM checker at LukasVermeer.nl

Lukas Vermeer’s manual sample ratio mismatch (SRM) checker

Screenshot of Speero's A/B testing tool comparison website

A/B testing tool comparison

Speero’s A/B testing tool comparison website helps you find the right experimentation tool quickly and easily. It includes a comprehensive list of options. If you’re […]
