TL;DR
An experiment in Shipit lets you show two different versions of your checkout shipping options to different groups of customers at the same time. One group sees Variant A; another sees Variant B. You decide how the traffic is split. After running the experiment for long enough to gather meaningful data, you compare the results in your analytics platform and roll out whichever version performed better.
Experiments are attached to a checkout setup as a whole — not to individual shipping options. This means you are testing entire checkout configurations against each other, not just tweaking a single delivery method. This is the right level of granularity for most meaningful tests, because shipping decisions rarely happen in isolation from the rest of the checkout experience.
What You Can Test
The most common use case is testing whether a change to how your shipping options are presented affects how many customers complete their purchase. Experiments can help answer questions such as:
- Does showing "Free Shipping" more prominently on one option increase overall conversion rate?
- Do customers convert better when they see estimated delivery days, or when they see a specific expected arrival date?
- Does offering more delivery choices (home delivery, parcel locker, express) increase checkout completion, or does it create decision paralysis?
- Is a new premium delivery option adding to revenue, or is it distracting customers from your standard option?
The key is that you need a clear question before you start. Running an experiment without knowing what you are measuring will produce data you cannot act on.
How Buckets Work
Each experiment divides your customers into two groups: Bucket A and Bucket B. Every customer is assigned to a bucket when they enter checkout, and they see the shipping configuration that corresponds to their bucket.
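How Shipit assigns customers internally is not described here. As an illustration only, the sketch below shows one common way to split traffic: hash a stable customer or session key and compare the result against the Bucket A percentage. The function name `assign_bucket`, the `customer_key` and `experiment_name` parameters, and the use of SHA-256 are all assumptions made for this example, not Shipit's documented behavior.

```python
import hashlib


def assign_bucket(customer_key: str, experiment_name: str, bucket_a_percentage: int) -> str:
    """Return "A" or "B" for a customer; the same inputs always give the same bucket."""
    if not 0 <= bucket_a_percentage <= 100:
        raise ValueError("bucket_a_percentage must be between 0 and 100")
    # Hash the experiment name together with the customer key so that
    # different experiments split the same customers independently.
    digest = hashlib.sha256(f"{experiment_name}:{customer_key}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a slot in the range 0..99.
    slot = int.from_bytes(digest[:8], "big") % 100
    # Slots below the Bucket A percentage go to A; the remainder goes to B.
    return "A" if slot < bucket_a_percentage else "B"
```

A hash-based split like this keeps a returning customer in the same bucket without storing any assignment, which is one reason the approach is widely used for split tests.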
Traffic Split
You control how traffic is divided through the Bucket A Percentage setting — a number between 0 and 100. Bucket B automatically receives the remaining traffic.
| Bucket A % | Bucket B % | When to Use |
|---|---|---|
| 50 | 50 | Standard split test — equal comparison, fastest results |
| 10 | 90 | Cautious rollout: Bucket A shows a new configuration to a small slice of traffic first |
| 80 | 20 | Phased transition: most customers already see the new version in Bucket A; Bucket B is a small holdback for comparison |
A 50/50 split is the default for most tests because it gives you statistically comparable data in the shortest time. Use a smaller Bucket A percentage when you are introducing a significant change that you are not yet confident about — this limits exposure if the new configuration performs poorly.
Tip: The smaller the test group, the longer you need to run the experiment before you can draw conclusions. A 10% test group takes five times as long as a 50% group to collect the same number of checkouts. Factor this into your timeline.
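To make the timeline arithmetic concrete, the sketch below computes two rough factors for a given split. The five-times figure corresponds to the smaller bucket matching the checkout count of a 50% bucket. Under a simplifying assumption that both buckets have similar per-customer variance, the precision of the A-versus-B comparison itself degrades more slowly, because the larger bucket contributes extra data. Both function names are hypothetical, and the model is a back-of-the-envelope estimate rather than a full power calculation.

```python
def relative_duration(bucket_a_percentage: float) -> float:
    """Run time relative to a 50/50 split for equal precision of the comparison.

    Assumes similar variance in both buckets, so the variance of the measured
    difference is proportional to 1/share_a + 1/share_b.
    """
    if not 0 < bucket_a_percentage < 100:
        raise ValueError("both buckets need some traffic")
    share_a = bucket_a_percentage / 100
    share_b = 1 - share_a
    # A 50/50 split gives 1/0.5 + 1/0.5 = 4, the smallest possible value.
    return (1 / share_a + 1 / share_b) / 4


def sample_catch_up_factor(bucket_a_percentage: float) -> float:
    """How much longer the smaller bucket needs to match a 50% bucket's count."""
    if not 0 < bucket_a_percentage < 100:
        raise ValueError("both buckets need some traffic")
    smaller_share = min(bucket_a_percentage, 100 - bucket_a_percentage) / 100
    return 0.5 / smaller_share
```

For a 10/90 split, `sample_catch_up_factor(10)` gives 5, while `relative_duration(10)` gives roughly 2.8. Planning around the larger figure is the more conservative choice.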
Bucket Labels
By default, the two groups are called "A" and "B." You can rename them to something more meaningful using the Bucket A Label and Bucket B Label fields. Good labels make it much easier to read your analytics reports without having to remember which bucket was which.
Examples of useful label pairs:
- "Free Shipping Emphasis" / "Speed Emphasis"
- "Three Options" / "Five Options"
- "New Layout" / "Current Layout"
- "With Estimated Date" / "Without Estimated Date"
Setting Up an Experiment
Required Fields
| Field | What It Is |
|---|---|
| Name | Internal name for your experiment (not shown to customers) |
| Description | Brief note on what you are testing and why |
| Bucket A Percentage | Percentage of traffic assigned to Bucket A (0–100) |
| Bucket A Label | Friendly name for Bucket A |
| Bucket B Label | Friendly name for Bucket B |
| Is Active | Whether the experiment is currently running |
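If you keep a record of your experiments outside Shipit, for example in a planning script or spreadsheet export, the fields above map naturally onto a small data structure. The sketch below is one hypothetical representation: the class name `ExperimentConfig`, the snake_case attribute names, and the default values are assumptions for illustration and do not reflect a Shipit API. The check that the two labels differ is an extra safeguard added here, not a documented Shipit rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    name: str                      # internal name, not shown to customers
    description: str               # what is being tested and why
    bucket_a_percentage: int       # share of traffic for Bucket A (0 to 100)
    bucket_a_label: str = "A"      # friendly name for Bucket A
    bucket_b_label: str = "B"      # friendly name for Bucket B
    is_active: bool = False        # whether the experiment is running

    def __post_init__(self) -> None:
        if not self.name.strip():
            raise ValueError("name is required")
        if not 0 <= self.bucket_a_percentage <= 100:
            raise ValueError("bucket_a_percentage must be between 0 and 100")
        # Identical labels would make the buckets indistinguishable in reports.
        if self.bucket_a_label == self.bucket_b_label:
            raise ValueError("bucket labels must differ")

    @property
    def bucket_b_percentage(self) -> int:
        # Bucket B receives whatever traffic Bucket A does not.
        return 100 - self.bucket_a_percentage
```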
Is Active
The Is Active toggle lets you pause an experiment without deleting it. This is useful when:
- You need to stop a test temporarily (for example, during a sale period when traffic patterns are atypical)
- You have drawn your conclusions and want to stop the split but keep the experiment record for reference
- You want to set up an experiment in advance and activate it on a specific date
Multiple Experiments
You can define more than one experiment on a single checkout setup. However, running multiple experiments simultaneously means customers could be assigned to overlapping groups, which makes it harder to attribute results to any single change. As a general rule, run one experiment at a time on a given checkout setup.
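If you track your experiments in a list of your own, a quick check can flag checkout setups where more than one experiment is active at once. The sketch below assumes a hypothetical `Experiment` record with `checkout_setup`, `name`, and `is_active` fields; none of these names come from Shipit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Experiment(NamedTuple):
    checkout_setup: str  # hypothetical identifier of the checkout setup
    name: str
    is_active: bool


def setups_with_overlap(experiments: Iterable[Experiment]) -> Dict[str, List[str]]:
    """Return checkout setups that have more than one active experiment."""
    active: Dict[str, List[str]] = defaultdict(list)
    for exp in experiments:
        if exp.is_active:
            active[exp.checkout_setup].append(exp.name)
    # Keep only the setups where active experiments overlap.
    return {setup: names for setup, names in active.items() if len(names) > 1}
```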
What Experiments Cannot Do
Experiments apply to whole checkout setups, not individual shipping options. You cannot run an A/B test where half of customers see one version of a single shipping option within an otherwise identical checkout.
Shipit does not include a built-in results dashboard. After running an experiment, you will not find conversion rates or winner declarations inside Shipit. You need to connect your analytics platform and segment results by the bucket labels you configured.
Experiments do not automatically declare a winner or make changes. When you are ready to roll out the better-performing configuration, you do so manually.
Tip: Before starting an experiment, confirm with your analytics team that bucket assignments are being captured in your reporting tool. Starting a test and discovering two weeks later that the data was not recorded is a frustrating and avoidable problem.
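As a rough illustration of segmenting by bucket label, the sketch below computes a completion rate per label from exported checkout records and groups any records without a label separately, so a capture problem shows up early. The record shape, with hypothetical `bucket_label` and `completed` keys, is an assumption; your analytics export will have its own field names.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def conversion_by_bucket(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Return the checkout completion rate for each bucket label."""
    entered: Counter = Counter()
    completed: Counter = Counter()
    for row in rows:
        # Records with no label are grouped together so they are easy to spot.
        label = row.get("bucket_label") or "(missing label)"
        entered[label] += 1
        if row.get("completed"):
            completed[label] += 1
    return {label: completed[label] / entered[label] for label in entered}
```

A large "(missing label)" group in the output is a sign that bucket assignments are not reaching your reporting tool.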
Interpreting Your Results
When you review experiment results in your analytics platform, look for statistically significant differences — not just which number is higher. Small differences in conversion rate over a short period could be random variation rather than a real effect of your change.
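Many analytics platforms report significance for you. If yours does not, a two-proportion z-test is one standard way to check whether a difference in conversion rate is likely to be more than noise. The sketch below is a minimal version using only the Python standard library; the function name and its parameters are chosen for this example, and it assumes each checkout is an independent observation with reasonably large counts in both buckets.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are completed checkouts; n_a / n_b are customers who
    entered checkout in each bucket.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each bucket needs at least one customer")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both buckets convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Every customer converted, or none did: no detectable difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A commonly used threshold is a p-value below 0.05, but treat it as a guide rather than a verdict, especially if you check results repeatedly while the experiment is still running.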
A practical rule of thumb: run the experiment until each bucket has had at least a few hundred completed checkouts, and for at least two full weeks so that weekday and weekend variation in shopping behavior is covered. For lower-traffic stores, longer is better.
Focus on the metric that matters most to your business — usually checkout completion rate (conversions), but potentially also average order value if your test involves upsell-style shipping upgrades.
