Sample Size and Strategy Win Rate: When the Number Means Anything
Strategy win rate without a sample size is a slogan. Compute the confidence interval, set a per-setup floor, and read the number against its own width.
By Imperial Analytics
A win rate computed on twenty trades and a win rate computed on three hundred trades sit under the same column header, but they are two entirely different objects. The first is a point estimate inside a wide band of possible truths. The second is a point estimate inside a much narrower band. This post defines what sample size actually buys for a strategy's win rate, walks through the binomial confidence interval that tells you the width of that band, and shows where the Imperial Analytics twenty-trade floor sits inside that math.
What a small-sample win rate actually tells you
A win rate computed on a small trade sample is not a property of the strategy. It is a property of that specific sequence of trades, drawn from a distribution the trader cannot directly see. A strategy whose true win rate is fifty percent can post sixty percent or forty percent across the first twenty trades with substantial probability. Reading the headline figure as the strategy's win rate flattens that uncertainty and pushes the trader to act on noise.
The point becomes obvious if a trader runs the same coin-flip simulation twice. Flip a fair coin twenty times and the realized heads rate will land somewhere between roughly thirty percent and seventy percent on most runs. Flip it four hundred times and the rate will cluster within a few points of fifty percent on most runs. The coin did not change. The sample did. A trading strategy's win rate behaves the same way, because a closed trade is a binary outcome and a sequence of closed trades is a sequence of binary draws from whatever the strategy's true win rate happens to be.
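The coin-flip claim can be checked exactly rather than simulated. The sketch below sums binomial probabilities directly; the function name `prob_wins_between` and the specific count ranges are illustrative choices, not part of any Imperial Analytics tooling.

```python
from math import comb

def prob_wins_between(n, p, k_lo, k_hi):
    """Exact binomial probability that the win count lands in [k_lo, k_hi]."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_lo, k_hi + 1))

# Fair coin, 20 flips: heads rate between 30% and 70% (6 to 14 heads).
p_20 = prob_wins_between(20, 0.5, 6, 14)

# Fair coin, 400 flips: heads rate between 45% and 55% (180 to 220 heads).
p_400 = prob_wins_between(400, 0.5, 180, 220)

# A true 50% strategy posting 60% or better over 20 trades (12 or more wins).
p_hot = prob_wins_between(20, 0.5, 12, 20)

print(f"{p_20:.3f} {p_400:.3f} {p_hot:.3f}")
```

Both bands capture roughly 96 percent of runs, but the twenty-flip band is forty points wide and the four-hundred-flip band is ten. The last figure comes out near 25 percent: about one run in four from a genuinely fifty percent strategy shows sixty percent or better over its first twenty trades.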
The implication is that a headline win rate carries no information about precision without its sample size attached. A "fifty-eight percent strategy" is a different object at n = 18 than it is at n = 180. The journal that prints only the percentage hides the difference, and the trader who reads only the percentage acts as if it were not there.
The binomial confidence interval, plain
A win rate is a proportion of binary outcomes, so its sampling behavior is described by the binomial distribution. The standard error of a sample win rate p over n trades is SE = sqrt(p times (1 minus p) / n); the division by n happens inside the square root. Under the usual normal approximation, a ninety-five percent confidence interval is the sample win rate plus or minus 1.96 times that standard error. The wider the interval, the less the headline number tells the trader on its own.
The formula in plain arithmetic, for a sample win rate of fifty percent:
- At n = 20: SE = sqrt(0.5 times 0.5 / 20) = sqrt(0.0125) = 0.1118. Ninety-five percent interval = 0.50 plus or minus 0.219 = [0.28, 0.72].
- At n = 100: SE = sqrt(0.5 times 0.5 / 100) = 0.05. Ninety-five percent interval = 0.50 plus or minus 0.098 = [0.40, 0.60].
- At n = 400: SE = sqrt(0.5 times 0.5 / 400) = 0.025. Ninety-five percent interval = 0.50 plus or minus 0.049 = [0.45, 0.55].
The pattern is the one a trader needs to internalize. The interval narrows with the square root of the sample size, not with the sample size itself. Going from twenty trades to four hundred trades is a twenty-fold increase in sample size and roughly a four-and-a-half-fold reduction in interval width. The returns diminish: early trades buy a lot of precision, and each later trade buys less.
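The arithmetic above can be sketched as a small function. This is a minimal version of the normal-approximation interval used in this post; the name `win_rate_interval` and the clamping to [0, 1] are assumptions made for illustration.

```python
from math import sqrt

def win_rate_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a sample win rate p_hat over n trades."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion
    half_width = z * se                  # z = 1.96 for a 95% interval
    # Clamp to [0, 1] because a win rate cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

for n in (20, 100, 400):
    lo, hi = win_rate_interval(0.5, n)
    print(f"n={n:>3}: [{lo:.2f}, {hi:.2f}]")
```

Run on a fifty percent sample win rate, the loop reproduces the three intervals listed above.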
↳ Note
A win rate without a sample size is a slogan.
How many trades are enough?
The honest answer depends on how wide a confidence interval the trader is willing to act on. A plus-or-minus ten percentage points band requires roughly one hundred trades around a fifty percent win rate; plus-or-minus five points requires roughly four hundred. The Imperial Analytics twenty-trade floor is the line below which a per-setup figure should not be surfaced as a claim at all. It is not the line at which the figure becomes precise.
The numbers come straight from the standard error equation rearranged for n. If a trader wants a ninety-five percent interval no wider than plus-or-minus E around a sample win rate of p, the sample size required is approximately n = (1.96 / E) squared times p times (1 minus p). For p = 0.5 and E = 0.10, that is roughly 96 trades. For E = 0.05, roughly 384. For E = 0.025, roughly 1,536.
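The rearranged formula is short enough to sketch directly. The function name `required_trades` is hypothetical, and the result is rounded up to a whole trade, which is why each figure lands one above the rounded-down values quoted in the text.

```python
from math import ceil

def required_trades(margin, p=0.5, z=1.96):
    """Trades needed for a 95% interval no wider than +/- margin around win rate p."""
    return ceil((z / margin) ** 2 * p * (1 - p))

for margin in (0.10, 0.05, 0.025):
    print(f"+/- {margin:.3f}: {required_trades(margin)} trades")

# Substituting an observed win rate other than 0.5 lowers the requirement.
print(required_trades(0.10, p=0.6))
```

Passing the trader's own observed win rate as `p` gives the requirement for that specific strategy; `p = 0.5` remains the most conservative choice.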
A trader who wants to act on a per-setup win rate as a precise number, not a noisy one, is therefore looking at hundreds of qualified trades per setup, not tens. That number sounds high until the trader compares it to the lifetime trade count of most setups in a real journal. Many setups never reach the sample at which their win rate is precise. The honest move is to display the win rate next to its sample size and its interval, and let the reader judge whether the band is narrow enough to act on yet.
The twenty-trade floor is a different threshold. It is the no-surface line: below twenty matching trades, the per-setup figure is held back from being read as a claim about the setup at all. At twenty or more, the figure is shown with its sample size and its interval. Twenty is the floor, not the destination.
Data note
The sample-size figures in this section are computed for a sample win rate of fifty percent, which gives the widest interval and therefore the most conservative requirement. Sample win rates further from fifty percent need slightly fewer trades for the same interval width. Substitute the trader's own observed win rate into the formula to compute the requirement for that specific strategy.
Comparing two setups' win rates honestly
Two setups' win rates can differ by ten percentage points and still be inside each other's confidence intervals at a small sample. The honest comparison reports both win rates with their intervals and asks whether the two intervals overlap. Overlapping intervals mean the data has not yet separated the two setups. Non-overlapping intervals at a reasonable sample size are the start of an edge claim worth acting on.
The trap that this disarms is the one where a trader runs two setups for thirty trades each, sees fifty-five percent on one and forty-five percent on the other, and concludes that the first setup is the real one. The arithmetic says that at n = 30, the interval around fifty-five percent is roughly [0.37, 0.73] and the interval around forty-five percent is roughly [0.27, 0.63]. The two intervals overlap by a large margin. The ten-percentage-point gap in the point estimates is consistent with both setups having identical true win rates and the trader having drawn a slightly better thirty-trade run from one of them.
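The overlap check described here can be sketched in a few lines. The helper names `interval` and `intervals_overlap` are illustrative, and the sample sizes in the loop are chosen only to show how the picture changes as the sample grows.

```python
from math import sqrt

def interval(p_hat, n, z=1.96):
    """Normal-approximation 95% interval for a sample win rate."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

for n in (30, 100, 400):
    better, worse = interval(0.55, n), interval(0.45, n)
    print(
        f"n={n:>3}: [{better[0]:.2f}, {better[1]:.2f}] vs "
        f"[{worse[0]:.2f}, {worse[1]:.2f}] overlap={intervals_overlap(better, worse)}"
    )
```

Under these assumptions the fifty-five and forty-five percent intervals overlap at thirty trades each and still overlap at one hundred; they only separate at around four hundred trades per setup.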
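The honest read is to keep running both setups, log each carefully, and recheck when each has cleared a sample size that produces intervals narrow enough to separate. A ten-point gap needs more sample than it seems: at one hundred trades each, the intervals around fifty-five and forty-five percent are roughly [0.45, 0.65] and [0.35, 0.55], which still overlap, and they only pull apart at roughly four hundred trades each. If the gap is still ten percentage points once the intervals no longer overlap, the difference is starting to look real. If the gap has compressed toward zero, the early read was noise.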
This is also why pooling setups into a single strategy-level win rate is misleading. A strategy that runs three setups can post a strategy-level win rate that looks fine while one of the setups is bleeding the trader on a smaller sample and the other two are carrying it. Per-setup discipline is what makes the comparison readable.
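A small sketch makes the pooling problem concrete. The setup names and trade counts below are invented purely for illustration; they are not drawn from any real journal.

```python
# Hypothetical per-setup results as (wins, trades); invented for illustration.
setups = {
    "setup A": (66, 100),
    "setup B": (62, 100),
    "setup C": (8, 25),
}

total_wins = sum(wins for wins, _ in setups.values())
total_trades = sum(trades for _, trades in setups.values())
pooled_rate = total_wins / total_trades

per_setup_rate = {name: wins / trades for name, (wins, trades) in setups.items()}

print(f"pooled: {pooled_rate:.1%} over {total_trades} trades")
for name, rate in per_setup_rate.items():
    print(f"{name}: {rate:.1%} over {setups[name][1]} trades")
```

The pooled figure sits just above sixty percent while the third setup wins about a third of the time on its own small sample. Only the per-setup rows, each with its own trade count, make that visible.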
Where this fits next to expectancy and R-multiple
Win rate alone never proves a strategy works, which is why expectancy and R-multiple sit next to it in any serious journal. Win rate describes how often the plan resolved positive; expectancy describes the average outcome per trade; R-multiple normalizes each outcome to planned risk. A strategy with a forty percent win rate and a positive R-expectancy can be sound. A strategy with a seventy percent win rate and a negative R-expectancy is not.
The reason the three figures travel together is that win rate is a count and the other two are sizes. A strategy that wins six times out of ten and loses thirty cents for every ten cents it wins is losing money. A strategy that wins four times out of ten and wins thirty cents for every ten cents it loses is making money. Win rate cannot tell the trader which strategy is which on its own. Expectancy and R-multiple can.
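The two strategies in this paragraph can be worked through with the standard expectancy formula: win rate times average win, minus loss rate times average loss. The function name `expectancy` is an illustrative choice, and the dollar amounts are the ten-cent and thirty-cent figures from the text.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Average outcome per trade; avg_loss is given as a positive amount."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

# Wins 6 of 10, wins 10 cents per winner, loses 30 cents per loser.
high_win_rate = expectancy(0.6, avg_win=0.10, avg_loss=0.30)

# Wins 4 of 10, wins 30 cents per winner, loses 10 cents per loser.
low_win_rate = expectancy(0.4, avg_win=0.30, avg_loss=0.10)

print(f"60% win rate: {high_win_rate:+.2f} per trade")
print(f"40% win rate: {low_win_rate:+.2f} per trade")
```

The sixty percent strategy loses six cents per trade on average and the forty percent strategy makes six cents, which is the reversal the win rate column alone cannot show.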
The confidence-interval discipline applies to all three. Expectancy computed on twenty trades has a wide band around it. So does an R-multiple distribution. A journal that prints all three figures with their sample sizes and their uncertainty gives the trader a fuller picture than any one of them alone can.
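The same interval logic carries over to expectancy in R, with the sample standard deviation of the R-multiples taking the place of the binomial variance. The sketch below is a rough normal-approximation band; the twenty R-multiples are made up for illustration, and at a sample this small a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import mean, stdev

def expectancy_interval(r_multiples, z=1.96):
    """Mean R per trade with a normal-approximation 95% interval around it."""
    n = len(r_multiples)
    avg = mean(r_multiples)
    se = stdev(r_multiples) / sqrt(n)   # standard error of the mean
    return avg, avg - z * se, avg + z * se

# Hypothetical journal: 20 trades, losers at -1R, winners between 1.5R and 3R.
sample = [2.0, -1.0, -1.0, 3.0, -1.0, 1.5, -1.0, -1.0, 2.5, -1.0] * 2

avg, lo, hi = expectancy_interval(sample)
print(f"expectancy {avg:+.2f}R over {len(sample)} trades, interval [{lo:+.2f}, {hi:+.2f}]")
```

On this invented sample the expectancy is positive but the interval still spans zero, so twenty trades cannot yet distinguish a sound strategy from a break-even one.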
The point is not that win rate is unimportant. The point is that win rate is one column out of three, and the column is more honest when it carries its sample size and its interval with it.
The Imperial Analytics twenty-trade floor
The twenty-trade per-setup floor used in Imperial Analytics is a no-surface threshold, not a precision threshold. Below twenty matching trades, a per-setup win rate is not displayed as a claim about that setup. At twenty or more, the figure is shown with its sample size and its interval so that the reader sees both the point estimate and the uncertainty around it. Twenty is the floor on display, not the destination of precision.
The floor exists because point estimates below twenty trades behave so erratically that displaying them as a setup-level figure encourages overreaction. A seventy percent win rate over fifteen trades and a thirty-five percent win rate over fifteen trades are both consistent with a true win rate near fifty percent. Surfacing either as a per-setup claim invites the trader to increase size on the first and abandon the second, both of which are responses to noise.
The floor does not mean a setup with twenty trades is well-measured. It means a setup with twenty trades has the minimum sample at which a point estimate plus an interval is honest to display. The journal still prints the interval, and the reader still has to decide whether the interval is narrow enough to act on. Twenty is a gate, not a guarantee.
The same floor applies to per-setup expectancy in R, per-hour win-rate breakdowns, per-instrument win-rate breakdowns, and any other slice the journal supports. Each slice gets its own sample-size check, because each slice is its own sample of trades. A strategy can clear the floor at the strategy level and fail it at every slice it is broken into. The slice that fails the floor is held back; the slice that clears it is displayed with its interval.
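The gating rule can be sketched as a single check applied to every slice. This is an illustration of the rule as described in this post, not the actual Imperial Analytics implementation; `surface_win_rate`, `MIN_TRADES`, and the slice names and counts are all hypothetical.

```python
from math import sqrt

MIN_TRADES = 20  # the no-surface floor described in the text

def surface_win_rate(wins, n, floor=MIN_TRADES, z=1.96):
    """Return the figure with its sample size and interval, or None below the floor."""
    if n < floor:
        return None  # held back: too few trades to display as a claim
    p_hat = wins / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return {
        "win_rate": p_hat,
        "n": n,
        "interval": (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)),
    }

# Hypothetical slices as (wins, trades); each slice gets its own check.
slices = {"breakout": (14, 24), "pullback": (10, 15), "first hour": (9, 20)}

for name, (wins, n) in slices.items():
    result = surface_win_rate(wins, n)
    print(name, "held back" if result is None else result)
```

A slice with exactly twenty trades is displayed, always alongside its interval, which matches the reading that twenty is a gate rather than a guarantee.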
Frequently asked questions
- q: What does a confidence interval on a win rate actually mean? a: A ninety-five percent confidence interval is the band such that, across many repeated samples of the same size from the same strategy, ninety-five percent of the computed intervals would contain the true win rate. It is a statement about the procedure that built the interval, not about the single sample drawn.
- q: Is twenty trades enough to read a strategy's win rate? a: Twenty is the per-setup floor below which Imperial Analytics holds a win rate back from display as a claim. At that sample, a fifty percent win rate sits inside a band of roughly plus-or-minus twenty-two percentage points. The figure is real; the precision is not.
- q: Why does the confidence interval get narrower as sample size grows? a: The standard error scales with one divided by the square root of the sample size. Going from twenty trades to four hundred narrows the band by a factor of roughly four-and-a-half, not twenty, because the relationship is square root, not linear.
- q: Does a higher win rate need fewer trades to be meaningful? a: Somewhat, because the interval width depends on the win rate as well as the sample size. A ninety percent or ten percent win rate has an interval about forty percent narrower at the same sample size than a fifty percent rate, because the variance term p times (1 minus p) shrinks as p moves away from one half. The minimum-sample logic still applies; only the bandwidth at that minimum changes.
- q: Should the sample size apply to the whole strategy or to each setup? a: Each setup. A strategy that runs three setups should clear the floor in each of them before any setup-level win rate is read as a claim about that setup. Pooling the three setups into one strategy-level win rate hides which setup is paying for the others.
- q: How does sample size interact with R-multiple and expectancy? a: The same floor applies. A per-setup R-expectancy figure computed on twenty trades is at the minimum-display threshold, not the precise-figure threshold. The journal prints the expectancy with its sample size and its uncertainty for the same reason it prints the win rate that way.