Compare n' Bet™

Sample Size and Statistical Significance

A guy goes 110-90 over 200 bets at -110 and figures he's a 55% bettor. The math says he might just be a coin flip on a hot streak. Here's how to tell the difference.

Advanced topic. Assumes you're comfortable with EV, ROI, win rate, vig, and standard probability notation. New terms link to the Glossary. Sports betting carries real financial risk; if you need help, call 1-800-522-4700 or visit ncpgambling.org.

The number nobody tells beginners

To prove with 95% confidence that you're a true 55% bettor at -110 odds, you need around 1,200 to 1,400 settled bets. To prove with 95% confidence that you have a 4% ROI edge at the same price, you need around 2,200 bets, and a 2% ROI edge pushes that to nearly 9,000. These aren't ballpark guesses from somebody on Twitter. They fall straight out of the binomial standard error and the central limit theorem.

Most retail bettors place 200 to 500 bets a year. At that rate, proving an edge to yourself with real statistical confidence takes between three and ten years. That's the structural reason most bettors who think they have an edge are wrong about it. It's also the reason that bettors who genuinely do have an edge can spend years unable to tell themselves apart from the unlucky-no-edge group on the data alone.

Setting up the math

Each bet is a Bernoulli trial. It wins with probability p or loses with probability 1 − p. Across n bets, the number of wins follows a binomial distribution with mean np and variance np(1 − p). Win rate is the sample proportion p̂ = wins / n, with standard error:

Standard error of win rate: SE(p̂) = √(p(1 − p) / n)

For large n, the sample win rate is approximately normally distributed around the true win rate, so a 95% confidence interval is roughly p̂ ± 1.96 · SE(p̂).

Worked example. Same bettor as the subtitle: 110-90 over 200 bets at -110. Sample win rate is 0.55. Standard error using the sample value:

SE = √(0.55 · 0.45 / 200) = √(0.001238) = 0.0352

95% CI: 0.55 ± 1.96 · 0.0352
= 0.55 ± 0.0690
= [0.481, 0.619]

So this bettor's 95% confidence interval for true win rate runs from 48.1% to 61.9%. The breakeven win rate at -110 is 52.38%. The interval contains 50% (no edge, just lucky) and 60% (massive edge, professional). The data is genuinely consistent with both. After 200 bets you cannot tell whether you're a coin flip on a hot streak or a serious edge bettor having a slow start.
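The interval above can be reproduced in a few lines. This is a minimal sketch of the normal-approximation interval from this section; the function name `win_rate_ci` is made up for illustration, and it assumes every bet is independent and settled as a plain win or loss (no pushes).

```python
import math

def win_rate_ci(wins, losses, z=1.96):
    """Return (win rate, lower bound, upper bound) for the true win rate.

    Uses the normal approximation p_hat +/- z * SE(p_hat), with the
    sample win rate standing in for the unknown true p.
    """
    n = wins + losses
    p_hat = wins / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * se, p_hat + z * se

# The 110-90 bettor from the worked example.
p_hat, lower, upper = win_rate_ci(110, 90)
print(f"win rate {p_hat:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
# prints: win rate 0.550, 95% CI [0.481, 0.619]
```

The normal approximation is fine at this sample size; for very small samples or win rates near 0 or 1 a different interval would be a better choice.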

The breakeven win rate at any price

To convert American odds into a breakeven win rate (the win rate at which a bet at that price has exactly zero EV):

For negative American odds (favorites): breakeven = |odds| / (|odds| + 100)

For positive American odds (underdogs): breakeven = 100 / (odds + 100)

Examples:
-110 → 110 / 210 = 0.5238 (52.38%)
-200 → 200 / 300 = 0.6667 (66.67%)
+150 → 100 / 250 = 0.4000 (40.00%)
+300 → 100 / 400 = 0.2500 (25.00%)

Beating breakeven by N percentage points produces an EV per bet that depends on the price. Beating -110 by 2 points (winning 54.38% vs the 52.38% breakeven) gets you about 3.8% EV per dollar wagered. Beating +300 by the same 2 points (27% vs 25% breakeven) gets you about 8% EV per dollar wagered. Beating breakeven by the same number of percentage points on longer prices is worth more per bet, but it takes a bigger sample to confirm.
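Here is a small sketch of the breakeven conversion and the EV comparison from this section. The helper names (`breakeven`, `decimal_odds`, `ev_per_dollar`) are illustrative, not from any library, and the EV formula assumes a unit stake with no pushes.

```python
def breakeven(american):
    """Breakeven win rate for American odds (e.g. -110 or +150)."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

def decimal_odds(american):
    """Convert American odds to decimal odds (total payout per unit staked)."""
    if american < 0:
        return 1 + 100 / -american
    return 1 + american / 100

def ev_per_dollar(win_prob, american):
    """Expected profit per dollar staked: p * o - 1."""
    return win_prob * decimal_odds(american) - 1

for odds in (-110, -200, 150, 300):
    print(f"{odds:+d}: breakeven {breakeven(odds):.4f}")

# Beating breakeven by 2 percentage points at two different prices.
print(f"-110: {ev_per_dollar(breakeven(-110) + 0.02, -110):.4f}")
print(f"+300: {ev_per_dollar(breakeven(300) + 0.02, 300):.4f}")
```

Because breakeven is just 1 divided by the decimal odds, a fixed 2-point margin over breakeven is worth 0.02 times the decimal odds, which is why the same margin pays more at longer prices.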

Sample size for a stated win rate edge

To detect a true win rate of p as significantly different from breakeven b at a confidence level z, the required sample size is approximately:

Required sample size: n = (z · √(p(1 − p)) / (p − b))²

For 95% confidence (z = 1.96): n = (1.96)² · p(1 − p) / (p − b)²
= 3.8416 · p(1 − p) / (p − b)²

For a true 55% win rate at -110 (b = 0.5238):

n = 3.8416 · (0.55)(0.45) / (0.55 − 0.5238)²
n = 3.8416 · 0.2475 / (0.0262)²
n = 0.9508 / 0.000687
n ≈ 1,384 bets

For a true 54% win rate at -110 (smaller, more realistic edge):

n = 3.8416 · (0.54)(0.46) / (0.54 − 0.5238)²
n = 3.8416 · 0.2484 / (0.0162)²
n = 0.9542 / 0.000262
n ≈ 3,640 bets

For a true 53% win rate at -110 (small but still profitable):

n = 3.8416 · (0.53)(0.47) / (0.53 − 0.5238)²
n = 3.8416 · 0.2491 / (0.0062)²
n = 0.9569 / 0.0000384
n ≈ 24,900 bets

The non-linearity here is brutal. A 55% bettor needs around 1,400 bets to prove themselves. A 53% bettor (still solidly profitable) needs nearly 25,000 bets. Smaller edges require dramatically larger samples. Bettors with marginal edges effectively cannot prove their edge to themselves within a betting career, even if the edge is real.
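To see how fast the requirement grows, here is a rough sketch of the sample-size formula from this section. The function name `required_bets` is an illustrative assumption, and the breakeven is rounded to 0.5238 to match the hand calculations, so results can differ from the figures above by a few bets of rounding.

```python
def required_bets(true_win_rate, breakeven, z=1.96):
    """Approximate bets needed so that the edge (p - b) equals z standard errors.

    Implements n = z^2 * p(1 - p) / (p - b)^2 under the normal approximation.
    """
    p, b = true_win_rate, breakeven
    return z ** 2 * p * (1 - p) / (p - b) ** 2

BREAKEVEN_110 = 0.5238  # rounded breakeven at -110, as used in the text

for p in (0.55, 0.54, 0.53):
    n = required_bets(p, BREAKEVEN_110)
    print(f"true win rate {p:.2f}: about {n:,.0f} bets")
```

The edge sits squared in the denominator, so halving the margin over breakeven roughly quadruples the number of bets needed.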

Standard error of ROI

Win rate works fine when every bet is the same price. Most real betting samples involve a mix of prices, so the more general statistic is ROI per bet, computed across every settled wager regardless of price. The math gets heavier.

For a single bet at decimal odds o with stake 1 and true win probability p, the per-bet return R is:

R = (o − 1) with probability p (you win, profit is o − 1)
R = −1 with probability (1 − p) (you lose, lose your stake)

Expected return per bet: E[R] = p(o − 1) − (1 − p) = po − 1

Variance of return per bet: Var(R) = E[R²] − E[R]²
E[R²] = p(o − 1)² + (1 − p)
Var(R) = p(o − 1)² + (1 − p) − (po − 1)²
Var(R) = p(1 − p)o²

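The variance of return per bet at decimal odds o with true win probability p is p(1 − p)o². Clean result. For a fixed p, variance scales with o², and longshots carry far more variance per bet than favorites. A +500 bet (decimal 6.0) has o² = 36, about ten times the o² ≈ 3.65 of a -110 bet (decimal 1.91). The p(1 − p) term shrinks as the price lengthens, so at breakeven win probabilities the per-bet variance works out to about 5.0 for the +500 bet against about 0.91 for the -110 bet, roughly 5.5 times as much at equivalent stakes.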
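As a sanity check on the algebra, the sketch below computes the per-bet variance two ways: directly from the two-outcome definition and from the closed form p(1 − p)o². The function names are illustrative, stakes are 1 unit, and the win probabilities are assumed to sit at breakeven for each price.

```python
def variance_from_definition(p, o):
    """Var(R) = E[R^2] - E[R]^2 for a unit-stake bet at decimal odds o."""
    mean = p * (o - 1) - (1 - p)
    second_moment = p * (o - 1) ** 2 + (1 - p)
    return second_moment - mean ** 2

def variance_closed_form(p, o):
    """Simplified result: p(1 - p) * o^2."""
    return p * (1 - p) * o ** 2

# (label, assumed true win probability, decimal odds)
examples = [
    ("-110", 0.5238, 1.91),
    ("+500", 1 / 6, 6.0),
]
for label, p, o in examples:
    direct = variance_from_definition(p, o)
    closed = variance_closed_form(p, o)
    print(f"{label}: definition {direct:.4f}, closed form {closed:.4f}")
```

The two columns should agree to floating-point precision, which confirms the simplification step in the derivation.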

For a bettor placing N bets at varying prices, the variance of average ROI is the average per-bet variance divided by N:

Standard error of average ROI: SE(ROI) = √((1/N) · (1/N) · Σ Var(Rᵢ))
SE(ROI) = √((1/N²) · Σ pᵢ(1 − pᵢ)oᵢ²)

For uniform pricing at decimal odds o: SE(ROI) = √(p(1 − p)o² / N)

Plug in numbers for a bettor placing 1,000 -110 bets (decimal 1.91) with a true win rate of 0.5238 (the breakeven, so true edge is exactly zero):

Per-bet variance = 0.5238 · 0.4762 · 1.91²
= 0.2494 · 3.6481 = 0.9100

SE(ROI) = √(0.9100 / 1000) = √(0.000910) = 0.0302

95% CI for ROI after 1,000 bets at zero true edge:
0 ± 1.96 · 0.0302 = ±5.92%

A no-edge bettor placing 1,000 -110 bets has a 95% chance of finishing somewhere between −5.92% ROI and +5.92% ROI on luck alone. So a bettor finishing the year at +4% ROI over 1,000 bets cannot statistically distinguish themselves from a true zero-edge bettor. The same bettor finishing at +7% ROI clears the 95% bar, but only because the sample is 1,000 bets: over 500 bets the zero-edge band widens to about ±8.4%, and +7% falls back inside it.
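The same calculation works for a mix of prices. Below is a minimal sketch of the standard error formula from this section, assuming unit stakes, independent bets, and known true win probabilities (which in practice you would have to estimate). The name `se_roi` is hypothetical.

```python
import math

def se_roi(bets):
    """Standard error of average ROI for unit-stake, independent bets.

    `bets` is a list of (true win probability, decimal odds) pairs.
    SE = sqrt(sum of per-bet variances) / N, with Var_i = p_i(1 - p_i) * o_i^2.
    """
    n = len(bets)
    total_variance = sum(p * (1 - p) * o ** 2 for p, o in bets)
    return math.sqrt(total_variance) / n

# 1,000 bets at -110 with zero true edge, as in the example above.
flat_110 = [(0.5238, 1.91)] * 1000
half_width = 1.96 * se_roi(flat_110)
print(f"95% band at zero edge, 1,000 bets: +/- {half_width:.2%}")

# Same bettor, half the sample: the band widens by a factor of sqrt(2).
half_width_500 = 1.96 * se_roi(flat_110[:500])
print(f"95% band at zero edge, 500 bets: +/- {half_width_500:.2%}")
```

The unrounded result for 1,000 bets lands a hair under the hand-calculated ±5.92% because the text rounds the standard error to 0.0302 before multiplying.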

Required sample size for a stated ROI edge

To prove a true ROI of r at confidence level z, the sample size you need is roughly:

Required sample size for ROI: N = (z · σ / r)²

Where σ is the per-bet standard deviation: σ = √(p(1 − p)o²)

For a true 4% ROI edge at uniform -110 pricing:

Note: at +4% ROI, the true win rate is approximately 0.5446.
σ = √(0.5446 · 0.4554 · 1.91²) = √(0.9048) = 0.9512

N = (1.96 · 0.9512 / 0.04)²
N = (46.61)²
N ≈ 2,170 bets

For a true 2% ROI edge:

σ ≈ 0.954
N = (1.96 · 0.954 / 0.02)²
N = (93.49)²
N ≈ 8,740 bets

For a true 1% ROI edge:

N = (1.96 · 0.955 / 0.01)²
N ≈ 35,000 bets

Same pattern as with win rate, just expressed in the metric most bettors actually track. A 4% ROI edge takes around 2,200 bets to confirm. A 1% ROI edge takes 35,000. There's a real reason serious bettors are obsessive about line shopping for an extra 1% to 2% of EV: that extra bit of edge changes the sample size required to prove the edge from a lifetime down to a year.
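The ROI version of the sample-size calculation can be sketched the same way. This assumes uniform pricing at a single decimal price and unit stakes, and it backs the true win rate out of the target ROI using E[R] = po − 1. The function name `required_bets_for_roi` is illustrative.

```python
import math

def required_bets_for_roi(true_roi, decimal_odds, z=1.96):
    """Approximate bets needed to separate a true ROI from zero.

    Implements N = (z * sigma / r)^2 with sigma = sqrt(p(1 - p)) * o,
    where p is the win rate that yields the stated ROI at these odds.
    """
    p = (1 + true_roi) / decimal_odds
    sigma = math.sqrt(p * (1 - p)) * decimal_odds
    return (z * sigma / true_roi) ** 2

for roi in (0.04, 0.02, 0.01):
    n = required_bets_for_roi(roi, 1.91)
    print(f"true ROI {roi:.0%}: about {n:,.0f} bets")
```

Results will sit within a fraction of a percent of the hand calculations, which round σ to three decimals.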

Reverse calculation: what does my sample actually tell me?

The more useful question for most bettors is the inverse. Given N bets at observed ROI r̂, what's the 95% confidence interval for true ROI?

95% CI for true ROI from a sample: r̂ ± 1.96 · √(s²/N)

Where s² is the sample variance of returns: s² = (1/(N − 1)) Σ (Rᵢ − r̂)²

For uniform -110 pricing, you can approximate s as 0.95. So the rule of thumb for a -110 bettor:

95% CI rule of thumb at -110: true ROI = observed ROI ± 1.86 / √N

Examples:
100 bets at +5% ROI: true ROI in [−13.6%, +23.6%]
500 bets at +5% ROI: true ROI in [−3.3%, +13.3%]
1,000 bets at +5% ROI: true ROI in [−0.9%, +10.9%]
3,000 bets at +5% ROI: true ROI in [+1.6%, +8.4%]

A bettor at +5% ROI over 1,000 bets at -110 cannot reject the null hypothesis that their true edge is zero. The lower bound of the 95% CI is −0.9%, so the interval still includes zero. At the same observed ROI, the interval clears zero only once the sample passes roughly 1,400 bets, since (1.86 / 0.05)² ≈ 1,384.
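If you keep a bet log, the interval can be computed from the raw per-bet returns instead of the rule of thumb. The sketch below uses a made-up record (550 wins and 450 losses at -110, unit stakes) that works out to exactly +5% ROI; the function name `roi_confidence_interval` is an assumption for illustration.

```python
import math
import statistics

def roi_confidence_interval(returns, z=1.96):
    """Return (observed ROI, lower bound, upper bound) from per-bet returns.

    `returns` holds profit per unit staked for each settled bet,
    e.g. +0.909 for a win at -110 and -1.0 for a loss.
    """
    n = len(returns)
    r_hat = statistics.fmean(returns)
    s = statistics.stdev(returns)  # sample standard deviation (N - 1 denominator)
    half_width = z * s / math.sqrt(n)
    return r_hat, r_hat - half_width, r_hat + half_width

# Hypothetical log: 550 wins and 450 losses, every bet at -110.
win_profit = 100 / 110
returns = [win_profit] * 550 + [-1.0] * 450

r_hat, lower, upper = roi_confidence_interval(returns)
print(f"observed ROI {r_hat:.1%}, 95% CI [{lower:.1%}, {upper:.1%}]")
# prints: observed ROI 5.0%, 95% CI [-0.9%, 10.9%]
```

Working from the actual returns matters most when prices vary, because the 0.95 shortcut for s only holds near -110.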

Why CLV is the smarter signal

Closing line value collapses the sample-size problem. CLV is asking a different question: did the bet you placed beat the closing line? That comparison is direct, sport-independent, and gives you a usable signal in dozens of bets instead of thousands.

The mechanism: the closing line is the most informed estimate of the true probability available before the game starts. If your bets consistently beat the closing line, you're getting better odds than the market consensus, which, as long as the close is a good estimate of the true probability, produces profitable betting over the long run. CLV is a leading indicator, not a confirming one.

CLV per bet (in implied probability): CLVᵢ = (implied probability of close) − (implied probability of bet)

Average CLV across N bets: CLV̄ = (1/N) Σ CLVᵢ

Variance of CLV per bet is much smaller than variance of bet outcomes. CLV is the difference between two market estimates of the same probability, not the realization of the underlying random outcome. So the standard error of average CLV is much smaller, and the sample size you need for confidence is around 50 to 100 bets, not thousands.
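A rough sketch of tracking CLV with the definition above. The five bets are invented for illustration, the helper names are hypothetical, and the implied probabilities are taken straight from the posted prices with the vig left in, which is an assumption rather than something this guide specifies.

```python
import math
import statistics

def implied_probability(american):
    """Implied probability of American odds, vig included."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

def clv(bet_odds, closing_odds):
    """CLV in implied-probability terms: close minus bet. Positive is good."""
    return implied_probability(closing_odds) - implied_probability(bet_odds)

# Hypothetical log of (odds you bet at, closing odds on the same side).
bets = [(-110, -120), (150, 140), (-105, -110), (200, 210), (-110, -110)]

clvs = [clv(bet, close) for bet, close in bets]
mean_clv = statistics.fmean(clvs)
se_clv = statistics.stdev(clvs) / math.sqrt(len(clvs))
print(f"average CLV {mean_clv:.2%}, standard error {se_clv:.2%}")
```

Five bets is far too few to conclude anything; the point is only that the per-bet spread of CLV is measured in a few percentage points, while the per-bet spread of returns at -110 is close to a full unit.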

A bettor consistently beating the close by 2% is showing a real edge. They don't need to wait for results to confirm it. On the flip side, a bettor with +5% ROI over 200 bets but flat or negative CLV is almost certainly running hot, and their results are likely to regress.

For working with CLV directly, see the dedicated CLV guide.

Practical takeaways

  • Treat any conclusion drawn from fewer than 500 bets as basically uninformative. Variance is too high.
  • Treat 500 to 1,500 bets as suggestive but not conclusive. Hot and cold streaks within that range are normal and consistent with a wide range of true edges.
  • Treat 1,500 to 5,000 bets as the sample where moderate edges (3% to 5% ROI) become statistically distinguishable from zero.
  • Smaller edges (1% to 2% ROI) need samples in the tens of thousands and are essentially unconfirmable for most retail bettors over reasonable time horizons.
  • If you need a faster answer, track CLV. CLV converges fast enough to give you a directional read in a few months.
  • Variance per bet scales with the square of decimal odds. A bettor playing longshots will swing wildly even with the same true edge as a bettor playing favorites. Adjust expectations and sample sizes accordingly.
  • The bettor at +5% ROI through 200 bets is in a probability cloud that includes pros, lucky amateurs, and unlucky pros. The data alone can't tell you which one. Ten more years of betting will, eventually.

The honest summary

Most retail bettors who think they're winning at sports are working with sample sizes too small to support that belief. That isn't a moral failing or a math insult. It's a structural feature of low-edge, high-variance gambling. The signal-to-noise ratio is bad enough that even genuinely profitable bettors need years of data to prove their edge to themselves with real statistical confidence.

Same math means running cold for a year, even badly, doesn't necessarily mean your edge is gone. It might mean you lost the variance lottery for that year. Telling real edge erosion apart from variance is one of the genuinely hard problems in retail betting, and the only durable answer is to track CLV alongside results, because CLV is observable in real time and converges fast enough to be useful.

If this guide is depressing, that's part of the point. The math is what it is. The bettors who survive long-term are the ones who size correctly, track honestly, and avoid drawing big conclusions from small samples in either direction.

Disclaimer

The content on Compare n' Bet is published for educational and informational purposes. By reading this guide you acknowledge:

  • This is not professional gambling, financial, legal, or tax advice. It is general information about sports betting strategy and theory.
  • Sports betting involves substantial financial risk. The strategies and models described do not guarantee profit, eliminate variance, or constitute predictions of future events.
  • Sports betting is regulated differently in every jurisdiction. The reader is responsible for ensuring all activity complies with the laws of their location.
  • Sportsbook terms of service vary by operator. Every sportsbook has its own rules about account usage, betting patterns, and what constitutes acceptable use. The reader is responsible for reading and complying with the terms of any operator they use.
  • Compare n' Bet does not encourage, endorse, or facilitate any activity that would violate applicable law or sportsbook terms of service.
  • Nothing in this guide should be interpreted as a recommendation to deposit, wager, or take any specific financial action. All examples are illustrative.
  • Past performance, hypothetical scenarios, and mathematical models are not predictive of future results.
  • Compare n' Bet, DeeDubyah Software LLC, and our affiliated entities accept no liability for losses, damages, or other consequences resulting from decisions made on the basis of any content on this website.

If you or someone you know has a problem with gambling, the National Council on Problem Gambling helpline is 1-800-522-4700 (US) or visit ncpgambling.org. International readers can find local resources at gamblingtherapy.org.