First, import the package.
library(tidyverse)
Introduction

The World Series is the annual championship series of Major League Baseball (MLB) in North America, contested since 1903 between the American League (AL) champion team and the National League (NL) champion team. The winner of the World Series championship is determined through a best-of-seven playoff, and the winning team is awarded the Commissioner’s Trophy. As the series is played during the fall season in North America, it is sometimes referred to as the Fall Classic.
From Wikipedia - World Series
In this post, we answer several probability questions about a World Series between the Braves and the Yankees.
First, we need to define some parameters.
| Parameter | Explanation |
|---|---|
| PB | In any given game, the probability that the Braves win |
| PY = 1 - PB | In any given game, the probability that the Yankees win |
Questions
1. What is the probability that the Braves win the World Series given that PB=0.55?
First, we need to set the values of PB and PY.
PB <- 0.55
PY <- 1- PB
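Create a function to calculate the probability of winning the series. A series win is defined as winning 4 games in a best-of-7 series, which is the same as losing at most 3 games before the 4th win.
calc_prob <- function(p){
  # P(at most 3 losses before the 4th win), with per-game win probability p
  pnbinom(3, 4, p)
}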
Now calculate the probability given that PB=0.55.
calc_prob(PB)
When PB is 0.55, the probability that the Braves win the World Series is 0.608.
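As a sanity check, the same probability can be reached in two other ways. The sketch below is illustrative only: the variable names (`p`, `via_nbinom`, `via_sum`, `via_binom`) are not from the original post. It compares the negative binomial CDF with a term-by-term sum and with the binomial view, in which the Braves win at least 4 of 7 games if all 7 were played.

```r
# Illustrative cross-check; the variable names here are assumptions, not the post's.
p <- 0.55

# Negative binomial CDF: at most 3 losses before the 4th win
via_nbinom <- pnbinom(3, 4, p)

# Same quantity summed term by term: exactly k losses before the 4th win, k = 0..3
via_sum <- sum(dnbinom(0:3, 4, p))

# Binomial view: strictly more than 3 wins out of 7 hypothetical games
via_binom <- pbinom(3, 7, p, lower.tail = FALSE)

round(c(via_nbinom, via_sum, via_binom), 3)
```

All three values should agree, which suggests that `pnbinom(3, 4, p)` is a reasonable encoding of "win a best-of-7 series".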
2. What is the probability that the Braves win the World Series given that PB=x?
Now PB is not fixed, so we assume x can be any number between 0.5 and 1.
First, we need to generate a series of PB and the probability results.
PBseries <- seq(0.5, 1, 0.01)
win_prob <- rep(NA, length(PBseries))
Now use the function from before to calculate the probability for every PB.
for(i in seq_along(win_prob)){
  win_prob[i] <- calc_prob(PBseries[i])
}
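The loop can also be replaced by a single vectorized call, because `pnbinom()` accepts a vector for its `prob` argument. The following is a minimal sketch of that alternative; `win_prob_vec` is a hypothetical name chosen so that it does not overwrite `win_prob`.

```r
# Sketch of a loop-free alternative; win_prob_vec is a hypothetical name.
calc_prob <- function(p) {
  pnbinom(3, 4, p)  # same definition as in the post
}

PBseries <- seq(0.5, 1, 0.01)

# pnbinom() is vectorized over prob, so one call handles every value of PB
win_prob_vec <- calc_prob(PBseries)

head(win_prob_vec)
```

Either approach should give the same vector; the vectorized form is simply shorter.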
In order to interpret the relationship between PB and the probability that the Braves win, we can draw a graph for them.
plot(x = PBseries,
y = win_prob,
xlim = c(0.5,1),
ylim = 0:1,
xlab = "Probability of the Braves winning a head-to-head matchup",
ylab = "P(Braves win World Series)",
main = "Probability of winning the World Series")

As we can see from this graph, as PB increases, the probability that the Braves win the World Series also increases. In fact, when the x-axis is extended to cover 0 to 1, the curve is S-shaped and resembles a logistic curve.
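To look at the shape over the wider range, one possible sketch is shown below. The names `p_grid`, `win_full`, and `lose_full` are assumptions for illustration. The grid stops short of 0 because `pnbinom()` requires a strictly positive `prob`. The code also checks the symmetry behind the S-shape: the Braves' series win probability at p and at 1 - p should sum to 1, since one of the two teams must win.

```r
# Illustrative sketch; p_grid, win_full and lose_full are hypothetical names.
p_grid <- seq(0.01, 0.99, 0.01)

# Series win probability when the per-game win probability is p
win_full <- pnbinom(3, 4, p_grid)

# Series win probability for the opponent, whose per-game probability is 1 - p
lose_full <- pnbinom(3, 4, 1 - p_grid)

# The two probabilities should sum to 1 for every p
range(win_full + lose_full)

plot(x = p_grid,
     y = win_full,
     type = "l",
     xlab = "Probability of the Braves winning a head-to-head matchup",
     ylab = "P(Braves win World Series)",
     main = "Probability of winning the World Series (wider range)")
```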
3. Suppose one could change the World Series to be best-of-9 or some other best-of-X series. What is the shortest series length so that P(Braves win World Series|PB=0.55) ≥ 0.8?
As in the first question, PB is 0.55, but the series length is now the unknown. Only odd lengths are considered, so that a series cannot end in a tie.
PB <- 0.55
series_length <- seq(1, 999, 2)
Now we need to create a function to calculate the probability when the series length is a parameter.
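calc_prob_sl <- function(sl){
  # Wins needed to clinch a best-of-sl series
  win_threshold <- ceiling(sl/2)
  # Uses PB defined above (0.55) instead of a hard-coded value
  pnbinom(win_threshold - 1, win_threshold, PB)
}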
Finally, for each series length, calculate the probability that the Braves win the World Series. When the probability is at least 0.8, stop the loop and return the series length and the probability.
for(i in seq_along(series_length)){
  pb_win <- calc_prob_sl(series_length[i])
  if(pb_win >= 0.8){
    shortest <- series_length[i]
    p_shortest <- pb_win
    break
  }
}
shortest
p_shortest
The shortest series length is 71. At that length, the probability that the Braves win the World Series is about 0.802.
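The same search can be written without an explicit loop. The sketch below is one possible variant, not the post's method: it evaluates every candidate length at once and picks the first one that reaches the target. The names `wins_needed`, `p_win`, and `first_hit` are hypothetical.

```r
# Loop-free sketch of the search; wins_needed, p_win and first_hit are hypothetical names.
PB <- 0.55
series_length <- seq(1, 999, 2)

# Wins needed to clinch each candidate series length
wins_needed <- ceiling(series_length / 2)

# Series win probability for every candidate length in one vectorized call
p_win <- pnbinom(wins_needed - 1, wins_needed, PB)

# Index of the first (shortest) length that reaches the 0.8 target
first_hit <- which(p_win >= 0.8)[1]

series_length[first_hit]
p_win[first_hit]
```

If no candidate reached the target, `first_hit` would be `NA`, which is a signal to widen the range of `series_length`.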
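4. What is the shortest series length so that P(Braves win World Series|PB= x) ≥ 0.8? This will be a figure (see below) with PB on the x-axis and series length on the y-axis.
PB is again allowed to vary, so we assume x can be any number between 0.51 and 1. The range starts at 0.51 rather than 0.5 because at PB=0.5 the Braves win a series of any odd length with probability 0.5, so the 0.8 target is never reached.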
First, we need to generate a sequence of PB values and a vector to save the length results for different PB values. In addition, we need a sequence of possible series lengths to test. The upper limit is 9999. If that is not enough, we can set a larger limit.
PBseries <- seq(0.51, 1, 0.01)
length_record <- rep(NA, length(PBseries))
series_length <- seq(1, 9999, 2)
To calculate the probability that the Braves win the World Series, we need a new function with two inputs because both the series length and PB are variables.
calc_prob_sl_p <- function(sl, pb){
  win_threshold <- ceiling(sl/2)
  pnbinom(win_threshold - 1, win_threshold, pb)
}
Now, calculate the shortest series length when PB is changing. Save the values in length_record.
for(j in seq_along(PBseries)){
  # Reset for each PB so a failed search is recorded as NA, not a stale value
  shortest <- NA
  for(i in seq_along(series_length)){
    pb_win <- calc_prob_sl_p(series_length[i], PBseries[j])
    if(pb_win >= 0.8){
      shortest <- series_length[i]
      break
    }
  }
  length_record[j] <- shortest
}
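The nested loop can also be wrapped in a small helper and applied to each PB value. The following is a sketch under assumptions: `shortest_series()`, its arguments `target` and `max_len`, and `length_vec` are hypothetical names that do not appear in the original post. The helper returns `NA` when no length up to `max_len` reaches the target.

```r
# Sketch of a helper-based alternative; shortest_series() and length_vec are hypothetical.
shortest_series <- function(pb, target = 0.8, max_len = 9999) {
  lens <- seq(1, max_len, 2)             # odd series lengths only
  need <- ceiling(lens / 2)              # wins needed to clinch each length
  p_win <- pnbinom(need - 1, need, pb)   # series win probability per length
  hit <- which(p_win >= target)[1]       # first length that reaches the target
  if (is.na(hit)) NA_real_ else lens[hit]
}

PBseries <- seq(0.51, 1, 0.01)
length_vec <- sapply(PBseries, shortest_series)

head(length_vec)
```

For example, `shortest_series(0.5)` should return `NA`, because an evenly matched series is won with probability 0.5 at every length.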
We have now obtained the shortest series length for different PB values. Let’s draw a figure to show the relationship between them.
plot(x = PBseries,
y = length_record,
xlim = c(0.5,1),
xlab = "Probability of the Braves winning a head-to-head matchup",
ylab = "Series length",
main = "Shortest series so that P(Win WS given p)>=0.8")

In this graph, as PB increases, the shortest series length that gives the Braves at least a 0.8 probability of winning the World Series drops quickly toward 1. Once PB is at least 0.8, a single game is enough, because a one-game series is won with probability PB.
5. Calculate P( PB=0.55|Braves lose 3 games before winning a 4th game) under the assumption that either PB=0.55 or PB=0.45. Explain your solution.
According to the definition of conditional probability, we can get:
\[P(A|B)=\frac{P(A\cap B)}{P(B)} \to P(A\cap B)=P(A|B)P(B)\\ P(B|A)=\frac{P(A\cap B)}{P(A)} \to P(A\cap B)=P(B|A)P(A)\\ \to P(A|B)P(B)=P(A\cap B)=P(B|A)P(A)\\ \to P(A|B)=\frac{P(B|A)P(A)}{P(B)}\]
Here A is the event PB=0.55 and B is the event that the Braves lose 3 games before winning a 4th game. As a result, P( PB=0.55|Braves lose 3 games before winning a 4th game) = P(Braves lose 3 games before winning a 4th game|PB=0.55) * P(PB=0.55) ÷ P(Braves lose 3 games before winning a 4th game).
PB is either 0.55 or 0.45, and we treat the two values as equally likely, so P(PB=0.55) = 0.5.
Then use dnbinom() to calculate P(Braves lose 3 games before winning a 4th game), which averages over the two equally likely values of PB, and P(Braves lose 3 games before winning a 4th game|PB=0.55):
(dnbinom(3,4,0.45)+dnbinom(3,4,0.55))/2
dnbinom(3,4,0.55)
P(Braves lose 3 games before winning a 4th game) = 0.1516092
P(Braves lose 3 games before winning a 4th game | PB=0.55) = 0.1667701
P( PB=0.55|Braves lose 3 games before winning a 4th game) = P(Braves lose 3 games before winning a 4th game|PB=0.55) * P(PB=0.55) ÷ P(Braves lose 3 games before winning a 4th game)
0.1667701 * 0.5 / 0.1516092
P( PB=0.55|Braves lose 3 games before winning a 4th game) = 0.1667701 * 0.5 ÷ 0.1516092 = 0.5499999
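Therefore, P( PB=0.55|Braves lose 3 games before winning a 4th game) is 0.55. The 0.5499999 above is only a rounding artifact of the 7-digit inputs: the two likelihoods share the factor 20 × 0.55^3 × 0.45^3, which cancels and leaves 0.55 ÷ (0.55 + 0.45) = 0.55 exactly.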
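The whole Bayes calculation can also be carried out at full precision in a few lines. The sketch below assumes, as above, that the two candidate values of PB are equally likely a priori; the names `prior`, `likelihood`, and `posterior` are illustrative and not from the original post.

```r
# Sketch of the Bayes update at full precision; names are illustrative.
# Assumption: PB = 0.55 and PB = 0.45 are equally likely before seeing the series.
prior <- c("0.55" = 0.5, "0.45" = 0.5)

# P(exactly 3 losses before the 4th win | PB) for each candidate value
likelihood <- dnbinom(3, 4, c(0.55, 0.45))

# Bayes' rule: prior times likelihood, normalized by the total probability
posterior <- prior * likelihood / sum(prior * likelihood)

posterior
```

Working with unrounded values avoids the 0.5499999 seen when the intermediate results are typed in by hand.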