Import the packages first.
library(magrittr)
library(sn)
Situation Description
The central limit theorem is an important computational shortcut for generating and making inferences from the sampling distribution of the mean. Recall that the central limit theorem shortcut relies on a number of conditions, specifically:
- Independent observations
- Identically distributed observations
- Mean and variance exist
- Sample size large enough for convergence
In this simulation study, I am going to compare the sampling distribution of the mean generated by simulation with the sampling distribution implied by the central limit theorem. I will compare the distributions graphically in QQ-plots.
This will be a 4 × 4 factorial experiment. The first factor will be the sample size, with N = 5, 10, 20, and 40. The second factor will be the degree of skewness in the underlying distribution. The underlying distribution will be the Skew-Normal distribution. The Skew-Normal distribution has three parameters: location \(\xi\), scale \(\omega\), and slant \(\alpha\). When the slant parameter is 0, the distribution reduces to the normal distribution. As the slant parameter increases, the distribution becomes increasingly right-skewed. In this simulation, the slant will be set to 0, 2, 10, and 100. The location and scale are fixed at 0 and 1, respectively, in every simulation setting.
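For reference, the Skew-Normal density in its usual parameterization can be written as below, where \(\phi\) and \(\Phi\) are the standard normal density and distribution function (these two symbols are not used elsewhere in this post). Setting \(\alpha = 0\) makes the last factor equal to 1/2, which recovers the normal density.

\[
f(x;\xi,\omega,\alpha) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right)
\]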
Plot preparation
At the beginning, we need to set up the parameters that do not change in the following steps: the number of simulation replicates R, and the location \(\xi\) and scale \(\omega\) of the Skew-Normal distribution. The slant will change later, so it is not set in this part.
R <- 5000
location <- 0
scale <- 1
location: \(\xi\), scale: \(\omega\), slant: \(\alpha\)
Before creating the function, let’s clarify the formulas for delta and for the mean and standard deviation of the Skew-Normal distribution, which the central limit theorem (CLT) approximation needs.
- Delta: \(\delta =\frac{\alpha}{\sqrt{1+\alpha^2}}\)
- Mean: \(\xi+\omega \delta \sqrt{\frac{2}{\pi}}\)
- Standard deviation: \(\sqrt{\omega^2(1-\frac{2\delta^2}{\pi}) }\)
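As a quick worked example, the formulas above can be evaluated for the four slants used in this study. This is a minimal sketch in base R. The skewness column is an addition that the simulation itself does not use: it applies the standard Skew-Normal skewness formula only to show how slant translates into skewness. The names `moments` and `pop_skew` are chosen here for illustration.

```r
# Population moments of the Skew-Normal for the four slants in this study
location <- 0
scale <- 1
slant <- c(0, 2, 10, 100)

delta <- slant / sqrt(1 + slant^2)
pop_mean <- location + scale * delta * sqrt(2 / pi)
pop_sd <- sqrt(scale^2 * (1 - 2 * delta^2 / pi))

# Skewness (not used by the simulation; included only to quantify
# the "degree of skewness" that each slant corresponds to)
pop_skew <- (4 - pi) / 2 * (delta * sqrt(2 / pi))^3 /
  (1 - 2 * delta^2 / pi)^(3 / 2)

moments <- data.frame(slant, delta, pop_mean, pop_sd, pop_skew)
print(round(moments, 4))
```

With slant 0 the mean and standard deviation reduce to 0 and 1, as expected for the standard normal. Delta approaches 1 as the slant grows, which bounds the skewness, so the step from slant 10 to slant 100 changes the shape much less than the step from 0 to 2.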
Then define a function that builds both sampling distributions of the mean (the CLT approximation and the simulation) from the formulas above and draws their QQ-plot.
qqplot_creator <- function(slant, N) {
  delta <- slant / (sqrt(1 + slant ^ 2))
  # Quantities to calculate/generate: population mean and standard deviation
  pop_mean <- location + scale * delta * (sqrt(2 / pi))
  pop_sd <- sqrt(scale ^ 2 * (1 - ((2 * delta ^ 2) / pi)))
  Z <- rnorm(R) # standard normal draws that the CLT approximation rescales
  # CLT approximation: normal with mean pop_mean and sd pop_sd / sqrt(N)
  sample_dist_clt <- Z * (pop_sd / sqrt(N)) + pop_mean
  # Simulation approximation: R samples of size N, one per row, then row means
  random.skew <- array(rsn(R * N, xi = location, omega = scale, alpha = slant),
                       dim = c(R, N))
  sample_dist_sim <- apply(random.skew, 1, mean)
  qqplot(sample_dist_clt, sample_dist_sim, axes = FALSE, frame.plot = TRUE, ann = FALSE)
  abline(0, 1) # reference line y = x
}
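Before reading the QQ-plots, it can help to confirm that the two sampling distributions agree in their first two moments, so that any departure from the y = x line reflects shape rather than location or spread. The sketch below is one way to check this, with several assumptions. The names `r_skew` and `check_moments` are hypothetical helpers, not part of the post or of the sn package. `r_skew` is a base R stand-in for `rsn` with location 0 and scale 1, built on the standard representation \(\delta|Z_0| + \sqrt{1-\delta^2}\,Z_1\) with independent standard normal \(Z_0, Z_1\). The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so that this sketch is reproducible

# Hypothetical stand-in for sn::rsn with xi = 0 and omega = 1:
# delta * |Z0| + sqrt(1 - delta^2) * Z1 is Skew-Normal with slant alpha
r_skew <- function(n, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
}

# Compare the CLT mean and standard error with their simulated counterparts
check_moments <- function(slant, N, R = 5000, rgen = r_skew) {
  delta <- slant / sqrt(1 + slant^2)
  pop_mean <- delta * sqrt(2 / pi)
  pop_sd <- sqrt(1 - 2 * delta^2 / pi)
  draws <- matrix(rgen(R * N, slant), nrow = R, ncol = N)
  xbar <- rowMeans(draws)  # one sample mean per replicate
  c(clt_mean = pop_mean, sim_mean = mean(xbar),
    clt_se = pop_sd / sqrt(N), sim_se = sd(xbar))
}

round(check_moments(slant = 10, N = 20), 4)
```

With the sn package loaded, passing `rgen = function(n, alpha) rsn(n, xi = 0, omega = 1, alpha = alpha)` should run the same check against the generator used in `qqplot_creator`.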
QQplot generation
Now we can set the slants and sample sizes we want to test in the following steps. As required, N = 5, 10, 20, and 40, and the slant will be set to 0, 2, 10, and 100. Then create a grid of x values on which to evaluate the density of each underlying distribution.
slant <- c(0,2,10,100)
N <- c(5,10,20,40)
x <- seq(-2,2,0.01)
Set up a 4 × 5 grid of panels, with one row per slant. The first column shows the density of the underlying distribution, and the remaining four columns are filled in by the qqplot_creator function, one QQ-plot per sample size.
par(mfrow = c(4, 5), mai = c(0.1, 0.1, 0.1, 0.1), oma = c(0, 4, 4, 0))
for (i in slant) {
  plot(dsn(x,
           xi = location,
           omega = scale,
           alpha = i),
       axes = FALSE,
       frame.plot = TRUE,
       type = "l",
       xlab = NA, ylab = NA)
  for (j in N) {
    qqplot_creator(i, j)
  }
}
mtext(text = "Distribution N=5 N=10 N=20 N=40",
      side = 3,
      outer = TRUE)
mtext(text = "slant = 100 slant = 10 slant = 2 slant = 0",
      side = 2,
      outer = TRUE)

Conclusion
When N is larger, the QQ-plot points lie closer to the y = x line, which means the CLT approximation matches the simulated sampling distribution of the mean more closely. When the slant is larger, meaning the underlying Skew-Normal distribution is more skewed, the agreement at a given N is worse, so a larger sample size is needed before the CLT approximation is adequate.
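The visual comparison can also be summarized numerically. The sketch below is a variation on the approach in this post, not a reproduction of it. It standardizes the simulated sample means and compares their sorted values with exact normal quantiles from `qnorm(ppoints(R))` instead of with random normal draws, then reports the mean absolute gap for each of the 16 settings. The names `r_skew` and `qq_gap` are hypothetical. `r_skew` is a base R stand-in for `rsn` with location 0 and scale 1, and the seed is arbitrary.

```r
set.seed(2)  # arbitrary seed, only so that this sketch is reproducible

# Hypothetical stand-in for sn::rsn with xi = 0 and omega = 1
r_skew <- function(n, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
}

# Mean absolute distance between standardized simulated quantiles of the
# sample mean and the matching standard normal quantiles
qq_gap <- function(slant, N, R = 5000) {
  delta <- slant / sqrt(1 + slant^2)
  pop_mean <- delta * sqrt(2 / pi)
  pop_sd <- sqrt(1 - 2 * delta^2 / pi)
  xbar <- rowMeans(matrix(r_skew(R * N, slant), nrow = R))
  z_sim <- sort((xbar - pop_mean) / (pop_sd / sqrt(N)))
  z_clt <- qnorm(ppoints(R))  # exact quantiles, no Monte Carlo noise
  mean(abs(z_sim - z_clt))
}

slants <- c(0, 2, 10, 100)
Ns <- c(5, 10, 20, 40)

# Rows are slants, columns are sample sizes
gaps <- sapply(Ns, function(n) sapply(slants, function(a) qq_gap(a, n)))
dimnames(gaps) <- list(paste0("slant=", slants), paste0("N=", Ns))
round(gaps, 3)
```

Smaller entries indicate closer agreement with the CLT. The slant = 0 row gives a baseline for pure simulation noise, since the sample mean is exactly normal in that case.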