Import the packages first.

library(magrittr)
library(sn)

Situation Description

The central limit theorem is an important computational shortcut for generating and making inferences from the sampling distribution of the mean. Recall that the central limit theorem shortcut relies on a number of conditions, specifically:

Independent observations
Identically distributed observations
Mean and variance exist
Sample size large enough for convergence
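
Under these conditions, the approximation used in the rest of this study can be written compactly. The symbols \(\mu\), \(\sigma\), and \(\bar{X}_N\) are notation introduced here for illustration: the population mean, the population standard deviation, and the mean of a sample of size N.

\[
\bar{X}_N \;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{N}\right)
\]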

In this simulation study, I am going to compare the sampling distribution of the mean generated by simulation with the sampling distribution implied by the central limit theorem. I will compare the distributions graphically in QQ-plots.

This will be a 4 × 4 factorial experiment. The first factor is the sample size, with N = 5, 10, 20, and 40. The second factor is the degree of skewness in the underlying distribution, which is the Skew-Normal distribution. The Skew-Normal distribution has three parameters: location \(\xi\), scale \(\omega\), and slant \(\alpha\). When the slant parameter is 0, the distribution reduces to the normal distribution. As the slant parameter increases, the distribution becomes increasingly right-skewed. In this simulation, the slant is set to 0, 2, 10, and 100. The location and scale are fixed at 0 and 1, respectively, for all simulation settings.
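
To get a feel for how much skewness each slant value actually produces, the sketch below evaluates the standard closed-form skewness of the Skew-Normal distribution for the four slants. The function name `sn_skewness` is made up for this illustration and is not part of the sn package; only base R is used.

```r
# Skewness of the Skew-Normal distribution as a function of the slant alpha.
# Location and scale do not affect skewness, so they are not arguments.
# `sn_skewness` is a hypothetical helper name, not an sn function.
sn_skewness <- function(alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  m <- delta * sqrt(2 / pi)  # mean of the standardized variable
  (4 - pi) / 2 * m^3 / (1 - m^2)^(3 / 2)
}

slants <- c(0, 2, 10, 100)
skew <- sn_skewness(slants)
names(skew) <- paste0("slant=", slants)
print(round(skew, 3))
```

The skewness grows with the slant but levels off a little below 1, so slants of 10 and 100 give underlying distributions that are skewed to a similar degree.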

Plot preparation

First, set the quantities that stay fixed across all simulation settings: the number of simulated samples R, the location \(\xi\), and the scale \(\omega\). The slant of the Skew-Normal distribution changes from setting to setting, so it is passed in later.

R <- 5000
location <- 0
scale <- 1

location: \(\xi\), scale: \(\omega\), slant: \(\alpha\)

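Before creating the function, let's write down the formulas for delta and for the mean and standard deviation of the Skew-Normal distribution. The central limit theorem (CLT) approximation needs the population mean and standard deviation as inputs.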

Delta: \(\delta =\frac{\alpha}{\sqrt{1+\alpha^2}}\)
Mean: \(\xi+\omega \delta \sqrt{\frac{2}{\pi}}\)
Standard deviation: \(\sqrt{\omega^2(1-\frac{2\delta^2}{\pi}) }\)
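
As a sanity check on these formulas, the sketch below compares them with moments obtained by numerical integration of the Skew-Normal density. It uses only base R; `dsn_base` and `check_moments` are hypothetical helper names, and `dsn_base` writes out the density \(\frac{2}{\omega}\phi(z)\Phi(\alpha z)\) with \(z = (x-\xi)/\omega\), which is assumed to match `sn::dsn`.

```r
# Skew-Normal density in base R (assumed equivalent to sn::dsn)
dsn_base <- function(x, xi = 0, omega = 1, alpha = 0) {
  z <- (x - xi) / omega
  2 / omega * dnorm(z) * pnorm(alpha * z)
}

# Compare the closed-form mean and sd with numerically integrated moments
check_moments <- function(alpha, xi = 0, omega = 1) {
  delta <- alpha / sqrt(1 + alpha^2)
  formula_mean <- xi + omega * delta * sqrt(2 / pi)
  formula_sd <- sqrt(omega^2 * (1 - 2 * delta^2 / pi))
  m1 <- integrate(function(x) x * dsn_base(x, xi, omega, alpha),
                  -Inf, Inf)$value
  m2 <- integrate(function(x) x^2 * dsn_base(x, xi, omega, alpha),
                  -Inf, Inf)$value
  c(formula_mean = formula_mean, numeric_mean = m1,
    formula_sd = formula_sd, numeric_sd = sqrt(m2 - m1^2))
}

# One row per slant used in the study
moments <- t(sapply(c(0, 2, 10, 100), check_moments))
print(round(moments, 4))
```

The formula and numeric columns should agree up to the tolerance of `integrate`.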

Then define a function that uses the formulas above to build both the CLT approximation and the simulated sampling distribution of the mean, and draws the QQ-plot comparing the two.

qqplot_creator <- function(slant, N) {
  delta <- slant / sqrt(1 + slant ^ 2)

  # Population mean and standard deviation of the Skew-Normal distribution
  pop_mean <- location + scale * delta * sqrt(2 / pi)
  pop_sd <- sqrt(scale ^ 2 * (1 - (2 * delta ^ 2) / pi))

  # Standard normal draws, rescaled below to the CLT distribution
  Z <- rnorm(R)

  # CLT approximation: normal with mean pop_mean and sd pop_sd / sqrt(N)
  sample_dist_clt <- Z * (pop_sd / sqrt(N)) + pop_mean

  # Simulation approximation: R samples of size N, one sample per row
  random.skew <- array(rsn(R * N, xi = location, omega = scale, alpha = slant),
                       dim = c(R, N))

  # Mean of each simulated sample
  sample_dist_sim <- apply(random.skew, 1, mean)

  qqplot(sample_dist_clt, sample_dist_sim, axes = FALSE, frame.plot = TRUE, ann = FALSE)
  abline(0, 1)  # reference line y = x
}
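
A QQ-plot is judged by eye, so it can help to pair it with a single number. The sketch below is one possible companion, not the method used in this post: it measures the largest gap between the simulated quantiles and the exact CLT quantiles, in units of the standard error of the mean. To stay in base R it draws Skew-Normal values with the representation \(\delta|Z_0| + \sqrt{1-\delta^2}\,Z_1\) instead of `sn::rsn`, and it fixes location 0 and scale 1. The names `rsn_base` and `qq_gap` and the seed are assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Skew-Normal(0, 1, alpha) draws via delta * |Z0| + sqrt(1 - delta^2) * Z1
# (hypothetical base R stand-in for sn::rsn)
rsn_base <- function(n, alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  delta * abs(rnorm(n)) + sqrt(1 - delta^2) * rnorm(n)
}

# Largest absolute gap between simulated and CLT quantiles over a
# probability grid, divided by the standard error of the mean
qq_gap <- function(alpha, N, R = 5000, probs = seq(0.01, 0.99, 0.01)) {
  delta <- alpha / sqrt(1 + alpha^2)
  pop_mean <- delta * sqrt(2 / pi)
  pop_sd <- sqrt(1 - 2 * delta^2 / pi)
  se <- pop_sd / sqrt(N)
  # R samples of size N, one per row, reduced to their means
  sim_means <- rowMeans(matrix(rsn_base(R * N, alpha), nrow = R))
  clt_q <- qnorm(probs, mean = pop_mean, sd = se)
  max(abs(quantile(sim_means, probs) - clt_q)) / se
}

gap_normal <- qq_gap(alpha = 0, N = 5)    # no skewness
gap_skewed <- qq_gap(alpha = 100, N = 5)  # strongest slant in the study
print(c(normal = gap_normal, skewed = gap_skewed))
```

Unlike `qqplot_creator`, this compares against exact normal quantiles from `qnorm` rather than simulated normal draws, which removes one source of Monte Carlo noise.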

QQplot generation

Now set the slants and sample sizes to test. As described above, N = 5, 10, 20, and 40, and the slant is 0, 2, 10, and 100. The sequence x is the grid of points at which the density of the underlying distribution is plotted.

slant <- c(0,2,10,100)
N <- c(5,10,20,40)
x <- seq(-2,2,0.01)
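
For reference, the 16 cells of the factorial design can be listed explicitly. This is only a sketch to make the design visible; the `design` data frame is introduced here and is not used by the plotting code.

```r
slant <- c(0, 2, 10, 100)
N <- c(5, 10, 20, 40)

# One row per simulation setting; N varies fastest, matching the
# order in which the nested loops below visit the settings
design <- expand.grid(N = N, slant = slant)
print(design)
```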

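Set up a 4 × 5 grid of panels, one row per slant. The first panel in each row shows the density of the underlying distribution, and the qqplot_creator function fills in the remaining four panels, one per sample size.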

par(mfrow=c(4,5),mai=c(0.1,0.1,0.1,0.1), oma = c(0, 4, 4, 0))

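for (i in slant) {
  # First column: density of the underlying Skew-Normal distribution,
  # plotted against x rather than against the vector index
  plot(x,
       dsn(x,
           xi = location,
           omega = scale,
           alpha = i),
       axes = FALSE,
       frame.plot = TRUE,
       type = "l",
       xlab = NA, ylab = NA)
  # Remaining columns: one QQ-plot per sample size
  for (j in N) {
    qqplot_creator(i, j)
  }
}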
mtext(text="Distribution              N=5                N=10                   N=20                    N=40",
      side = 3,
      outer = TRUE)
mtext(text="slant = 100   slant = 10       slant = 2      slant = 0",
      side = 2,
      outer = TRUE)

Conclusion

As N increases, the points in each QQ-plot fall closer to the y = x line, which means the CLT approximates the sampling distribution of the mean more closely. As the slant increases and the underlying Skew-Normal distribution becomes more skewed, the approximation at a given N gets worse, so a larger sample size is needed for the CLT to work well.
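
Both patterns are consistent with a standard result: for independent, identically distributed observations, the skewness of the sample mean equals the population skewness divided by \(\sqrt{N}\). The sketch below tabulates that quantity for the 16 settings, using the closed-form Skew-Normal skewness. The helper name `sn_skew` and the `mean_skew` table are introduced here for illustration only.

```r
# Population skewness of the Skew-Normal distribution for a given slant
# (hypothetical helper name; base R only)
sn_skew <- function(alpha) {
  delta <- alpha / sqrt(1 + alpha^2)
  m <- delta * sqrt(2 / pi)
  (4 - pi) / 2 * m^3 / (1 - m^2)^(3 / 2)
}

slant <- c(0, 2, 10, 100)
N <- c(5, 10, 20, 40)

# Skewness of the sample mean: population skewness / sqrt(N)
mean_skew <- outer(sn_skew(slant), 1 / sqrt(N))
dimnames(mean_skew) <- list(paste0("slant=", slant), paste0("N=", N))
print(round(mean_skew, 3))
```

Values closer to 0 indicate a sampling distribution closer to the symmetric normal shape that the CLT assumes.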

Central Limit Theorem - Approximation

Probability and Statistical Inference - 09
