A sample of n random variables (X1, X2, ..., Xn) is taken from the population. Assume the random variables are independent and identically distributed. If the population is not normally distributed, a common rule of thumb is to use a sample size of at least 30 (n ≥ 30). The sample average is x̄ = (X1 + X2 + ... + Xn) / n, and the sample sum is T = X1 + X2 + ... + Xn.

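The **Central Limit Theorem** (CLT) states that the sampling distribution of the average (or sum) of a large number of independent, identically distributed random variables is approximately normal, regardless of the shape of the original population distribution, provided the population has a finite variance.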

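Say the population distribution has mean μ and standard deviation σ. Then, for large n, x̄ ~ N(μ, σ / √n) and T ~ N(nμ, σ√n) approximately, where the second parameter is the standard deviation of the sampling distribution.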
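The parameters of the two sampling distributions, N(μ, σ / √n) for the average and N(nμ, σ√n) for the sum, follow from linearity of expectation and from the independence of the Xi. A short derivation, using the same symbols as above:

$$
\begin{aligned}
E[\bar{x}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, &
\operatorname{Var}(\bar{x}) &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{\sigma^2}{n}, &
\operatorname{SD}(\bar{x}) &= \frac{\sigma}{\sqrt{n}}, \\
E[T] &= \sum_{i=1}^{n} E[X_i] = n\mu, &
\operatorname{Var}(T) &= \sum_{i=1}^{n}\operatorname{Var}(X_i) = n\sigma^2, &
\operatorname{SD}(T) &= \sigma\sqrt{n}.
\end{aligned}
$$

The CLT adds the statement that the shape of both distributions approaches a normal curve as n grows.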

This concept is best visualized. Let’s begin with a population that has a normal distribution.

# Population: Normal

```
pop_norm <- rnorm(10000, mean=10, sd=1)
hist(pop_norm, main = "Normal Distribution with mu=10", border="darkorange2")
```

The population mean is μ = 10 and the standard deviation is σ = 1. According to the CLT, if we take samples of size 100, the sampling distribution of the average should be x̄ ~ N(μ, σ / √n) = N(10, 0.1). The sampling distribution of the sum should be T ~ N(nμ, σ√n) = N(1000, 10).
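The arithmetic above can be wrapped in a small helper. This is only a sketch: `clt_params` is a hypothetical name introduced here for illustration, not a function from base R or any package.

```r
# Hypothetical helper (not part of the original post): theoretical mean and
# standard deviation of the sample average and sample sum for samples of size n.
clt_params <- function(mu, sigma, n) {
  list(
    mean_avg = mu,                # E[x-bar] = mu
    sd_avg   = sigma / sqrt(n),   # SD(x-bar) = sigma / sqrt(n)
    mean_sum = n * mu,            # E[T] = n * mu
    sd_sum   = sigma * sqrt(n)    # SD(T) = sigma * sqrt(n)
  )
}

# Normal population used in this section: mu = 10, sigma = 1, n = 100
p <- clt_params(mu = 10, sigma = 1, n = 100)
str(p)
```

For this population the helper gives the same values as the text: 10 and 0.1 for the average, 1000 and 10 for the sum.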

```
n_sam_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {                        # 10000 simulations
  n_sam_vec[i] <- mean(rnorm(100, 10, 1))   # take the AVERAGE of a sample of 100 r.v.
}
hist(n_sam_vec, freq = FALSE,               # graph the sampling distribution
     col = "orange",
     main = "Histogram of Sample Means")
```

`mean(n_sam_vec) #mean should be approx. 10 by CLT`

## [1] 10.00217

`sd(n_sam_vec) #SD should be approx. 0.1 by CLT`

## [1] 0.100377

```
n_sum_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {
  n_sum_vec[i] <- sum(rnorm(100, 10, 1))    # take the TOTAL of a sample of 100 r.v.
}
hist(n_sum_vec, freq = FALSE,
     main = "Histogram of Sample Totals")
line_fit <- seq(950, 1050, by = 0.001)
lines(line_fit, dnorm(line_fit, 1000, 10), col = "orange")   # overlay the N(1000, 10) density
```

`mean(n_sum_vec) #mean should be approx. 1000 by CLT`

## [1] 999.9817

`sd(n_sum_vec) #SD should be approx. 10 by CLT`

## [1] 9.995104

# Population: Exponential

```
exp_seq <-seq(0,5,0.001) #sequence from 0 to 5 by .001
plot(exp_seq, dgamma(exp_seq,1,2), col="steelblue2", main="Exponential Distribution with λ = 2")
```

The population mean is μ = 1/λ = 0.5 and the standard deviation is σ = 1/λ = 0.5. According to the CLT, if we take samples of size 100, the sampling distribution of the average should be approximately x̄ ~ N(μ, σ / √n) = N(0.5, 0.05). The sampling distribution of the sum should be approximately T ~ N(nμ, σ√n) = N(50, 5).

```
e_sam_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {                        # 10000 simulations
  e_sam_vec[i] <- mean(rgamma(100, 1, 2))   # take the AVERAGE of a sample of 100 r.v.
}
hist(e_sam_vec, freq = FALSE,               # graph the sampling distribution
     col = "steelblue2",
     main = "Histogram of Sample Means")
```

`mean(e_sam_vec) #mean should be approx. .5 by CLT`

## [1] 0.5002097

`sd(e_sam_vec) #SD should be approx. .05 by CLT`

## [1] 0.05096455

```
e_sum_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {
  e_sum_vec[i] <- sum(rgamma(100, 1, 2))    # take the TOTAL of a sample of 100 r.v.
}
hist(e_sum_vec, freq = FALSE,
     main = "Histogram of Sample Totals")
line_fit <- seq(30, 75, by = 0.001)
lines(line_fit, dnorm(line_fit, 50, 5), col = "blue")   # overlay the N(50, 5) density
```

`mean(e_sum_vec) #mean should be approx. 50 by CLT`

## [1] 50.08411

`sd(e_sum_vec) #SD should be approx. 5 by CLT`

## [1] 5.036488
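Because the exponential population is strongly right-skewed, it is a convenient case for seeing how the sample size affects the normal approximation. The sketch below is an addition, not part of the original analysis. It measures the skewness of the simulated sample means for several sample sizes. The `skewness` helper, the seed, and the chosen sample sizes are assumptions made for illustration. `rexp(n, rate = 2)` draws from the same distribution as `rgamma(n, 1, 2)` used above.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Sample skewness (third standardized moment), defined here to avoid extra packages
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

sample_sizes <- c(2, 5, 30, 100)  # illustrative choices

# For each n, simulate 10000 sample means from Exp(rate = 2) and measure their skewness
skew_by_n <- sapply(sample_sizes, function(n) {
  means <- replicate(10000, mean(rexp(n, rate = 2)))
  skewness(means)
})

# The mean of n exponentials is gamma distributed with skewness 2 / sqrt(n)
data.frame(n = sample_sizes,
           simulated = skew_by_n,
           theory = 2 / sqrt(sample_sizes))
```

A normal distribution has skewness 0, so the simulated values shrinking toward 0 as n grows is one way to see the approximation improving.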

# Population: Uniform

```
pop_unif <- runif(10000, min=0, max=6)
hist(pop_unif, main = "Uniform Distribution with a=0, b=6", border="darkgreen")
```

The population mean is μ = (a+b)/2 = 3 and the standard deviation is σ = √((b−a)^2 / 12) = 1.732. According to the CLT, if we take samples of size 100, the sampling distribution of the average should be approximately x̄ ~ N(μ, σ / √n) = N(3, 0.1732). The sampling distribution of the sum should be approximately T ~ N(nμ, σ√n) = N(300, 17.32).
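For a uniform distribution on [a, b], the mean is (a+b)/2 and the standard deviation is √((b−a)^2 / 12). A quick check of the numbers for a = 0, b = 6, and n = 100, added here as a sketch with illustrative variable names:

```r
a <- 0; b <- 6; n <- 100

mu    <- (a + b) / 2            # population mean
sigma <- sqrt((b - a)^2 / 12)   # population standard deviation

# Theoretical parameters of the sampling distributions
c(mu = mu, sigma = sigma,
  sd_avg = sigma / sqrt(n),     # SD of the sample average
  mean_sum = n * mu,            # mean of the sample sum
  sd_sum = sigma * sqrt(n))     # SD of the sample sum
```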

```
u_sam_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {                        # 10000 simulations
  u_sam_vec[i] <- mean(runif(100, 0, 6))    # take the AVERAGE of a sample of 100 r.v.
}
hist(u_sam_vec, freq = FALSE,               # graph the sampling distribution
     col = "green",
     main = "Histogram of Sample Means")
```

`mean(u_sam_vec) #mean should be approx. 3 by CLT`

## [1] 2.999125

`sd(u_sam_vec) #SD should be approx. 0.1732 by CLT`

## [1] 0.1754574

```
u_sum_vec <- numeric(10000)                 # preallocate the sampling distribution vector
for (i in 1:10000) {                        # 10000 simulations
  u_sum_vec[i] <- sum(runif(100, 0, 6))     # take the TOTAL of a sample of 100 r.v.
}
hist(u_sum_vec, freq = FALSE,               # graph the sampling distribution
     main = "Histogram of Sample Totals")
line_fit <- seq(220, 400, by = 0.001)
lines(line_fit, dnorm(line_fit, 300, 17.32), col = "green")   # overlay the N(300, 17.32) density
```

`mean(u_sum_vec) #mean should be approx. 300 by CLT`

## [1] 299.8411

`sd(u_sum_vec) #SD should be approx. 17.32 by CLT`

## [1] 17.17556
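The three sections repeat the same simulation loop with a different population each time. One possible way to factor that out is sketched below. `simulate_clt` is a hypothetical helper introduced for illustration, and the seed is arbitrary. It relies on base R's `replicate`, and the uniform population is used as the example.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible

# Hypothetical helper: draw `reps` samples of size `n` using the sampler `rdist`
# and summarize each sample with `stat` (mean by default, or sum).
simulate_clt <- function(rdist, n = 100, reps = 10000, stat = mean) {
  replicate(reps, stat(rdist(n)))
}

# Sampler for the Uniform(0, 6) population
r_unif <- function(n) runif(n, min = 0, max = 6)

unif_means <- simulate_clt(r_unif)              # sampling distribution of the average
unif_sums  <- simulate_clt(r_unif, stat = sum)  # sampling distribution of the sum

c(mean = mean(unif_means), sd = sd(unif_means))  # compare with 3 and 0.1732
c(mean = mean(unif_sums),  sd = sd(unif_sums))   # compare with 300 and 17.32
```

Passing a different sampler, such as `function(n) rnorm(n, 10, 1)` or `function(n) rgamma(n, 1, 2)`, would cover the normal and exponential populations without changing the helper.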