Statistical Programming, Simulation and Modeling

Dr. Basim Alsaedi

2025-04-12

Introduction

What is Statistical Programming?

Statistical programming combines:

To perform tasks like:

Why Learn Statistical Programming?

Probability Distributions and Simulation

Why Simulate Data?

Simulation helps us:

Normal Distribution

The normal distribution has the familiar bell shape:

\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{- \frac{(x - \mu)^{2}}{2\sigma^{2}}}\]

Simulating Normal Data in R

# Generate random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)

# Plot a histogram
hist(normal_data, breaks = 30, col = "skyblue", border = "white",
     main = "Normal Distribution",
     xlab = "Value", prob = TRUE)

# Add theoretical density curve
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)

Normal Distribution: Real-world Example

Human Heights:

Binomial Distribution

Models the number of successes in fixed trials:

\[P(X = k) = \binom{n}{k}p^{k}(1 - p)^{n - k}\]

Simulating Binomial Data in R

# Generate binomial random values
binomial_data <- rbinom(1000, size = 10, prob = 0.3)

# Plot the distribution
barplot(table(binomial_data)/1000, col = "salmon", 
        main = "Binomial Distribution (n = 10, p = 0.3)",
        xlab = "Number of Successes", ylab = "Probability")

# Add theoretical probabilities
points(0:10, dbinom(0:10, size = 10, prob = 0.3), 
       pch = 16, col = "blue")
lines(0:10, dbinom(0:10, size = 10, prob = 0.3), 
      col = "blue", lwd = 2)

Binomial Distribution: Real-world Example

Coin Flips:

Poisson Distribution

Models events in fixed time/space intervals:

\[P(X = k) = \frac{\lambda^{k}e^{- \lambda}}{k!}\]

Simulating Poisson Data in R

# Generate Poisson random values
poisson_data <- rpois(1000, lambda = 3)

# Plot the distribution
barplot(table(poisson_data)/1000, col = "lightblue", 
        main = "Poisson Distribution (λ = 3)",
        xlab = "Number of Events", ylab = "Probability")

# Add theoretical probabilities
points(0:max(poisson_data), dpois(0:max(poisson_data), lambda = 3), 
       pch = 16, col = "red")
lines(0:max(poisson_data), dpois(0:max(poisson_data), lambda = 3), 
      col = "red", lwd = 2)

Poisson Distribution: Real-world Example

Call Center Arrivals:

Uniform Distribution

Equal probability across all values in a range:

\[f(x) = \frac{1}{b - a}, \quad a \leq x \leq b\]

Simulating Uniform Data in R

# Generate uniform random values
uniform_data <- runif(1000, min = 0, max = 1)

# Plot a histogram
hist(uniform_data, breaks = 20, col = "lightgoldenrod", border = "white",
     main = "Uniform Distribution (min = 0, max = 1)",
     xlab = "Value", prob = TRUE)

# Add the theoretical density line
abline(h = 1, col = "red", lwd = 2)

Exponential Distribution

Models time between events:

\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]

Simulating Exponential Data in R

# Generate exponential random values
exponential_data <- rexp(1000, rate = 0.5)

# Plot a histogram
hist(exponential_data, breaks = 30, col = "thistle", border = "white",
     main = "Exponential Distribution (rate = 0.5)",
     xlab = "Value", prob = TRUE)

# Add the theoretical density curve
curve(dexp(x, rate = 0.5), from = 0, to = 10, add = TRUE, 
      col = "purple", lwd = 2)

Exponential Distribution: Real-world Example

Customer Service Times:

Monte Carlo Simulation

What is Monte Carlo Simulation?

Monte Carlo methods use repeated random sampling to:

  1. Define the domain of possible inputs
  2. Generate inputs randomly from the domain
  3. Perform computations using these inputs
  4. Aggregate the results

Applications of Monte Carlo Simulation

Estimating Probabilities: Example 1

Probability of rolling a sum of 7 with two dice:

Monte Carlo for Probability: Code

# Function to simulate rolling two dice
roll_two_dice <- function() {
  sum(sample(1:6, 2, replace = TRUE))
}

# Run many simulations
rolls <- replicate(10000, roll_two_dice())

# Estimate the probability
prob_sum_7 <- mean(rolls == 7)
cat("Estimated probability:", prob_sum_7, "\n")
cat("Theoretical probability:", 6/36, "\n")

Estimating π Using Monte Carlo

Using random points in a square to estimate π:

Monte Carlo Integration

Numerical integration using random numbers:

Estimating Expected Values

Monte Carlo for expected values of complex distributions:

Portfolio Returns Simulation

Portfolio Returns: Key Statistics

For a portfolio with 60% stocks, 35% bonds, and 5% cash:

## Expected portfolio return: 6.36 %
## Portfolio risk (standard deviation): 11.2 %
## Probability of negative return: 28.61 %

Advanced Topics

Central Limit Theorem

As sample size increases, the distribution of sample means approaches a normal distribution:

Effects of Outliers on Regression

Regression Coefficients Comparison

## Clean data model: y = 1.93 + 2.99 x
## Outlier data model: y = 1.43 + 3.14 x

A single outlier dramatically changed the regression line!

Bootstrap Resampling

Estimating sampling distributions by resampling the original data:

Bootstrap: Key Results

For our original sample:

## Original sample mean: 81.33333
## 95% bootstrap confidence interval: 76.25 to 86.08333

Practice Exercise: Basic Probability

Problem: What’s the probability of at least one 6 in three dice rolls?

Practice Exercise: Confidence Intervals

How sample size affects confidence interval width:

Conclusion

Key Takeaways

Further Reading

  1. Ross, S. M. (2013). Simulation. Academic Press.
  2. Robert, C. P., & Casella, G. (2010). Monte Carlo Statistical Methods. Springer.
  3. Rizzo, M. L. (2019). Statistical Computing with R. CRC Press.
  4. Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media.

Thank You!

Questions?