Dr. Basim Alsaedi
2025-04-12
Statistical programming combines:
To perform tasks like:
Simulation helps us:
The normal distribution has the familiar bell shape:
\[f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{- \frac{(x - \mu)^{2}}{2\sigma^{2}}}\]
# Generate random values from a normal distribution
normal_data <- rnorm(1000, mean = 0, sd = 1)
# Plot a histogram
hist(normal_data, breaks = 30, col = "skyblue", border = "white",
main = "Normal Distribution",
xlab = "Value", prob = TRUE)
# Add theoretical density curve
curve(dnorm(x, mean = 0, sd = 1), add = TRUE, col = "red", lwd = 2)
Human Heights:
Models the number of successes in fixed trials:
\[P(X = k) = \binom{n}{k}p^{k}(1 - p)^{n - k}\]
# Generate binomial random values
binomial_data <- rbinom(1000, size = 10, prob = 0.3)
# Plot the distribution
barplot(table(binomial_data)/1000, col = "salmon",
main = "Binomial Distribution (n = 10, p = 0.3)",
xlab = "Number of Successes", ylab = "Probability")
# Add theoretical probabilities
points(0:10, dbinom(0:10, size = 10, prob = 0.3),
pch = 16, col = "blue")
lines(0:10, dbinom(0:10, size = 10, prob = 0.3),
col = "blue", lwd = 2)
Coin Flips:
Models events in fixed time/space intervals:
\[P(X = k) = \frac{\lambda^{k}e^{- \lambda}}{k!}\]
# Generate Poisson random values
poisson_data <- rpois(1000, lambda = 3)
# Plot the distribution
barplot(table(poisson_data)/1000, col = "lightblue",
main = "Poisson Distribution (λ = 3)",
xlab = "Number of Events", ylab = "Probability")
# Add theoretical probabilities
points(0:max(poisson_data), dpois(0:max(poisson_data), lambda = 3),
pch = 16, col = "red")
lines(0:max(poisson_data), dpois(0:max(poisson_data), lambda = 3),
col = "red", lwd = 2)
Call Center Arrivals:
Equal probability across all values in a range:
\[f(x) = \frac{1}{b - a}, \quad a \leq x \leq b\]
# Generate uniform random values
uniform_data <- runif(1000, min = 0, max = 1)
# Plot a histogram
hist(uniform_data, breaks = 20, col = "lightgoldenrod", border = "white",
main = "Uniform Distribution (min = 0, max = 1)",
xlab = "Value", prob = TRUE)
# Add the theoretical density line
abline(h = 1, col = "red", lwd = 2)
Models time between events:
\[f(x) = \lambda e^{-\lambda x}, \quad x \geq 0\]
# Generate exponential random values
exponential_data <- rexp(1000, rate = 0.5)
# Plot a histogram
hist(exponential_data, breaks = 30, col = "thistle", border = "white",
main = "Exponential Distribution (rate = 0.5)",
xlab = "Value", prob = TRUE)
# Add the theoretical density curve
curve(dexp(x, rate = 0.5), from = 0, to = 10, add = TRUE,
col = "purple", lwd = 2)
Customer Service Times:
Monte Carlo methods use repeated random sampling to:
Probability of rolling a sum of 7 with two dice:
# Function to simulate rolling two dice
roll_two_dice <- function() {
sum(sample(1:6, 2, replace = TRUE))
}
# Run many simulations
rolls <- replicate(10000, roll_two_dice())
# Estimate the probability
prob_sum_7 <- mean(rolls == 7)
cat("Estimated probability:", prob_sum_7, "\n")
cat("Theoretical probability:", 6/36, "\n")
Using random points in a square to estimate π:
Numerical integration using random numbers:
Monte Carlo for expected values of complex distributions:
For a portfolio with 60% stocks, 35% bonds, and 5% cash:
## Expected portfolio return: 6.36 %
## Portfolio risk (standard deviation): 11.2 %
## Probability of negative return: 28.61 %
As sample size increases, the distribution of sample means approaches a normal distribution:
## Clean data model: y = 1.93 + 2.99 x
## Outlier data model: y = 1.43 + 3.14 x
A single outlier dramatically changed the regression line!
Estimating sampling distributions by resampling the original data:
For our original sample:
## Original sample mean: 81.33333
## 95% bootstrap confidence interval: 76.25 to 86.08333
Problem: What’s the probability of at least one 6 in three dice rolls?
How sample size affects confidence interval width:
Questions?