Chapter 2 HYPOTHESIS TESTING

2.1 Introduction

Hypothesis testing is a standard statistical procedure for deciding between two competing alternatives or scenarios, based on evidence from a data sample collected for the purpose of the test. Descriptive statistics such as the mean, median and standard deviation are usually computed from samples, and these sample values may differ from those of the whole population. For instance, suppose we want to find the concentration of hexavalent chromium, a human carcinogen, in soil. We collect 20 random samples. The average concentration in the collected samples will almost certainly differ somewhat from the average concentration across the entire site; hypothesis testing tells us whether such a difference is meaningful.

Basic terminology and procedure for hypothesis testing

There is a need to understand the basic terms before diving into the process of hypothesis testing. If we have historical data, the presence of a trend must be established, and below are the assumptions that should hold prior to conducting hypothesis testing:

  1. The data are assumed to follow a specified distribution (typically the normal distribution).
  2. The data have a stable or stationary trend (free from temporal or spatial trends).
  3. There are no unusual observations or outliers.

Hypothesis testing that involves data samples satisfying the above assumptions is described as parametric hypothesis testing. If the assumptions are not met, the test is known as non-parametric hypothesis testing.

  • Null hypothesis (\(H_0\)): Is a statement of the presumed default or baseline condition. In this case, the null hypothesis is that the carcinogen concentration is at or below the allowed limit.

  • Alternate hypothesis (\(H_1\) or \(H_A\)): is the opposite of the null hypothesis; it contradicts the statement of the presumed default. In this case, the alternate hypothesis states that the carcinogen level is above the allowed limit.

The evidence produced by the sample after conducting hypothesis testing can either refute or support the null hypothesis.

  • The one-tailed (one-sided) hypothesis test checks for an effect in only one direction. In our case, we test whether the carcinogen levels exceed the allowed limit (right-sided). Alternatively, we could check whether they are below the allowed limit (left-sided).

  • Upper-Tailed Hypothesis Test is the same as the right one-tailed hypothesis test: it checks whether the value of interest exceeds the null hypothesis value.

  • Lower-Tailed Hypothesis Test is the left one-tailed hypothesis test that checks whether the value of interest is below the null hypothesis value, in our case the allowed limit for carcinogens in soil.

  • Two-Tailed Hypothesis Test checks for an effect in both directions. In our case, we would test whether the carcinogen levels are either below or above the allowed limit.

  • T-Value is the test statistic calculated from the sample data to determine how far the sample mean is from the hypothesized mean, in terms of standard error.

  • Critical T-Value is the threshold value for the t-distribution that corresponds to the significance level. If the t-value exceeds the critical t-value, the null hypothesis is rejected.

  • Significance level (\(\alpha\)) is the probability threshold for rejecting the null hypothesis, usually set to 0.05 (5%).

In most cases, if the probability of observing the test statistic (the p-value) is less than 0.05, the null hypothesis is rejected.

  • Degrees of freedom (df) are the number of independent values in the data sample that are free to vary when calculating a statistic.

In most cases df is calculated as \[df = n - 1\]

where \(n\) represents the sample size(count).

In our case, with 20 samples, \(df = 20 - 1 = 19\).

  • P-Value indicates the probability of obtaining a test statistic at least as extreme as the observed one, assuming the null hypothesis is true.

Below are the steps taken in hypothesis testing;

  1. Formulate the null and alternate hypotheses.
  2. Assume the null hypothesis is true and choose a test statistic that follows a known distribution under that assumption.
  3. Calculate the test statistic from the data sample and determine the critical value of the chosen distribution at a given significance level.
  4. Compare the test statistic to the critical value to assess the plausibility of the null hypothesis. For symmetrical distributions (e.g., t, normal), critical values are located on:
  • upper-tailed test: the positive side of the distribution
  • lower-tailed test: the negative side of the distribution
  • two-tailed test: both sides, with equal magnitude
  5. Calculate the p-value to find the probability of obtaining a test statistic more extreme than the observed value.

Try it!

To make the steps above concrete, let’s work through an example. In this hypothetical problem, a regulatory body sets the acceptable phosphorus concentration in a lake at 1.5 mg/L. 30 sample measurements were collected at different locations in the lake. The objective is to determine whether the phosphorus levels are significantly higher than the allowed limit.

Let’s simulate the data.

set.seed(123) # for reproducibility

phosphorous_levels <- rnorm(n = 30,
                            mean = 1.7,
                            sd = 0.2)
  • Formulate the hypotheses

    • Null Hypothesis (\(H_0\)): \(\mu \le 1.5\) - phosphorus levels are within the acceptable limit.
    • Alternate Hypothesis (\(H_1\)): \(\mu > 1.5\) - phosphorus levels exceed the acceptable limit.

This is a right-sided (one-sided) test since we are only interested in detecting an increase in phosphorus levels.

  • We use a one-sample t-test since the population standard deviation is unknown and the sample size is modest (n = 30). The t-statistic is calculated as; \[t = {{\overline{x} - \mu_0}\over{s/\sqrt{n}}}\] Where:

    • \(\overline{x}\) is the sample mean.
    • \(\mu_0\) is the population mean under the null hypothesis (1.5 mg/L)
    • \(s\) is the sample standard deviation.
    • \(n\) is the sample size.
  • Perform the one-sample t-test

# One sample t-test 
t_test_result <- t.test(phosphorous_levels, 
                        mu = 1.5,
                        alternative = "greater")

# Display the results 
print(t_test_result)
## 
##  One Sample t-test
## 
## data:  phosphorous_levels
## t = 5.3201, df = 29, p-value = 5.21e-06
## alternative hypothesis: true mean is greater than 1.5
## 95 percent confidence interval:
##  1.629713      Inf
## sample estimates:
## mean of x 
##  1.690579
  • Determine the critical value

    • For a right-tailed test at the 5% significance level (\(\alpha = 0.05\)), the critical value is the t-value that corresponds to the top 5% of the t-distribution. The qt() function is used to calculate this; \[t_{critical} = qt(1-\alpha, df = n - 1)\]
critical_t <- qt(0.95, df = length(phosphorous_levels) - 1)

critical_t
## [1] 1.699127

The t-statistic is greater than the critical t-value, therefore the null hypothesis is rejected. Let’s confirm this by also computing the p-value.

  • Compute the p-value
p_value <- t_test_result$p.value
p_value
## [1] 5.210072e-06

The null hypothesis is rejected since the p-value is less than 0.05. Therefore, we conclude that the phosphorus levels in the lake exceed the acceptable limit.
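As a cross-check, the reported t statistic can be rebuilt directly from the t-statistic formula given above:

```r
# Cross-check: rebuild the reported t statistic directly from the formula
set.seed(123)
phosphorous_levels <- rnorm(n = 30, mean = 1.7, sd = 0.2)

t_manual <- (mean(phosphorous_levels) - 1.5) /
  (sd(phosphorous_levels) / sqrt(length(phosphorous_levels)))
t_manual   # ~5.32, matching the t.test() output above
```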

2.2 Parametric tests

As described before, parametric tests rely on the assumptions that the data follow a known distribution (commonly the normal distribution) and are free of major outliers. When a test requires normality, a small sample must itself follow a normal distribution, while for larger samples (e.g., n ≥ 30) the Central Limit Theorem allows researchers to approximate normality even if the data are not perfectly normal.

2.2.1 Parametric single-sample test

This type of test is used to determine whether the mean of a single sample is significantly different from a known or hypothesized population mean. It assumes that the sample data comes from a population that follows a normal distribution (or has a large enough sample size to approximate normality).

The main objective of this test is to check whether the sample mean differs significantly from a reference or target value, such as an industry standard, a population average, or a theoretical expectation.

These are the steps taken for One-Sample T-test;

  1. State the hypotheses:
  • Null hypothesis (\(H_0\)): The sample mean equals the population mean (\(\mu = \mu_0\)).
  • Alternate hypothesis (\(H_A\)): The sample mean is different from the population mean (\(\mu \neq \mu_0\)).
  2. Compute the test statistic

The formula for test statistic is \[t = {{\overline x - \mu_0}\over{s/\sqrt n}}\]

Where:

  • \(\overline x\): is the sample mean
  • \(\mu_0\): is the hypothesized population mean
  • \(s\): is the sample standard deviation
  • \(n\): is the sample size
  3. Determine the degrees of freedom as \(df = n - 1\)
  4. Compare \(t\) to the critical value or use the p-value.
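The steps above can be sketched in R on a small hypothetical sample (the values and the hypothesized mean `mu0` are made up for illustration), confirming that the manual statistic matches the built-in t.test():

```r
# One-sample t-test by hand, on a small hypothetical sample
x   <- c(4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7)  # made-up measurements
mu0 <- 5                                           # hypothesized mean

t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))  # step 2
df <- length(x) - 1                                      # step 3

# Step 4: t.test() reports the same statistic plus the p-value
all.equal(t_manual, unname(t.test(x, mu = mu0)$statistic))  # TRUE
```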

2.2.2 Parametric two-sample test

Let’s understand what a two-sample test is before diving specifically into the parametric two-sample test. Two-sample tests are essential statistical tools used to compare two different populations to determine if there are significant differences between them. These tests are widely applied in ecology to compare environmental factors such as contaminant levels, species diversity, or soil nutrient concentrations across different locations or time periods.

Two-sample tests help us compare the means, medians, or other characteristics of two groups. For example, scientists might want to know if the average soil arsenic concentration in an industrial area is higher than that in a nearby natural reserve. To perform these tests, data must be collected from both groups being compared.

Here are the assumptions of two-sample tests;

  1. Independence of Data Values: Data points should not influence each other.
  2. No trends in data: data should not show patterns over time, for instance increasing or decreasing trends. If trends exist, adjustments such as pairing or deseasonalization should be considered.

In this case, we are focusing on the parametric tests – a parametric two-sample test is carried out when the data follow a normal distribution. There are two types of parametric two-sample tests; independent and paired sample tests.

  1. Independent Two-Sample Test

These tests are used when the two groups being compared have no inherent connection. For example:

  • Comparing soil arsenic levels up-gradient vs. down-gradient of an industrial facility.
  • Measuring PCB concentrations upstream and downstream of a river.

In these cases, since the samples are collected from different locations without any relationship, an independent test is appropriate. These are the steps taken when carrying out independent tests;

  • Calculate the test statistic based on the difference between sample means.
  • Compare the test statistic with a critical value from the t-distribution.
  • If the test statistic is extreme or the p-value is less than the significance level (e.g., 0.05), reject the null hypothesis.
  2. Paired Two-Sample Tests

Paired tests are used when the samples are related or dependent on each other. This often happens when measurements are taken from the same location at different times. Examples include:

  • Before and After Cleanup: Measuring soil contaminant levels at a site before and after remediation.
  • Duplicate Sample Analysis: Sending identical samples to two different laboratories to assess the consistency of results.

Before carrying out a paired two-sample test it is assumed that the data are paired (each observation in one group corresponds to an observation in the other group), that the differences between pairs follow a normal distribution, and that no extreme outliers are present.

Here are the steps taken when performing paired two-sample test;

  • Calculate the differences between paired observations.
  • Compute the mean and standard deviation of the differences
  • Calculate the t-statistic
  • Compare the test statistic with the critical value or p-value
  • If the test statistic is extreme or the p-value is less than the significance level, reject the null hypothesis.

Here is the formula for calculating the t-statistic; \[{t_0} = {{\overline D}\over{S_D/\sqrt n}}\]

where;

  • \(\overline D\) and \(S_D\) are the mean and standard deviation of the differences between the data samples of the two populations
  • \(n\) is the sample size

The null hypothesis \(H_0\) is that the population mean difference between the two populations, \(\mu_D\), is zero.
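The formula can be verified against R’s built-in paired test: the paired t statistic is simply a one-sample t test on the differences. A sketch on hypothetical before/after values:

```r
# Sketch: the paired t statistic is a one-sample t test on the differences
# (hypothetical before/after contaminant values, for illustration)
before <- c(12.1, 10.4, 11.8, 13.0, 9.7, 12.5)
after  <- c(10.9,  9.8, 10.5, 12.2, 9.1, 11.0)

D  <- before - after
t0 <- mean(D) / (sd(D) / sqrt(length(D)))   # the formula above

# t.test(..., paired = TRUE) computes the same statistic
all.equal(t0, unname(t.test(before, after, paired = TRUE)$statistic))  # TRUE
```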

Imagine an ecologist wants to determine if the cleanup of an industrial site has effectively reduced contaminant levels. By comparing pre- and post-cleanup soil samples at the same locations, they can assess the success of remediation efforts using a paired two-sample test. Similarly, comparing background levels with site concentrations using an independent two-sample test can help identify pollution sources.

Try it!

Arsenic contamination in groundwater is a significant ecological and public health concern. Prolonged exposure to arsenic can lead to severe health issues, including cancer and organ damage. To evaluate arsenic contamination in different water sources, we will apply parametric hypothesis tests. You will be provided with arsenic.csv, which will be used to analyze arsenic concentrations from different sources using parametric tests.

  • Let’s test whether arsenic concentrations at Site A exceed the WHO limit of \(10 \mu g/L\) using a single-sample test. First, formulate the null and alternate hypotheses;

    • Null Hypothesis (H₀): Mean concentration = \(10 \mu g/L\)
    • Alternative Hypothesis (H₁): Mean concentration > \(10 \mu g/L\)(one-tailed test)
# Load the data 
arsenic_data <- read.csv("data/arsenic.csv")

site_A <- arsenic_data$Site_A

# Single sample t.test
t.test(site_A, mu = 10, alternative = "greater")
## 
##  One Sample t-test
## 
## data:  site_A
## t = 5.2797, df = 14, p-value = 5.819e-05
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
##  11.5359     Inf
## sample estimates:
## mean of x 
##  12.30477

The p-value is less than 0.05, therefore the null hypothesis is rejected. It is concluded that the arsenic concentration at Site A exceeds the WHO limit.

  • Apply parametric independent two-sample test to compare arsenic levels between two independent water sources (Site A vs. Site B). Formulating the hypothesis;

    • Null hypothesis: Mean concentration at Site A = Mean concentration at Site B.
    • Alternate hypothesis: Mean concentration at Site A ≠ Mean concentration at Site B.

We assume that both samples are normally distributed and have equal variances.
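The equal-variance assumption can be checked with an F test of variances via var.test(). A sketch on hypothetical stand-in vectors (arsenic.csv itself is not reproduced here, so `site_A` and `site_B` below are simulated):

```r
# Hypothetical stand-in vectors for Site A and Site B (arsenic.csv is not
# reproduced here); var.test() runs an F test of equal variances
set.seed(42)
site_A <- rnorm(15, mean = 12,  sd = 2)
site_B <- rnorm(15, mean = 8.5, sd = 2)

# A large p-value is consistent with the equal-variance assumption,
# supporting var.equal = TRUE in the t-test
var.test(site_A, site_B)
```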

# Declare variables 
site_A <- arsenic_data$Site_A
site_B <- arsenic_data$Site_B

# Perform t-test
t.test(site_A, site_B, var.equal = TRUE)  # Two-tailed test
## 
##  Two Sample t-test
## 
## data:  site_A and site_B
## t = 5.324, df = 28, p-value = 1.144e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  2.336700 5.259205
## sample estimates:
## mean of x mean of y 
## 12.304769  8.506816

The p-value is less than 0.05, therefore the null hypothesis is rejected. It is concluded that the mean concentration at Site A ≠ the mean concentration at Site B.

  • Use the paired t-test to compare arsenic levels at the same sites before and after remediation to determine whether the intervention was effective. The question here is, “Did arsenic levels significantly decrease after remediation?”. Let’s formulate the hypotheses before answering the question;

    • Null Hypothesis (H₀): Mean before = Mean after.
    • Alternative Hypothesis (H₁): Mean before > Mean after (one-tailed test)
# Declare variables 
before_remediation <- arsenic_data$Before_Remediation
after_remediation <- arsenic_data$After_Remediation

# Perform t-test
t.test(before_remediation, after_remediation, paired = TRUE, alternative = "greater")
## 
##  Paired t-test
## 
## data:  before_remediation and after_remediation
## t = 9.7991, df = 14, p-value = 6.002e-08
## alternative hypothesis: true mean difference is greater than 0
## 95 percent confidence interval:
##  1.69087     Inf
## sample estimates:
## mean difference 
##        2.061387

Conclusion: p-value < 0.05, we reject H₀, meaning that remediation significantly reduced arsenic levels.

Here are the important considerations when performing a parametric two-sample test;

  1. If the data is not normally distributed, applying a logarithmic transformation can help, but it should be done carefully to avoid misinterpretation.
  2. These tests are often conducted using statistical software such as R, Python, or Excel for ease and accuracy.
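The first consideration, a log transformation, might look like this in practice. The data below are hypothetical right-skewed (lognormal) groups, invented for illustration:

```r
# Sketch: right-skewed groups compared on the log scale (hypothetical data)
set.seed(1)
a <- rlnorm(20, meanlog = 1.0, sdlog = 0.5)
b <- rlnorm(20, meanlog = 1.3, sdlog = 0.5)

# The t-test is applied to the log-transformed values
t.test(log(a), log(b), var.equal = TRUE)
```

Because the test is run on log values, the estimated difference back-transforms to a ratio of geometric means rather than a difference of arithmetic means, which is exactly the interpretation caution mentioned above.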

Practical exercises

Industrial fertilizer use can increase organic carbon levels in groundwater, potentially impacting water quality and ecosystem health. In this exercise, you will use three parametric tests to answer the following questions:

  • Single-Sample Test: Is the average organic carbon concentration in background (control) wells significantly different from a regulatory threshold (e.g., 25 ppm)?
  • Paired Two-Sample Test: Is there a significant difference in organic carbon concentration between water samples before and after cleaning?
  • Independent Two-Sample Test:

You are provided with a water potability data set that can be downloaded from here


Solution

________________________________________________________________________________

2.3 Nonparametric test

2.3.1 Nonparametric One-Sample Wilcoxon Signed-Rank Test

The Wilcoxon Signed-Rank (WSR) Test is a nonparametric test used to determine whether the median of a population differs from a fixed reference value. Unlike the sign test, which only considers the direction of differences, the WSR test accounts for the magnitude of deviations as well.

Here are the assumptions to be considered when performing the test;

  • The data values are independent.
  • The underlying population is symmetrically distributed (but not necessarily normal) around the median.
  • The number of tied values should be minimal.
  • The test cannot handle nondetect (ND) values because it requires actual magnitudes for ranking.

Just follow the procedure outlined below to perform the test;

  1. Compute the deviations from the reference value.
  2. Rank the absolute values of the deviations (smallest gets rank 1, next smallest gets rank 2, etc.).
  3. Assign average ranks in case of tied values.
  4. Sum the ranks corresponding to positive deviations.
  5. Compare the test statistic with a critical value (from tables) or use a normal approximation if sample size is large (n > 20).
  6. Compute a p-value and determine statistical significance.
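The procedure above can be carried out by hand in base R. The sample below and the reference value are made up purely for illustration; the resulting sum of positive ranks is the same quantity wilcox.test() reports as V:

```r
# Hand computation of the Wilcoxon signed-rank statistic on a small
# hypothetical sample, following the steps above
x   <- c(12, 8, 15, 9, 11, 14, 7, 13)   # made-up measurements
mu0 <- 10                                # reference value

d <- x - mu0          # step 1: deviations from the reference value
d <- d[d != 0]        # zero deviations are dropped
r <- rank(abs(d))     # steps 2-3: rank |deviations|, averaging ties
V <- sum(r[d > 0])    # step 4: sum the ranks of positive deviations
V                     # 25.5 - the same value wilcox.test() reports as V
```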

Try it!

Suppose we have measured pollutant concentrations at a site and want to test whether the median concentration is different from a standard of 50 ppm.

Let’s formulate the null and alternate hypotheses before carrying out the test.

  • Null Hypothesis(\(H_0\)): The median of the population is equal to the reference value.
  • Alternate Hypothesis(\(H_A\)): The median is different from the reference value.

Now let’s perform the test, which returns the test statistic (V) and the p-value:

  • If p < 0.05, we reject H₀, indicating that the median is significantly different from 50 ppm.
  • If p > 0.05, we fail to reject H₀, meaning there is no strong evidence to say the median differs from 50 ppm.
# Sample data
set.seed(123)
data_values <- c(45, 55, 60, 52, 48, 51, 53, 47, 49, 54)
reference_value <- 50

# Perform Wilcoxon Signed-Rank Test
wilcox.test(data_values, mu = reference_value, alternative = "two.sided")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  data_values
## V = 36, p-value = 0.4136
## alternative hypothesis: true location is not equal to 50

Note that!

  • The test does not work well if the data contain ND values (nondetects), since exact magnitudes are needed.
  • Works best when the data are symmetrically distributed
  • For large samples (n > 20), the normal approximation is used instead of exact critical values.

The Wilcoxon Signed-Rank Test is a useful nonparametric alternative to the one-sample t-test when normality is questionable. It provides a robust way to assess median differences while taking data magnitudes into account.

Practical Exercise

Acid rain is a major environmental concern, particularly in industrial areas. In our study, we are interested in understanding the effect of emission controls on rainwater acidity. Rainwater pH values are recorded before and after the intervention at the same 15 monitoring stations. Because pH measurements are continuous but the data set is relatively small and may not meet the assumptions of parametric tests, nonparametric methods offer a robust alternative.

You are required to use the acid_rain data set provided by the instructor to perform the Nonparametric One-Sample Wilcoxon Signed-Rank Test. The objective of this project is to test whether the median rainwater pH before the intervention is different from the regulatory standard of 5.5.


Solution

Project Objective: Test whether the median rainwater pH before the intervention is different from the regulatory standard of 5.5.

Hypotheses:

  • Null Hypothesis: The median pH before intervention is 5.5
  • Alternative Hypothesis: The median pH before intervention is not 5.5.

Mathematical Concept:

For each observation, compute the difference: \[d_i = pH_i - 5.5\] Then rank the absolute differences \(|d_i|\) (ignoring zeros) and sum the ranks for positive differences. The Wilcoxon test statistic \(W\) is compared against its distribution to determine significance.

# Load necessary library
library(dplyr)

# Read the dataset (adjust the URL as needed)
acid_data <- read.csv("data/acid_rain.csv")

# View the first few rows
#head(acid_data)

# Nonparametric One-Sample Wilcoxon Signed-Rank Test on 'Before' pH values
# Test if the median pH before intervention is 5.5
wilcox_test_before <- wilcox.test(acid_data$Before, mu = 5.5, alternative = "two.sided")
print(wilcox_test_before)
## 
##  Wilcoxon signed rank exact test
## 
## data:  acid_data$Before
## V = 67, p-value = 0.7197
## alternative hypothesis: true location is not equal to 5.5

The p-value is above 0.05, therefore we fail to reject the null hypothesis and conclude that there is no evidence that the median pH before the intervention differs from 5.5.

________________________________________________________________________________

2.3.2 Nonparametric two-sample paired sign test and paired Wilcoxon Signed Rank Test

The Wilcoxon Rank Sum Test (also known as the Mann–Whitney U test or the Wilcoxon–Mann–Whitney test) is a nonparametric procedure that tests for a difference between two population medians. The test is used to determine whether one population consistently produces larger or smaller measurements than another, assuming that the two populations have approximately equal variance.

Here are the assumptions before carrying out the test;

  • The two samples are independent.
  • The data distributions of the two populations are similar in shape (same variance), though not necessarily the same mean or median.
  • The test does not assume a specific distribution (e.g., normal, lognormal, gamma, etc.).
  • The sample size for each group should preferably be at least 20, but meaningful results can still be obtained with samples as small as 10 using refined algorithms.

Follow the steps below to carry out the tests;

  1. Combine the two samples into a single data set and rank the observations in ascending order.
  2. Compute the Wilcoxon test statistic as the sum of the ranks for one of the groups (usually the smaller group).
  3. If the sample sizes are small, obtain the critical value from statistical tables.
  4. If the sample sizes are large, compute the test statistic’s standardized form and compare it to the normal distribution.
  5. Compute the p-value from statistical tables or using functions in R such as pnorm().

If tied values exist;

  • the average rank is assigned to all tied values.
  • The presence of ties may require an adjustment to the test statistic calculation.

If there exist Non-Detects (NDs);

  • If all NDs share a single reporting limit, they can be treated as tied values.
  • If multiple reporting limits exist, alternative tests such as the Gehan test are preferred.

Try it!

Let’s test for a difference between two groups.

# Sample data
group1 <- c(3, 5, 9, 10, 15)
group2 <- c(2, 4, 6, 8, 12)

# Perform Wilcoxon Rank Sum Test
wilcox.test(group1, group2, alternative = "two.sided")
## 
##  Wilcoxon rank sum exact test
## 
## data:  group1 and group2
## W = 16, p-value = 0.5476
## alternative hypothesis: true location shift is not equal to 0
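The W = 16 reported above can be reproduced by hand, following the steps listed earlier: rank the pooled observations, sum the ranks of one group, and subtract \(n_1(n_1+1)/2\):

```r
# Reproduce W = 16 from the output above: rank the pooled observations,
# sum the ranks of group1, then subtract n1*(n1+1)/2
group1 <- c(3, 5, 9, 10, 15)
group2 <- c(2, 4, 6, 8, 12)

pooled_ranks <- rank(c(group1, group2))
R1 <- sum(pooled_ranks[seq_along(group1)])            # rank sum of group1
W  <- R1 - length(group1) * (length(group1) + 1) / 2
W                                                     # 16, as reported by wilcox.test()
```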

Exact testing using wilcox.exact()

First, install the exactRankTests package

install.packages("exactRankTests")

Run the test

library(exactRankTests)
##  Package 'exactRankTests' is no longer under development.
##  Please consider using package 'coin' instead.
wilcox.exact(group1, group2, alternative = "two.sided")
## 
##  Exact Wilcoxon rank sum test
## 
## data:  group1 and group2
## W = 16, p-value = 0.5476
## alternative hypothesis: true mu is not equal to 0

The coin package can be used for more complex cases. First install the package with install.packages("coin")

Run the test

library(coin)
## 
## Attaching package: 'coin'
## The following object is masked _by_ '.GlobalEnv':
## 
##     alpha
## The following objects are masked from 'package:exactRankTests':
## 
##     dperm, pperm, qperm, rperm
# coin::wilcox_test() expects a formula with a grouping factor:
# dat <- data.frame(values = c(group1, group2),
#                   group  = factor(rep(c("g1", "g2"), each = 5)))
# wilcox_test(values ~ group, data = dat)

The Wilcoxon Rank Sum Test is a useful nonparametric alternative to the two-sample t-test when normality cannot be assumed. It is robust to outliers and does not require knowledge of the underlying data distribution. R provides multiple functions for computing the test, with options for handling ties and computing exact p-values.

Practical Exercise

Using the “acid rain” data set above, determine whether there is a consistent direction of change in pH values (i.e., whether more stations show an increase or a decrease in pH after the intervention). Use the non-parametric paired sign test.

Repeat what you have done with the Nonparametric Paired Wilcoxon Signed-Rank Test. However, in this case, test whether there is a statistically significant difference between the pH values before and after the intervention.

________________________________________________________________________________

Solution

Project Objective: Determine whether there is a consistent direction of change in pH values (i.e., whether more stations show an increase or a decrease in pH after the intervention).

Hypotheses

  • Null Hypothesis: There is no difference in the number of stations with increased versus decreased pH.
  • Alternate Hypothesis: There is a significant difference in the number of stations showing an increase compared to a decrease.

Methods

For each station, calculate:

\[\text{sign}(d_i) = \begin{cases} 1, & \text{if } After - Before > 0 \\ 0, & \text{if } After - Before = 0 \\ -1, & \text{if } After - Before < 0 \end{cases}\]

Count the number of positive differences and use a binomial test to see if the proportion deviates from the expected 50% chance.

# Compute differences between After and Before pH values
acid_data <- acid_data %>% mutate(Diff = After - Before)

# Count number of positive changes (excluding ties)
positive_changes <- sum(acid_data$Diff > 0)
total_nonzero <- sum(acid_data$Diff != 0)

# Perform a binomial test assuming a 50% chance of an increase
sign_test <- binom.test(positive_changes, total_nonzero, p = 0.5)
print(sign_test)
## 
##  Exact binomial test
## 
## data:  positive_changes and total_nonzero
## number of successes = 10, number of trials = 15, p-value = 0.3018
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3838037 0.8817589
## sample estimates:
## probability of success 
##              0.6666667

The p-value is above 0.05, therefore we fail to reject the null hypothesis. We conclude that there is no evidence of a difference in the number of stations with increased versus decreased pH.

Let’s repeat the same for the Nonparametric Paired Wilcoxon Signed-Rank Test

Project Objective : Test whether there is a statistically significant difference between the pH values before and after the intervention.

Hypotheses:

  • Null hypothesis: The median difference between paired pH values (After - Before) is zero.
  • Alternate hypothesis: The median difference is not zero.
# Paired Wilcoxon Signed-Rank Test
paired_wilcox <- wilcox.test(acid_data$Before, acid_data$After, paired = TRUE, alternative = "two.sided")
print(paired_wilcox)
## 
##  Wilcoxon signed rank exact test
## 
## data:  acid_data$Before and acid_data$After
## V = 53, p-value = 0.7197
## alternative hypothesis: true location shift is not equal to 0

The p-value is above 0.05, therefore there is no significant change in pH after the intervention.

________________________________________________________________________________

2.4 One-Way ANOVA

ANOVA (Analysis of Variance) is a statistical method used to compare three or more groups to determine whether their means or medians significantly differ, or if they belong to the same population. It is particularly useful in scenarios where multiple groups need to be analyzed simultaneously, reducing the risk of errors that arise from multiple two-sample tests. ANOVA can be categorized into parametric (assuming normality and equal variances) and nonparametric (distribution-free) methods. Variants include One-Way ANOVA (influenced by a single factor) and Two-Way or Multifactor ANOVA (considering multiple factors). While ANOVA helps identify overall differences, post hoc tests are often needed to determine which groups specifically differ.

In this analysis we will focus on One-Way ANOVA, both the parametric and the non-parametric versions.

2.4.1 Parametric One-Way ANOVA

One-Way ANOVA is a parametric statistical test used to determine whether there are significant differences among the means of three or more independent groups. It is commonly applied in various fields, including ecology, to analyze how a single factor influences a dependent variable, such as plant growth across different soil types.

Here are the assumptions that must be met before conducting One-Way ANOVA

  • Normality – Each group’s data should follow a normal distribution. This can be tested using the Shapiro-Wilk test or Kolmogorov-Smirnov test.
  • Homogeneity of Variance – The variance across groups should be approximately equal, verified using Levene’s test or Bartlett’s test.
  • Independence – Observations should be independent, meaning one measurement should not influence another.

If these assumptions are violated, alternatives such as Welch’s ANOVA (for unequal variances) or Kruskal-Wallis test (non-parametric alternative) should be considered.
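These checks and the Welch fallback can be sketched in base R. The three groups below are hypothetical, simulated only to show the calls:

```r
# Sketch of the assumption checks and the Welch fallback, on hypothetical
# data for three groups
set.seed(123)
dat <- data.frame(
  values = c(rnorm(10, 50, 10), rnorm(10, 55, 10), rnorm(10, 60, 10)),
  group  = rep(c("A", "B", "C"), each = 10)
)

by(dat$values, dat$group, shapiro.test)       # normality within each group
bartlett.test(values ~ group, data = dat)     # homogeneity of variance

# If variances are unequal, Welch's ANOVA drops the equal-variance assumption
oneway.test(values ~ group, data = dat, var.equal = FALSE)
```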

This is how One-Way ANOVA works; it compares:

  • Between-Group Variance – Measures differences among the group means.
  • Within-Group Variance – Measures variability within each group.

A high F-statistic suggests that at least one group mean differs significantly. The test’s p-value is then used to determine statistical significance (typically, if p < 0.05, the null hypothesis is rejected).

One-Way ANOVA is important in ecology as it can be used to find:

  • Differences in plant species diversity across multiple forest regions.
  • Variations in water pH levels among different lakes.
  • The effect of pollution levels on fish populations across multiple rivers.

For example, an ecologist studying wetland ecosystems might apply One-Way ANOVA to determine whether soil nitrogen levels significantly affect plant biodiversity across different wetland sites. If the test finds a significant difference, post hoc tests like Tukey’s HSD can identify which specific wetland sites differ.

By using One-Way ANOVA, ecologists and researchers can make informed, data-driven decisions about conservation efforts and environmental management strategies.

Try it!

In our case, we have generated sample data for three groups; group1, group2 and group3. Formulating the hypotheses;

  • Null Hypothesis: The mean is the same across all group populations.
  • Alternate hypothesis: At least one group population has a mean that differs from the rest.

When modelling the parametric one-way ANOVA, the F-statistic is computed as; \[{F} = {{\text{Between-Group Variance }(MSG)}\over{\text{Within-Group Variance }(MSE)}}\] Where;

  • \(MSG = {{SSBG}\over{k-1}}\), where \(SSBG\) is the between-group sum of squares.
  • \(MSE = {{SSWG}\over{N-k}}\), where \(SSWG\) is the within-group sum of squares.

with \(k\) as the number of groups and \(N\) as the total number of observations.

# Load necessary libraries
library(dplyr)
library(ggplot2)

# Generate sample data
set.seed(123)
group1 <- rnorm(10, mean = 50, sd = 10)
group2 <- rnorm(10, mean = 55, sd = 10)
group3 <- rnorm(10, mean = 60, sd = 10)

data <- data.frame(
  values = c(group1, group2, group3),
  group = rep(c("Group1", "Group2", "Group3"), each = 10)
)

# Compute group means and overall mean
means <- data %>% group_by(group) %>% summarise(mean_value = mean(values))
overall_mean <- mean(data$values)

# Compute sum of squares
SSBG <- sum(10 * (means$mean_value - overall_mean)^2) # Between-group sum of squares
SSWG <- sum((data$values - ave(data$values, data$group, FUN = mean))^2) # Within-group sum of squares
SSTotal <- sum((data$values - overall_mean)^2) # Total sum of squares

df_SSBG <- length(unique(data$group)) - 1
df_SSWG <- nrow(data) - length(unique(data$group))
df_SSTotal <- nrow(data) - 1

# Compute mean squares
MSG <- SSBG / df_SSBG
MSE <- SSWG / df_SSWG

# Compute F-statistic
F0 <- MSG / MSE

# Compute critical F value
alpha <- 0.05
F_critical <- qf(1 - alpha, df_SSBG, df_SSWG)

# Compute p-value
p_value <- 1 - pf(F0, df_SSBG, df_SSWG)

# Display results
cat("Between-group sum of squares (SSBG):", SSBG, "\n")
## Between-group sum of squares (SSBG): 223.5015
cat("Within-group sum of squares (SSWG):", SSWG, "\n")
## Within-group sum of squares (SSWG): 2568.336
cat("Total sum of squares (SSTotal):", SSTotal, "\n")
## Total sum of squares (SSTotal): 2791.837
cat("F-statistic (Fo):", F0, "\n")
## F-statistic (Fo): 1.174796
cat("Critical F value (Fc):", F_critical, "\n")
## Critical F value (Fc): 3.354131
cat("p-value:", p_value, "\n")
## p-value: 0.3241775
# Perform ANOVA using built-in function
anova_result <- aov(values ~ group, data = data)
summary(anova_result)
##             Df Sum Sq Mean Sq F value Pr(>F)
## group        2  223.5  111.75   1.175  0.324
## Residuals   27 2568.3   95.12

Practical Exercise

Air quality is an important ecological and public health concern. For example, understanding whether Nitrogen Dioxide (\(NO_2\)) concentrations differ significantly across different states can inform regulatory decisions and resource management. In this exercise, you will use the “Air Quality Data in India” dataset, which is publicly available on Kaggle, to test the following:

You are required to use the Parametric One-Way ANOVA to compare the means of the pollutant levels across different groups.


Solution

# Load the required libraries
library(dplyr)

# Load the data 
air_data <- read.csv("data/AirQualityDataIndia.csv")


# Perform one-way ANOVA
anova_result <- aov(no2 ~ state, data = air_data)
summary(anova_result)
##                 Df   Sum Sq Mean Sq F value Pr(>F)    
## state           33 53343341 1616465    7511 <2e-16 ***
## Residuals   419475 90281183     215                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 16233 observations deleted due to missingness

Although some observations were deleted due to missing values, the test still produced a result: the F-statistic was 7511 on 33 and 419475 degrees of freedom.

The p-value is far below 0.05, so there is enough evidence to reject the null hypothesis. We conclude that at least one state had a mean Nitrogen dioxide concentration that differs from the rest of the states.

________________________________________________________________________________

2.4.2 Nonparametric One-Way ANOVA (Kruskal-Wallis Test)

When conducting a parametric one-way ANOVA, one of the key assumptions is that the residuals follow a normal distribution. If this assumption is violated, a common approach is to apply a logarithmic transformation—using either the natural logarithm (\(ln\)) or the base-10 logarithm (log10)—to the data. If this transformation successfully normalizes the data and satisfies the assumption of homogeneity of variances, then the ANOVA can be performed on the transformed values.
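
The effect of such a transformation can be seen on simulated right-skewed data (the values below are hypothetical log-normal draws, so the logged values are normal by construction):

```r
# Right-skewed simulated data (hypothetical concentrations): log-normal draws
set.seed(123)
x <- rlnorm(30, meanlog = 3, sdlog = 0.8)

shapiro.test(x)        # raw skewed values: normality is typically rejected
shapiro.test(log(x))   # log-transformed values are normal by construction here
```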

However, if both the original and log-transformed data fail to meet the assumptions required for parametric ANOVA, an alternative nonparametric test, the Kruskal–Wallis test, can be used. Unlike parametric ANOVA, the Kruskal–Wallis test does not assume normality but does require that the distributions of the populations have similar shapes, particularly regarding skewness and variance. This test also assumes that the data points are independent and do not exhibit trends.

If the presence of a trend affects the data, one approach is to use the Friedman test, which accounts for structured variability by analyzing data within blocks. Another option for handling seasonal or cyclical trends is deseasonalization before applying the Kruskal–Wallis test.
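
The Friedman test is available in base R as `friedman.test()`; it expects one observation per group within each block. A minimal sketch on a hypothetical blocked design (ecosystem names, seasons and values invented for illustration):

```r
# Hypothetical blocked design: 3 ecosystems measured once in each of 6 seasons
set.seed(123)
d <- data.frame(
  nitrogen  = rnorm(18, mean = rep(c(30, 40, 35), times = 6), sd = 5),
  ecosystem = factor(rep(c("Grassland", "Wetland", "Forest"), times = 6)),
  season    = factor(rep(1:6, each = 3))
)

# Friedman test: compares ecosystems while blocking on season
friedman.test(nitrogen ~ ecosystem | season, data = d)
```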

Kruskal–Wallis Test Procedure

The Kruskal–Wallis test examines differences in the medians of multiple groups rather than their means. It is performed by ranking all data values across groups and calculating the test statistic: \[H = {{12}\over{N(N+1)}} \sum_{i=1}^{k} {{R_i^2}\over{n_i}} - 3(N+1)\] where \(N\) is the total number of observations, \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), and \(R_i\) is the sum of the ranks of group \(i\).

This statistic follows a chi-square (\(\chi^2\)) distribution with \(k-1\) degrees of freedom, where \(k\) is the number of groups. If \(H\) exceeds the critical value at a chosen significance level (\(\alpha\)), or if the p-value is below 0.05, the null hypothesis of equal medians is rejected.
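
To see how the statistic is built, here is a sketch that computes it by hand on simulated (hypothetical) data and compares the result with R's built-in `kruskal.test`:

```r
# Manual computation of the Kruskal-Wallis H statistic on simulated data,
# checked against kruskal.test (values are hypothetical, for illustration only)
set.seed(42)
d <- data.frame(
  x = c(rnorm(10, 30, 5), rnorm(10, 40, 5), rnorm(10, 35, 5)),
  g = rep(c("A", "B", "C"), each = 10)
)

N  <- nrow(d)
r  <- rank(d$x)               # rank all values jointly, across groups
Ri <- tapply(r, d$g, sum)     # rank sum per group
ni <- tapply(r, d$g, length)  # group sizes

H <- 12 / (N * (N + 1)) * sum(Ri^2 / ni) - 3 * (N + 1)
H
kruskal.test(x ~ g, data = d)$statistic  # matches H when there are no ties
```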

Try it!

Ecological Example: Testing Differences in Soil Nutrient Levels

Consider a study where researchers measure soil nitrogen levels in three different ecosystems: grassland, wetland, and forest. Since soil nutrient data often exhibit skewness, the Kruskal–Wallis test is a suitable choice.

# Load necessary library
library(dplyr)

# Simulated soil nitrogen data (in mg/kg)
set.seed(42)
grassland <- rnorm(10, mean = 30, sd = 5)
wetland <- rnorm(10, mean = 40, sd = 5)
forest <- rnorm(10, mean = 35, sd = 5)

# Combine into a data frame
data <- data.frame(
  Nitrogen = c(grassland, wetland, forest),
  Ecosystem = rep(c("Grassland", "Wetland", "Forest"), each = 10)
)

# Perform Kruskal-Wallis test
kruskal.test(Nitrogen ~ Ecosystem, data = data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Nitrogen by Ecosystem
## Kruskal-Wallis chi-squared = 4.2039, df = 2, p-value = 0.1222

Since the p-value (0.1222) is greater than 0.05, we fail to reject the null hypothesis. This suggests that there is no statistically significant difference in median soil nitrogen levels among the three ecosystems. If a lower p-value had been obtained, further analysis using post hoc pairwise comparisons would be required to determine which specific ecosystems differ.
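
Had the result been significant, one option for those post hoc comparisons is `pairwise.wilcox.test` from base R. The sketch below re-creates the simulated soil nitrogen data from the example above and runs pairwise Wilcoxon rank-sum tests with Holm correction:

```r
# Re-create the simulated soil nitrogen data from the example above
set.seed(42)
data <- data.frame(
  Nitrogen  = c(rnorm(10, 30, 5), rnorm(10, 40, 5), rnorm(10, 35, 5)),
  Ecosystem = rep(c("Grassland", "Wetland", "Forest"), each = 10)
)

# Post hoc pairwise Wilcoxon rank-sum tests with Holm p-value adjustment
pairwise.wilcox.test(data$Nitrogen, data$Ecosystem, p.adjust.method = "holm")
```

The output is a matrix of adjusted p-values, one per pair of ecosystems.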

Practical Exercise

Using the air quality data in India from the previous exercise, perform non-parametric one-way ANOVA(Kruskal-Wallis Test) to compare the medians when data assumptions for ANOVA might not hold.


Solution

Formulate the hypothesis;

  • Null hypothesis: The median Nitrogen dioxide concentration is the same across all states.
  • Alternate hypothesis: At least one state has a median Nitrogen dioxide concentration that is different.
# Perform the Kruskal-Wallis test
kruskal_result <- kruskal.test(no2 ~ state, data = air_data)
print(kruskal_result)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  no2 by state
## Kruskal-Wallis chi-squared = 166893, df = 33, p-value < 2.2e-16

The p-value is far below 0.05; therefore the null hypothesis is rejected. It is concluded that at least one state has a median Nitrogen dioxide concentration that is different.

________________________________________________________________________________

2.5 Hands-on Exercise

You are required to analyze forest health and ecological diversity by performing various hypothesis tests. The data set can be downloaded from here – use the forest_health_data_with_target.csv file. Here is a breakdown of some of the data set attributes;

  • Tree Height (m): The height of individual trees.
  • Diameter at Breast Height (DBH) (cm): The diameter of the tree trunk measured at breast height.
  • Species: The species of each tree.
  • Health Status: Categorized as ‘Healthy’ or ‘Unhealthy’.
  • Location: Geographical location of the trees.

The exercise will cover what we have just learnt above. Answer the questions below;

  1. Load the data set, and check for null values. If there is, handle them appropriately.
  2. Apply the parametric single-sample test to test if the mean tree height differs from a hypothesized population mean. Hypothesized mean = 15 meters.
  3. Perform the parametric independent two-sample t-test to compare the mean DBH between ‘Healthy’ and ‘Unhealthy’ trees.
  4. Perform parametric one-way ANOVA to determine if there’s a significant difference in mean Tree Height among different health status.
  5. Using nonparametric one-way ANOVA(Kruskal-Wallis Test), test if there’s a significant difference in median DBH among different health status.

Solution

  1. Load the data set, and check for null values. If there is, handle them appropriately.
# Load the data 
data <- read.csv("data/forest_health/forest_health_data_with_target.csv")

sum(is.na(data)) # Check if there are missing values -- found none
## [1] 0
  2. Apply the parametric single-sample test to test if the mean tree height differs from a hypothesized population mean. Hypothesized mean = 15 meters.
  • Null hypothesis: The mean height of trees is equal to 15
  • Alternate hypothesis: The mean height of trees is not equal to 15.

Assumption: Tree Height data is normally distributed.

t.test(data$Tree_Height, mu=15)
## 
##  One Sample t-test
## 
## data:  data$Tree_Height
## t = 2.8797, df = 999, p-value = 0.004065
## alternative hypothesis: true mean is not equal to 15
## 95 percent confidence interval:
##  15.23272 16.22829
## sample estimates:
## mean of x 
##   15.7305

From the test, the p-value is less than 0.05; therefore the null hypothesis is rejected. It is concluded that the mean tree height of the entire population is not equal to 15 meters.

  3. Perform the parametric independent two-sample t-test to compare the mean DBH between ‘Healthy’ and ‘Unhealthy’ trees.
  • Null hypothesis: The mean DBH of healthy and unhealthy trees is the same.

  • Alternate hypothesis: The mean DBH of healthy and unhealthy trees is different.

  • Assumptions: DBH is normally distributed within each group, and variances are equal.

# Load library
library(dplyr)

# Get only the 'Healthy' and 'Unhealthy' matching values in the variables
subset_data <- data %>%
  subset(Health_Status == "Healthy" | Health_Status == "Unhealthy")

# Perform the test 
t.test(DBH ~ Health_Status, data = subset_data, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  DBH by Health_Status
## t = 1.1018, df = 761, p-value = 0.2709
## alternative hypothesis: true difference in means between group Healthy and group Unhealthy is not equal to 0
## 95 percent confidence interval:
##  -1.763876  6.276335
## sample estimates:
##   mean in group Healthy mean in group Unhealthy 
##                54.37697                52.12074

The p-value is well above 0.05; therefore we fail to reject the null hypothesis. There is no evidence that the mean DBH differs between healthy and unhealthy trees.

  4. Perform parametric one-way ANOVA to determine if there’s a significant difference in mean Tree Height among different health statuses.
  • Null hypothesis: There is no significant difference in mean tree height among trees with different health statuses.

  • Alternate hypothesis: There is a significant difference in mean tree height among trees with different health statuses.

  • Assumptions: There is normality and homogeneity of variances across groups.

anova_result <- aov(Tree_Height ~ Health_Status, data = data)
summary(anova_result)
##                Df Sum Sq Mean Sq F value Pr(>F)    
## Health_Status   3  15544    5181   105.9 <2e-16 ***
## Residuals     996  48739      49                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is far below 0.05; therefore the null hypothesis is rejected. There is a significant difference in tree height among trees with different health statuses.

  5. Using the nonparametric one-way ANOVA (Kruskal-Wallis test), test if there’s a significant difference in median DBH among different health statuses.

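
One possible solution, following the same pattern as the earlier Kruskal–Wallis exercise (it assumes the `data` frame loaded from the forest health CSV in question 1 is still in the workspace):

```r
# Null hypothesis: the median DBH is the same across all health status groups
# Alternate hypothesis: at least one health status group has a different median DBH
kruskal_dbh <- kruskal.test(DBH ~ Health_Status, data = data)
print(kruskal_dbh)
```

As before, a p-value below 0.05 would lead to rejecting the null hypothesis and concluding that at least one health status group has a different median DBH.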

________________________________________________________________________________