13 Model Evaluation in SDM
13.1 1. Introduction to Model Evaluation
13.1.1 Why Model Evaluation is Important in SDM
Model evaluation is a critical step in Species Distribution Modeling (SDM) because it determines how well a model predicts the distribution of species in different environmental conditions. Without proper evaluation, predictions may be misleading and lead to incorrect conservation decisions.
Model evaluation ensures that:
✅ The model is accurate – Predictions match real-world observations.
✅ The model generalizes well – Works on new/unseen data, not just the training set.
✅ Ecological validity is maintained – Predictions align with biological knowledge.
13.1.2 Key Objectives of Model Evaluation
1️⃣ Assessing Accuracy – How well does the model predict species presence/absence?
2️⃣ Avoiding Overfitting – Does the model generalize to unseen data?
3️⃣ Ecological Relevance – Are predictions biologically meaningful?
4️⃣ Comparing Multiple Models – Which algorithm (CART, RF, GBM, MaxEnt) performs best for the dataset?
13.1.3 Common Pitfalls in Model Evaluation
⚠️ Overfitting
- The model memorizes training data instead of learning true species-environment relationships.
- Fix: Use cross-validation and limit model complexity.
⚠️ Biased Datasets
- If presence records are clustered in sampled areas (e.g., near roads), the model may falsely predict that species prefer those areas.
- Fix: Use spatially explicit validation and account for sampling bias.
⚠️ Improper Thresholding
- Using the wrong threshold for presence/absence conversion can skew accuracy metrics.
- Fix: Experiment with different thresholding methods (e.g., maximum sensitivity-specificity, 10th percentile).
Pro Tip: Always check the ecological plausibility of the model. A high accuracy score does not mean the model makes biologically realistic predictions!
13.2 2. Performance Metrics for Model Evaluation
13.2.1 1. Classification Metrics (For Presence/Absence Models)
When species data is binary (presence = 1, absence = 0), we use classification metrics to measure how well the model differentiates between the two.
13.2.1.1 Accuracy – Overall correctness of predictions
Formula:
\[
Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
\]
✅ Good for balanced datasets but misleading for imbalanced data (e.g., when species are rare).
13.2.1.2 Sensitivity (Recall) – Ability to correctly predict species presence
Formula:
\[
Sensitivity = \frac{TP}{TP + FN}
\]
✅ Important for conservation because false negatives (FN) may ignore suitable habitats.
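A minimal sketch in R shows how these counts translate into accuracy, sensitivity, and specificity; observed and predicted are small hypothetical vectors standing in for real test-set labels and model predictions.
# Hypothetical observed (0/1) labels and model predictions
observed  <- c(1, 1, 0, 0, 1, 0, 1, 0)
predicted <- c(1, 0, 0, 0, 1, 1, 1, 0)
# Confusion matrix counts
cm <- table(Predicted = predicted, Observed = observed)
TP <- cm["1", "1"]; TN <- cm["0", "0"]
FP <- cm["1", "0"]; FN <- cm["0", "1"]
# Metrics
accuracy    <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # recall: proportion of presences detected
specificity <- TN / (TN + FP)   # proportion of absences correctly predicted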
13.2.2 2. Area Under the Curve (AUC-ROC & AUC-PR)
AUC is one of the most common evaluation metrics in species distribution models.
13.2.2.1 ROC Curve (Receiver Operating Characteristic Curve)
- Plots True Positive Rate (Sensitivity) vs. False Positive Rate (1 - Specificity)
- Higher AUC = Better ability to distinguish presence from absence.
✅ Works well for most classification tasks.
Interpreting AUC Scores:
AUC Score | Model Performance |
---|---|
0.5 | Random Guessing |
0.7 - 0.8 | Fair |
0.8 - 0.9 | Good |
0.9 - 1.0 | Excellent |
13.2.2.2 Precision-Recall (PR) Curve – Best for Rare Species
When species are rare, AUC-ROC may be misleading. Instead, use the Precision-Recall Curve, which evaluates model performance when absences outnumber presences.
- Precision = How many predicted presences were actually correct?
- Recall = Sensitivity (ability to detect all presences).
✅ Best when species presence is rare (e.g., endangered species).
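As a hedged sketch, AUC-PR can be computed with the PRROC package (one option among several); obs and probs are hypothetical observed labels and predicted presence probabilities.
library(PRROC)
# Hypothetical observed labels and predicted presence probabilities
obs   <- c(1, 0, 0, 1, 0, 0, 0, 1)
probs <- c(0.9, 0.4, 0.2, 0.7, 0.1, 0.3, 0.5, 0.8)
# Precision-recall curve: scores at presences vs. scores at absences
pr <- pr.curve(scores.class0 = probs[obs == 1],
               scores.class1 = probs[obs == 0],
               curve = TRUE)
pr$auc.integral   # AUC-PR
plot(pr)          # precision-recall curve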
13.2.3 3. True Skill Statistic (TSS)
TSS is an alternative to AUC that does not depend on prevalence (species rarity).
Formula:
\[
TSS = Sensitivity + Specificity - 1
\]
✅ Good for ecological models where presence/absence is not evenly distributed.
✅ Ranges from -1 (worse than random) to 1 (perfect prediction).
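Using the confusion-matrix counts from the classification-metrics sketch above, TSS is a single line of R:
# TSS = sensitivity + specificity - 1
TSS <- TP / (TP + FN) + TN / (TN + FP) - 1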
13.2.4 4. Root Mean Square Error (RMSE) for Continuous Models
When the model outputs continuous suitability values (e.g., habitat suitability indices), RMSE measures how much predictions deviate from actual presence/absence.
Formula:
\[
RMSE = \sqrt{\frac{\sum (Predicted - Observed)^2}{n}}
\]
✅ Lower RMSE = Better fit.
✅ Works well when comparing continuous predictions (e.g., suitability scores from MaxEnt).
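A minimal sketch, using hypothetical vectors of observations and continuous suitability scores:
observed_values       <- c(1, 0, 0, 1, 1, 0)               # hypothetical presences/absences
predicted_suitability <- c(0.8, 0.3, 0.1, 0.6, 0.9, 0.4)   # hypothetical suitability scores
# Root Mean Square Error between predictions and observations
rmse <- sqrt(mean((predicted_suitability - observed_values)^2))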
13.3 Comparison of Model Evaluation Metrics
Metric | Best For | Weaknesses |
---|---|---|
Accuracy | Balanced presence/absence datasets. | Misleading for imbalanced data. |
Sensitivity | Conservation-focused predictions. | Can overestimate species presence. |
Specificity | Preventing false positives. | Ignores presence misclassification. |
Kappa | Adjusting accuracy for chance. | Harder to interpret. |
AUC-ROC | General model evaluation. | Can be biased for rare species. |
AUC-PR | Rare species modeling. | Not commonly used in SDM. |
TSS | Presence-background models. | Less known outside ecology. |
RMSE | Continuous suitability models. | Doesn’t work for presence/absence. |
13.3.1 Next Steps
Now that we understand model evaluation metrics, the next section will explore different validation techniques (cross-validation, train-test split, and spatial validation) to ensure our models generalize well to new data. 🚀
Model validation ensures that our Species Distribution Models (SDMs) perform well on unseen data. Without proper validation, models may overfit or fail to generalize to real-world conditions.
13.3.2 1. Train-Test Split
Why Use a Separate Test Set?
A model that fits training data well might fail on new data. The train-test split ensures that the model is evaluated on independent data to measure generalization.
✅ Prevents overfitting
✅ Ensures predictions are reliable
✅ Standard practice across machine learning applications
13.3.2.1 Code Example: Train-Test Split
# Load necessary package
library(caret)
# 'data' is a data frame with a binary 'presence' column plus predictor columns
# Split data: 70% training, 30% testing (stratified by presence/absence)
set.seed(123)
trainIndex <- createDataPartition(data$presence, p = 0.7, list = FALSE)
train_data <- data[trainIndex, ]
test_data <- data[-trainIndex, ]
13.3.3 2. Cross-Validation
Cross-validation avoids setting aside a single chunk of data purely for testing. It divides the dataset into multiple subsets and trains the model several times, so every observation is used for both training and evaluation and the performance estimate is more stable.
13.3.3.1 K-Fold Cross-Validation
- Splits data into K folds (e.g., 5-fold or 10-fold).
- The model is trained on K-1 folds and tested on the remaining fold.
- The process repeats K times, and results are averaged.
✅ More reliable than a single train-test split
✅ Works well with small datasets
13.3.3.2 Code Example: K-Fold Cross-Validation
# Perform 5-fold cross-validation
train_control <- trainControl(method = "cv", number = 5)
cv_model <- train(presence ~ ., data = train_data, method = "rf", trControl = train_control)
print(cv_model)
13.3.3.3 Leave-One-Out Cross-Validation (LOOCV)
- Uses one data point as the test set and all others as training.
- Repeats this process for every data point.
✅ Best for very small datasets
✅ Ensures all data points contribute to validation
⚠️ Computationally expensive for large datasets
13.3.3.4 Code Example: LOOCV
# Perform Leave-One-Out Cross-Validation
train_control_loocv <- trainControl(method = "LOOCV")
loocv_model <- train(presence ~ ., data = train_data, method = "rf", trControl = train_control_loocv)
print(loocv_model)
13.3.4 3. Spatially Explicit Validation
Standard validation methods assume that data points are independent, but species occurrence points are often spatially clustered. Spatially explicit validation ensures that evaluation accounts for spatial autocorrelation.
✅ Prevents overestimating model performance
✅ Ensures the model works in unsampled areas
13.3.4.1 Approach: Block Cross-Validation
- Divides the study area into spatial blocks.
- Uses some blocks for training and others for testing.
13.3.4.2 Code Example: Spatial Cross-Validation
library(blockCV)
library(sf)
# Convert occurrences to a spatial object (assumes 'lon'/'lat' columns in WGS84)
data_sf <- st_as_sf(data, coords = c("lon", "lat"), crs = 4326)
# Define spatial blocks (block size in metres)
blocks <- spatialBlock(speciesData = data_sf, theRange = 100000, k = 5)
# Each element of blocks$folds holds the training and testing row indices for one fold
train_control_spatial <- trainControl(method = "cv",
                                      index    = lapply(blocks$folds, `[[`, 1),
                                      indexOut = lapply(blocks$folds, `[[`, 2))
spatial_model <- train(presence ~ ., data = data, method = "rf",
                       trControl = train_control_spatial)
print(spatial_model)
13.4 4. Visualizing Model Performance
Model evaluation is easier to interpret when results are visualized. Below are key ways to visualize model quality.
13.4.1 1. ROC and Precision-Recall Curves
13.4.1.1 ROC Curve (Receiver Operating Characteristic Curve)
- X-axis: False Positive Rate (1 - Specificity)
- Y-axis: True Positive Rate (Sensitivity)
- Higher AUC (closer to 1) indicates better performance.
13.4.1.2 Code Example: Plot ROC Curve
library(pROC)
# Compute and plot the ROC curve
# ('model' is a fitted classification model, e.g., cv_model from the
#  cross-validation example; column 2 is the predicted probability of presence)
roc_curve <- roc(test_data$presence, predict(model, test_data, type = "prob")[, 2])
plot(roc_curve, main = "ROC Curve")
13.4.2 2. Suitability Maps and Threshold Selection
Binary vs. Continuous Predictions
- Continuous maps show habitat suitability scores.
- Binary maps classify suitable vs. unsuitable areas based on a threshold.
13.4.2.1 Thresholding Approaches
- Maximize Sensitivity-Specificity (Youden's J).
- 10th Percentile Presence Threshold (for rare species).
A sketch of both approaches follows the code example below.
13.4.2.2 Code Example: Suitability Map with Thresholding
library(raster)
# Predict suitability across the study area
# (raster::predict takes the predictor raster first, then the fitted model;
#  'raster_stack' holds the environmental layers used to train 'model')
suitability_map <- predict(raster_stack, model, type = "response")
# Convert to binary presence/absence using a fixed threshold
threshold <- 0.5
binary_map <- suitability_map > threshold
# Plot results
par(mfrow = c(1,2))
plot(suitability_map, main = "Continuous Suitability Map")
plot(binary_map, main = "Binary Presence/Absence Map")
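The fixed 0.5 cutoff above is only a placeholder. Below is a hedged sketch of the two approaches listed earlier, assuming roc_curve from the ROC example and a hypothetical vector pred_at_presences of predicted suitability at the training presence locations.
library(pROC)
# Threshold that maximizes sensitivity + specificity (Youden's J)
best <- coords(roc_curve, "best", ret = "threshold")
best_threshold <- as.numeric(best[[1]])   # works whether coords() returns a vector or a data frame
# 10th percentile training presence threshold (common choice for rare species)
# ('pred_at_presences' is a hypothetical vector of suitability at presence points)
p10_threshold <- quantile(pred_at_presences, probs = 0.10)
# Re-threshold the suitability map with either value
binary_map_best <- suitability_map > best_threshold
binary_map_p10  <- suitability_map > p10_threshold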
13.4.3 3. Response Curves
Response curves show how each environmental variable affects species predictions.
✅ Ensures ecological realism
✅ Helps identify misleading predictors
13.4.3.1 Code Example: Plot Response Curves
# Plot response curves for important variables ('gbm_model' is a fitted gbm object)
plot(gbm_model, i.var = "bio1", main = "Effect of Temperature (Bio1)")
plot(gbm_model, i.var = "bio12", main = "Effect of Precipitation (Bio12)")
✅ What to look for:
- Do the response curves make ecological sense?
- Are important predictors showing meaningful trends?
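For a Random Forest, a comparable check uses partial dependence plots; this is a minimal sketch, assuming rf_model and the predictor name "bio1" from the earlier examples.
library(randomForest)
# Partial dependence of the predicted probability of presence (class "1") on bio1
partialPlot(rf_model, pred.data = train_data, x.var = "bio1",
            which.class = "1", main = "RF response curve: Bio1")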
13.4.4 Summary: Key Model Validation & Visualization Techniques
Method | Purpose | Best For |
---|---|---|
Train-Test Split | Evaluates performance on unseen data. | General model validation. |
K-Fold Cross-Validation | Ensures stability across multiple splits. | Large datasets. |
LOOCV | Uses every point for training/testing. | Small datasets. |
Spatial Cross-Validation | Prevents spatial overfitting. | SDMs with clustered data. |
ROC Curve | Evaluates sensitivity-specificity trade-off. | Presence/absence models. |
Precision-Recall Curve | Measures performance for rare species. | Imbalanced datasets. |
Suitability Maps | Visualizes habitat suitability. | Presence-only models (e.g., MaxEnt). |
Response Curves | Ensures predictor-response relationships make sense. | All SDMs. |
13.5 5. Comparing Models: CART, RF, GBM, and MaxEnt
Selecting the best Species Distribution Model (SDM) depends on data type, research goals, and computational resources. Here, we compare CART, Random Forest (RF), Gradient Boosting Machine (GBM), and MaxEnt to determine when each is most useful.
13.5.1 1. Best Evaluation Metrics for Tree-Based vs. Presence-Background Models
Different model types require different evaluation metrics:
Metric | CART & RF (Classification) | GBM (Boosting) | MaxEnt (Presence-Only) |
---|---|---|---|
Accuracy | ✅ Good for presence/absence | ✅ Works well | ❌ Not applicable |
AUC-ROC | ✅ Good for all classification models | ✅ Strong predictor | ✅ Best for MaxEnt |
Precision-Recall | ✅ Works for imbalanced data | ✅ Works for boosting models | ✅ Works for presence-only data |
TSS | ✅ Useful in SDMs | ✅ More informative than accuracy | ✅ Common in presence-background models |
Response Curves | ✅ Helps interpret variable impact | ✅ Useful for interpretation | ✅ Essential for ecological validation |
✅ Takeaway: MaxEnt relies more on AUC and presence-background validation, while tree-based methods use accuracy and classification metrics.
13.5.2 2. When to Use Different SDM Models?
Model | Best Used For | Advantages | Disadvantages |
---|---|---|---|
CART | Simple rules-based classification. | Easy to interpret. | Prone to overfitting. |
Random Forest (RF) | Presence-absence classification. | Handles nonlinear relationships, reduces overfitting. | Computationally expensive. |
GBM | High-accuracy modeling. | Best predictive power. | Requires tuning, slow training. |
MaxEnt | Presence-only data modeling. | Handles small datasets, works without absence data. | Cannot model true absence data. |
✅ Takeaway:
- Use CART when interpretability is key.
- Use RF for balanced presence-absence classification.
- Use GBM when maximum accuracy is needed.
- Use MaxEnt when only presence data is available.
13.5.3 3. Performance Trade-Offs: Interpretability vs. Accuracy vs. Computational Cost
Factor | CART | RF | GBM | MaxEnt |
---|---|---|---|---|
Interpretability | ✅ Very High | ❌ Harder to interpret | ❌ Harder to interpret | ✅ Response curves help interpret results |
Accuracy | ❌ Low | ✅ High | ✅ Very High | ✅ High for presence-only |
Computational Cost | ✅ Fast | ❌ Medium-High | ❌ High | ✅ Efficient for presence-only |
✅ Takeaway:
- Use CART when interpretability is more important than accuracy.
- Use RF/GBM when accuracy is the priority.
- Use MaxEnt when working with presence-only data and computational efficiency is needed.
13.6 6. Common Pitfalls in Model Evaluation
Even when models perform well, several pitfalls can lead to misleading results.
13.6.1 1. Overfitting: When a Model is Too Complex and Fails to Generalize
⚠️ Problem: A model that fits training data perfectly but performs poorly on new data.
🔹 Solution:
- Use cross-validation to ensure generalization.
- Regularization in GBM and MaxEnt prevents overfitting.
- Limit tree depth in CART and RF.
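A minimal sketch of these complexity controls (parameter values are illustrative, not recommendations; train_data and the presence column are as in the earlier examples):
library(rpart)
library(randomForest)
library(gbm)
# Ensure presence is a factor for the classification learners
train_data$presence <- as.factor(train_data$presence)
# CART: cap tree depth and require a minimum complexity gain per split
cart_simple <- rpart(presence ~ ., data = train_data, method = "class",
                     control = rpart.control(maxdepth = 4, cp = 0.01))
# RF: cap tree size (maxnodes is the closest proxy for depth in randomForest)
rf_simple <- randomForest(presence ~ ., data = train_data,
                          ntree = 500, maxnodes = 16)
# GBM: smaller shrinkage (learning rate) and shallower trees act as regularization
train_gbm <- transform(train_data, presence = as.numeric(as.character(presence)))
gbm_simple <- gbm(presence ~ ., data = train_gbm, distribution = "bernoulli",
                  n.trees = 500, shrinkage = 0.005, interaction.depth = 2)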
13.6.2 2. Ignoring Spatial Bias: Why Presence Points Should Be Spatially Independent
⚠️ Problem:
- If presence records cluster near roads or specific locations, the model may falsely learn that species prefer those areas.
🔹 Solution:
- Use spatial cross-validation instead of random splits.
- Apply bias correction in MaxEnt to adjust for sampling effort.
13.6.3 3. Improper Thresholding: Setting Unrealistic Suitability Cutoffs
⚠️ Problem:
- Converting continuous suitability scores into binary presence-absence maps using arbitrary thresholds.
🔹 Solution:
- Use Maximize Sensitivity-Specificity Thresholding.
- Compare multiple thresholding methods to select the most meaningful.
✅ Takeaway: Always validate threshold selection with ecological knowledge.
13.7 7. Summary & Best Practices
13.7.1 Key Takeaways from Model Evaluation
- Different models require different evaluation metrics – RF/GBM rely on classification metrics, while MaxEnt uses presence-background validation.
- Choose models based on data availability – Presence-absence models work well for classification, while MaxEnt is best for presence-only data.
- Avoid common pitfalls – Overfitting, spatial bias, and improper thresholding can reduce model reliability.
13.7.2 Recommended Workflow for Assessing SDM Performance
✅ Step 1: Choose an SDM Model Based on Data Type
- Use CART, RF, or GBM for presence-absence.
- Use MaxEnt for presence-only data.
✅ Step 2: Apply Proper Validation Techniques
- Use Train-Test Splitting or Cross-Validation for presence-absence models.
- Use Spatial Cross-Validation when presence points are clustered.
✅ Step 3: Select the Right Performance Metrics
- Accuracy, Sensitivity, Specificity, Kappa for RF/GBM.
- AUC, TSS, PR-Curve for MaxEnt.
✅ Step 4: Interpret Results Ecologically
- Ensure that response curves make biological sense.
- Check that species presence is predicted in ecologically relevant areas.
13.7.3 Iterative Improvements: How to Refine Models Based on Evaluation Results
🔄 If AUC is low → Try different feature selection methods.
🔄 If overfitting occurs → Reduce model complexity (e.g., increase regularization in GBM, limit tree depth in RF).
🔄 If presence points are biased → Use spatially explicit validation.
✅ Final Takeaway:
By following best practices in model evaluation, you ensure that species distribution models are robust, accurate, and ecologically meaningful. 🚀
13.9 1. Introduction to Ensemble Models
13.9.1 What Are Ensemble Models?
Ensemble models combine predictions from multiple models to increase accuracy, stability, and generalization. Instead of relying on a single model, ensembles aggregate results from different models to produce a more robust prediction.
Ensemble methods are particularly useful in Species Distribution Modeling (SDM) because different algorithms may capture different aspects of species-environment relationships.
13.9.2 Why Use Ensemble Approaches in SDM?
📌 Single models may be biased – Different SDM algorithms have strengths and weaknesses.
📌 Combining models improves reliability – Ensembles reduce overfitting and increase prediction stability.
📌 More realistic ecological interpretations – Reducing uncertainty in predictions makes results more reliable for conservation planning.
✅ Example: If a species’ distribution is predicted differently by MaxEnt (presence-only) and Random Forest (presence-absence), an ensemble model can blend both predictions for a better outcome.
13.9.3 Advantages Over Single-Model Predictions
Advantage | Explanation |
---|---|
Increased Accuracy | Aggregating models reduces individual errors. |
Better Generalization | Prevents overfitting to training data. |
Reduced Uncertainty | Multiple models provide more stable predictions. |
Improved Ecological Interpretability | More robust predictions across different environmental gradients. |
✅ Takeaway: Ensemble models are widely used in machine learning and ecology to improve species distribution predictions.
13.10 2. Types of Ensemble Methods
There are four common ensemble modeling techniques used in SDM.
13.10.1 1. Bagging (Bootstrap Aggregating)
Bagging creates multiple versions of the same model by training them on different random subsets of the data and then averaging their predictions.
📌 Example: Random Forest is a bagging method that builds multiple decision trees and averages their outputs.
✅ Reduces variance and prevents overfitting.
✅ Works well for presence-absence models.
13.10.2 2. Boosting
Boosting builds models sequentially, where each new model learns from the errors of the previous models to improve performance.
📌 Example: Gradient Boosting Machine (GBM) sequentially refines weak learners into a strong model.
✅ Improves accuracy by learning from mistakes.
✅ Works well with complex, nonlinear relationships.
13.10.3 3. Stacking (Model Stacking)
Stacking is a meta-learning approach that combines predictions from multiple models and trains another model to learn which predictions to trust more.
📌 Example: Combining MaxEnt, Random Forest, and GBM into a final predictive model.
✅ Learns the best combination of models for optimal predictions.
✅ Reduces individual model biases.
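A minimal stacking sketch, assuming held-out (ideally out-of-fold) probability predictions cart_pred, rf_pred, gbm_pred, and maxent_pred for the rows of test_data, as produced in Section 13.11. In practice the meta-model should be fitted on out-of-fold predictions rather than the final test set to avoid information leakage.
# Meta-model: logistic regression learns how much weight to give each base model
stack_data <- data.frame(
  presence = test_data$presence,   # observed presence/absence
  cart = cart_pred, rf = rf_pred,
  gbm = gbm_pred, maxent = maxent_pred
)
meta_model   <- glm(presence ~ ., data = stack_data, family = binomial)
stacked_pred <- predict(meta_model, stack_data, type = "response")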
13.10.4 4. Weighted Ensemble Models
Instead of treating all models equally, weighted ensembles assign different weights to different models based on performance metrics (e.g., AUC, TSS).
📌 Example: If MaxEnt performs best for presence-only data, it gets a higher weight than CART or GBM in the final ensemble.
✅ Balances strengths and weaknesses of different models.
✅ Can be customized based on model reliability.
13.10.5 Summary: Choosing the Right Ensemble Method
Ensemble Method | Best For | Key Benefit |
---|---|---|
Bagging (RF) | Presence-Absence Data | Reduces overfitting |
Boosting (GBM) | Complex SDM Patterns | Increases accuracy |
Stacking | Combining Multiple SDM Models | Improves predictive power |
Weighted Ensembles | Balancing Model Contributions | Reduces bias |
13.11 3. Implementing Ensemble Models in R
In this section, we will train multiple Species Distribution Models (SDMs) using CART, Random Forest (RF), Gradient Boosting Machine (GBM), and MaxEnt, then combine their predictions into an ensemble model for improved accuracy and robustness.
13.11.1 Step 1: Load Necessary Libraries
# Load required libraries
library(dismo) # For species distribution modeling
library(rpart) # CART (Decision Tree)
library(randomForest) # Random Forest
library(gbm) # Gradient Boosting Machine
library(caret) # Model training and evaluation
library(ENMeval) # MaxEnt modeling
library(terra) # Handling raster data
13.11.2 Step 2: Load and Prepare Data
We assume a data frame data holding presence-absence records (a binary presence column plus lon/lat coordinates) together with the values of the environmental predictors at each point, and a raster stack predictors containing the same environmental layers.
# Load species data
# ('data' contains a binary 'presence' column, 'lon'/'lat' coordinates, and one
#  column per environmental predictor; 'predictors' is the matching raster stack)
# Convert presence to a factor (classification task)
data$presence <- as.factor(data$presence)
# Split into Training and Testing Sets (70% Training, 30% Testing)
set.seed(123)
trainIndex <- createDataPartition(data$presence, p = 0.7, list = FALSE)
train_data <- data[trainIndex, ]
test_data <- data[-trainIndex, ]
13.11.3 Step 3: Train Individual SDM Models
13.11.3.1 1. CART Model (Decision Tree)
cart_model <- rpart(presence ~ ., data = train_data, method = "class")
cart_pred <- predict(cart_model, test_data, type = "prob")[,2] # Probability of presence
13.11.3.2 2. Random Forest Model
rf_model <- randomForest(presence ~ ., data = train_data, ntree = 500, importance = TRUE)
rf_pred <- predict(rf_model, test_data, type = "prob")[,2] # Probability of presence
13.11.3.3 3. Gradient Boosting Machine (GBM)
# gbm's "bernoulli" distribution expects a numeric 0/1 response, so convert the factor back
train_gbm <- transform(train_data, presence = as.numeric(as.character(presence)))
gbm_model <- gbm(presence ~ ., data = train_gbm, distribution = "bernoulli",
                 n.trees = 500, shrinkage = 0.01, interaction.depth = 3)
gbm_pred <- predict(gbm_model, test_data, n.trees = 500, type = "response")
13.11.3.4 4. MaxEnt Model (Presence-Only)
# Prepare data for MaxEnt (presence-only)
presence_points <- train_data[train_data$presence == 1, c("lon", "lat")]
background_points <- randomPoints(predictors, 500) # Sample background points from the raster stack
# Train MaxEnt model (dismo::maxent requires Java and maxent.jar)
maxent_model <- maxent(predictors, p = presence_points, a = background_points)
# Predict suitability for the test points (test_data must contain the predictor columns)
maxent_pred <- predict(maxent_model, test_data)
13.11.4 Step 4: Combine Predictions into an Ensemble Model
We will use a simple mean ensemble by averaging the probability predictions from all models.
# Combine model predictions
ensemble_pred <- (cart_pred + rf_pred + gbm_pred + maxent_pred) / 4
Alternatively, weighted ensembles can be used by giving higher weights to models with better AUC scores.
# Compute AUC scores for each model
library(pROC)
auc_cart <- auc(roc(test_data$presence, cart_pred))
auc_rf <- auc(roc(test_data$presence, rf_pred))
auc_gbm <- auc(roc(test_data$presence, gbm_pred))
auc_maxent <- auc(roc(test_data$presence, maxent_pred))
# Compute weighted ensemble based on AUC
total_auc <- auc_cart + auc_rf + auc_gbm + auc_maxent
weights <- c(auc_cart, auc_rf, auc_gbm, auc_maxent) / total_auc
ensemble_pred_weighted <- (weights[1] * cart_pred +
weights[2] * rf_pred +
weights[3] * gbm_pred +
weights[4] * maxent_pred)
13.11.5 Step 5: Evaluate Ensemble Performance vs. Individual Models
13.11.5.1 1. Compute AUC Scores
auc_ensemble <- auc(roc(test_data$presence, ensemble_pred))
auc_weighted_ensemble <- auc(roc(test_data$presence, ensemble_pred_weighted))
print(paste("CART AUC:", auc_cart))
print(paste("Random Forest AUC:", auc_rf))
print(paste("GBM AUC:", auc_gbm))
print(paste("MaxEnt AUC:", auc_maxent))
print(paste("Simple Ensemble AUC:", auc_ensemble))
print(paste("Weighted Ensemble AUC:", auc_weighted_ensemble))
13.11.5.2 2. Compare Accuracy
# Convert probabilities to presence/absence using 0.5 threshold
ensemble_pred_class <- ifelse(ensemble_pred > 0.5, "1", "0")
weighted_ensemble_pred_class <- ifelse(ensemble_pred_weighted > 0.5, "1", "0")
# Compute accuracy for each model
accuracy_ensemble <- sum(ensemble_pred_class == test_data$presence) / nrow(test_data)
accuracy_weighted_ensemble <- sum(weighted_ensemble_pred_class == test_data$presence) / nrow(test_data)
print(paste("Ensemble Accuracy:", accuracy_ensemble))
print(paste("Weighted Ensemble Accuracy:", accuracy_weighted_ensemble))
✅ Expected Results:
- The ensemble models will often outperform the individual models.
- Weighted ensembles tend to perform at least as well as simple averaging because they prioritize better-performing models.
13.11.6 Summary: Ensemble Modeling in SDM
Model | Strengths | Weaknesses |
---|---|---|
CART | Simple & interpretable | Lower accuracy |
Random Forest (RF) | Reduces overfitting | Computationally expensive |
GBM | High predictive accuracy | Requires tuning |
MaxEnt | Works with presence-only data | Can’t model true absences |
Ensemble (Mean) | Balances strengths of all models | May treat weaker models equally |
Weighted Ensemble | Prioritizes stronger models | Requires AUC-based weighting |
✅ Final Takeaway:
- Simple averaging of models is an easy way to improve accuracy.
- Weighted ensembles provide even better performance by giving higher importance to stronger models.
13.11.7 Benefits of Ensemble Models
Ensemble models provide significant improvements over single-model predictions, making them valuable for Species Distribution Modeling (SDM).
✅ Improved Accuracy
- By combining multiple models, ensembles capture different aspects of species-environment relationships, leading to better predictions.
- Example: If MaxEnt overpredicts species presence and RF underpredicts, the ensemble balances these biases.
✅ Reduced Overfitting
- Individual models (e.g., CART) may overfit to training data, but ensembles smooth out inconsistencies.
- Methods like bagging (RF) and boosting (GBM) prevent models from learning noise.
✅ Better Generalization to New Data
- Combining models makes predictions more stable across different environmental conditions.
- Example: If a single model performs poorly in certain regions, others in the ensemble compensate.
✅ More Robust Predictions for Conservation
- When SDMs are used for conservation planning, reducing uncertainty is crucial.
- Example: In habitat suitability mapping, ensembles minimize the risk of false negatives (missing critical habitats).
13.11.8 Limitations of Ensemble Models
⚠️ Increased Computational Cost
- Training multiple models requires more processing power and time.
- Example: Running CART, RF, GBM, and MaxEnt together takes longer than a single MaxEnt model.
⚠️ Model Complexity
- Understanding why an ensemble makes a prediction is harder than understanding a single decision tree (CART).
- Solution: Feature importance analysis can help interpret which variables matter most.
⚠️ Difficult Interpretation
- Some stakeholders (e.g., policymakers, conservation planners) prefer simple, interpretable models.
- Example: A single decision tree (CART) is easy to explain, while a complex GBM ensemble is not.
✅ Takeaway:
Use ensembles when accuracy is critical but be mindful of computational costs and interpretability challenges.
13.12 5. Summary & Best Practices
13.12.1 When to Use Ensemble Models in SDM
📌 When multiple models provide different results, and we need a consensus prediction.
📌 When we want to reduce uncertainty in species distribution maps.
📌 When working with complex environmental datasets where single models struggle.
📌 When prioritizing accuracy over interpretability (e.g., conservation planning).
13.12.2 Recommended Workflow for Building and Evaluating Ensembles
✅ Step 1: Train Multiple Models
- Use CART, RF, GBM, and MaxEnt to generate individual predictions.
✅ Step 2: Select the Best Models
- Remove models that perform poorly (low AUC, TSS, or Kappa scores).
✅ Step 3: Combine Predictions Using an Ensemble Approach
- Simple averaging (mean of all models).
- Weighted averaging (assign higher weight to better-performing models).
✅ Step 4: Evaluate Ensemble Performance
- Compare ensemble AUC and accuracy with individual models.
- Use response curves to check ecological validity.
✅ Step 5: Visualize and Interpret Results
- Generate ensemble suitability maps and compare them with the individual model maps (a minimal sketch follows this list).
- Ensure predictions align with ecological expectations.
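A hedged sketch of an ensemble suitability map, assuming predictors is the raster stack of training variables and rf_model and maxent_model are the fitted models from Section 13.11:
library(raster)
library(dismo)
rf_map       <- predict(predictors, rf_model, type = "prob", index = 2)  # P(presence) from RF
maxent_map   <- predict(maxent_model, predictors)                        # MaxEnt suitability
ensemble_map <- (rf_map + maxent_map) / 2                                # simple mean ensemble
par(mfrow = c(1, 3))
plot(rf_map, main = "Random Forest")
plot(maxent_map, main = "MaxEnt")
plot(ensemble_map, main = "Ensemble (Mean)")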
13.12.3 How to Select the Best Models for Ensemble Predictions?
Scenario | Best Approach |
---|---|
High Interpretability Needed | Use CART + RF (decision trees are easier to explain). |
Presence-Only Data | Use MaxEnt + GBM (handles presence-background data well). |
Max Accuracy Required | Use Weighted Ensemble (GBM + RF). |
Computationally Limited | Use Random Forest (RF) (avoids training multiple models). |
✅ Final Takeaway
- Ensemble models provide higher accuracy, reduced overfitting, and better generalization.
- However, they require more computation and can be harder to interpret.
- Choose the right ensemble approach based on your dataset and conservation goals.