A collection of geospatial projects built with automated R and Python workflows


Analysis of fire burn severity and post-fire vegetation recovery with Landsat time-series
Objective: To examine the burn severity and post-fire vegetation recovery of the Prouton Lake Fire using common vegetation indices (NDVI and NBR) derived from a Landsat time series.
Language: R
Project Overview:
- Loaded BC fire polygons and filtered to summarize 2017 fire season statistics and isolate the Prouton Lakes fire boundary
- Looped through all Landsat scenes (2013–2021), cropping to the fire boundary, applying cloud masking, and calculating NDVI and NBR for each scene
- Averaged scene-level indices into yearly composites for both NDVI and NBR
- Calculated dNBR and classified it into four burn severity zones, then vectorized the zones into polygons (a sketch of this step follows the code snippet below)
- Masked annual NDVI composites by severity zone (2018–2021) and visualized recovery as maps and boxplots
Packages and Functions:
- sf — read_sf, st_drop_geometry, st_transform
- terra — rast, crop, mask, app, classify, as.polygons, extract
- dplyr / tidyr — filter, slice_max, pivot_longer, bind_rows
- ggplot2
- lubridate / stringr
Code snippet
# ==================== Code snippet from Prouton Lake Fire analysis ====================

# generate a fire polygon for the Prouton fire
prouton_fire_polygon <- fire_polygons %>%
  filter(FIRE_NO == "C30870") %>%
  summarise()

# list.dirs for Landsat folders
folders <- list.dirs("data/Landsat 8 OLI_TIRS C2 L2", full.names = TRUE, recursive = FALSE)

# create an output directory
out_dir <- "data/output"
dir.create(out_dir, showWarnings = FALSE)

# create one large for loop to iterate through the folders,
# separate files, crop and mask to QA_PIXEL, calculate NDVI and NBR,
# then save outputs into three separate folders
for (f in folders) {
  prod_id <- basename(f)

  # extract the acquisition date using lubridate
  date_string <- stringr::str_sub(prod_id, 18, 25)
  acq_date <- lubridate::as_date(date_string, format = "%Y%m%d")

  # create a clean filename base
  clean_name <- format(acq_date, "%Y%m%d")

  files <- list.files(f, full.names = TRUE, pattern = "TIF$", recursive = TRUE)

  # extract SR + QA files; select only SR_B1 ... SR_B7
  sr_files <- files[grepl("SR_B[1-7]", files)]

  # build SR stack
  sr_stack <- terra::rast(sr_files)
  qa_files <- terra::rast(files[grepl("QA_PIXEL", files)])

  # set CRS and crop rasters to the fire polygon
  prouton_fire_polygon <- st_transform(prouton_fire_polygon, crs(sr_stack))

  # crop by polygon and then create the clear mask
  qa_crop <- terra::crop(qa_files, prouton_fire_polygon, mask = TRUE)
  clear_mask <- qa_crop == 21824

  # crop SR files by polygon, then mask using the clear mask
  sr_crop <- terra::crop(sr_stack, prouton_fire_polygon, mask = TRUE)
  sr_mask <- mask(sr_crop, clear_mask, maskvalues = 0)

  # calculate NDVI and NBR
  ndvi <- (sr_mask[[5]] - sr_mask[[4]]) / (sr_mask[[5]] + sr_mask[[4]])
  nbr <- (sr_mask[[5]] - sr_mask[[7]]) / (sr_mask[[5]] + sr_mask[[7]])

  # create output directories
  sr_dir <- file.path(out_dir, "SR")
  ndvi_dir <- file.path(out_dir, "NDVI")
  nbr_dir <- file.path(out_dir, "NBR")
  dir.create(sr_dir, showWarnings = FALSE)
  dir.create(ndvi_dir, showWarnings = FALSE)
  dir.create(nbr_dir, showWarnings = FALSE)

  # write outputs with date-based names
  writeRaster(sr_mask, file.path(sr_dir, paste0(clean_name, "_SR.tif")), overwrite = TRUE)
  writeRaster(ndvi, file.path(ndvi_dir, paste0(clean_name, "_NDVI.tif")), overwrite = TRUE)
  writeRaster(nbr, file.path(nbr_dir, paste0(clean_name, "_NBR.tif")), overwrite = TRUE)
}
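The yearly compositing, dNBR classification, and severity masking steps from the overview are not shown in the snippet above. The lines below are a minimal sketch of how they could be completed with terra, reusing the NBR GeoTIFFs written by the loop; the pre/post-fire years, the dNBR thresholds, and the ndvi_2019 composite name are illustrative assumptions rather than values taken from the project.

# Sketch only: build yearly NBR composites from the per-scene outputs
nbr_2016_files <- list.files("data/output/NBR", pattern = "^2016.*_NBR\\.tif$", full.names = TRUE)
nbr_2018_files <- list.files("data/output/NBR", pattern = "^2018.*_NBR\\.tif$", full.names = TRUE)
nbr_2016 <- terra::app(terra::rast(nbr_2016_files), mean, na.rm = TRUE)  # pre-fire composite
nbr_2018 <- terra::app(terra::rast(nbr_2018_files), mean, na.rm = TRUE)  # post-fire composite

# dNBR = pre-fire NBR - post-fire NBR
dnbr <- nbr_2016 - nbr_2018

# classify into four burn severity zones (threshold values are illustrative)
rcl <- matrix(c(-Inf, 0.10, 1,   # 1 = unburned / very low
                0.10, 0.27, 2,   # 2 = low
                0.27, 0.66, 3,   # 3 = moderate
                0.66,  Inf, 4),  # 4 = high
              ncol = 3, byrow = TRUE)
severity <- terra::classify(dnbr, rcl)
names(severity) <- "severity"

# vectorize the severity zones into polygons
severity_polys <- terra::as.polygons(severity)

# mask an annual NDVI composite (a hypothetical ndvi_2019) by the high-severity zone
ndvi_2019_high <- terra::mask(ndvi_2019, severity_polys[severity_polys$severity == 4, ])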


Supervised Image Classification of Land Cover on the Southern Gulf Islands
Objective: Perform a supervised land cover classification using Landsat 9 imagery, applying the Maximum Likelihood algorithm to distinguish seven land cover classes and assess classification accuracy.
Language: R
Project Overview:
- Loaded a 6-band Landsat 9 surface reflectance image and delineated 100+ training polygons across seven land cover classes
- Split polygons into 70/30 training and validation sets, extracted pixel-level spectral values, and visualized spectral signatures across all six bands per class
- Ran a Maximum Likelihood classification using superClass() from RStoolbox, sampling 500 pixels per class to balance training data and reduce bias toward dominant classes like Water (a sketch of this call follows the package list below)
- Generated a confusion matrix from validation predictions and calculated Overall Accuracy, User Accuracy, and Producer Accuracy per class
Packages and Functions:
- RStoolbox — Maximum Likelihood supervised classification and accuracy assessment (superClass)
- caret — confusion matrix generation and accuracy metric calculation
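No code snippet is included for this project, so the call below is a minimal sketch of the superClass() workflow described above. It assumes a 6-band Landsat 9 SpatRaster and seven-class training polygons with a "class" attribute; the file paths, object names, and column name are hypothetical, not taken from the project.

# Sketch only: Maximum Likelihood classification with RStoolbox::superClass()
library(RStoolbox)
library(terra)
library(sf)

l9_stack <- rast("landsat9_sr_6band.tif")             # 6-band Landsat 9 SR image (path assumed)
training_polys <- read_sf("training_polygons.gpkg")   # 7-class training polygons (path assumed)

set.seed(42)
mlc <- superClass(
  img            = l9_stack,
  trainData      = training_polys,
  responseCol    = "class",     # attribute holding the land cover class label (name assumed)
  trainPartition = 0.7,         # 70/30 training / validation split
  nSamples       = 500,         # sample 500 pixels per class to balance the classes
  model          = "mlc"        # Maximum Likelihood classifier
)

# validation confusion matrix and accuracy metrics (computed with caret)
mlc$validation$performance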


Geospatial Terrain Assessment of Cutblocks in Central BC
Objective: Characterize the topographic conditions of historically harvested cutblocks in central BC, and explore how terrain and hydrologic zones relate to harvesting patterns and geohazard risk.
Language: Python
Project Overview:
- Derived slope, aspect, elevation, and curvature rasters from a 30 m DEM using the arcpy.sa Slope, Aspect, and Curvature tools
- Calculated per-cutblock mean and standard deviation for each terrain metric using ZonalStatisticsAsTable
- Cleaned and renamed the zonal statistics outputs, then merged all four terrain tables into a single master dataframe joined to the cutblocks geodataframe
- Classified cutblocks into four slope categories (Gentle 0–15°, Moderate 15–30°, Steep 30–45°, Very Steep >45°) based on BC slope classification standards, and summarized total harvested area per class
- Clipped BC Hydrologic Zones to the AOI, then spatially joined zones to cutblocks with geopandas sjoin to enable zone-level terrain comparisons
- Visualized temporal harvesting trends and zone-level terrain distributions using matplotlib and seaborn, including slope over time, mean slope and curvature by hydrologic zone, and the top 10 steepest and largest cutblocks per zone
Packages and Functions:
- arcpy — Slope, Aspect, Curvature, ZonalStatisticsAsTable, Clip
- simpledbf — Dbf5
- pandas / geopandas — sjoin, merge, groupby, cut
- matplotlib.pyplot — subplots, figure, show
- seaborn — barplot
- os — chdir, path.join, path.splitext, path.basename
Code snippet
#STEP 1: Import libraries
import os
import arcpy
from arcpy.sa import ZonalStatisticsAsTable
from simpledbf import Dbf5
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#Set up workspace
arcpy.CheckOutExtension("Spatial")
os.chdir(r"data\scratch")
arcpy.env.workspace = os.getcwd()
arcpy.env.overwriteOutput = True
#STEP 2: Calculate aspect, slope, curvature from dem
dem = r"dem.tif"
#calculate slope
slope_output = arcpy.sa.Slope(
in_raster= dem,
output_measurement="DEGREE")
slope_output.save(r"slope.tif")
#calculate aspect
aspect_output = arcpy.sa.Aspect(
in_raster = dem)
aspect_output.save(r"aspect.tif")
#calculate curvature
curvature_output = arcpy.sa.Curvature(
in_raster = dem,
z_factor = 1)
curvature_output.save(r"curvature.tif")
#STEP 3: calculate zonal statistics by cutblock
cutblocks = r"cutblocks.shp"  # cutblock polygon layer (filename assumed; not specified in the original snippet)
zone_field = "BLOCK_ID"
output_folder = r"scratch\Outputs"
elev = r"dem.tif"
slope = r"slope.tif"
aspect = r"aspect.tif"
curv = r"curvature.tif"
#function for calculating mean and standard deviation using zonal statistics
def zonal_stats(raster, out_name):
    out_mean_std = fr"{output_folder}\{out_name}_mean_std.dbf"
    ZonalStatisticsAsTable(
        in_zone_data=cutblocks,
        zone_field=zone_field,
        in_value_raster=raster,
        out_table=out_mean_std,
        statistics_type="MEAN_STD",
        ignore_nodata="DATA"
    )
# Run for all four rasters
zonal_stats(elev, "zstats_elevation")
zonal_stats(slope, "zstats_slope")
zonal_stats(aspect, "zstats_aspect")
zonal_stats(curv, "zstats_curvature")
#STEP 4: clean up zonal statistics tables and rename fields for better merging
folder = output_folder
# list of (filename, layername) pairs
tables = [
("zstats_elevation_mean_std.dbf", "elev"),
("zstats_slope_mean_std.dbf", "slope"),
("zstats_aspect_mean_std.dbf", "aspect"),
("zstats_curvature_mean_std.dbf", "curv")]
cleaned = {} # store cleaned dfs
for filename, layer in tables:
    path = os.path.join(folder, filename)
    # Load DBF into a pandas DataFrame
    dbf = Dbf5(path)
    df = dbf.to_dataframe()
    # Drop unnecessary columns
    df = df.drop(columns=["OID", "COUNT", "AREA"], errors="ignore")
    # Rename MEAN and STD to layer-specific names
    df = df.rename(columns={
        "MEAN": f"{layer}_mean",
        "STD": f"{layer}_std",
        "HARVEST__!": "HARVEST_YEAR"})
    cleaned[layer] = df
# Merge all cleaned dfs on BLOCK_ID
master = cleaned["elev"] \
.merge(cleaned["slope"], on="BLOCK_ID") \
.merge(cleaned["aspect"], on="BLOCK_ID") \
.merge(cleaned["curv"], on="BLOCK_ID")
#STEP 5: classify by slope angle and plot
cutblocks_gdf = gpd.read_file(cutblocks)
# Merge zonal stats master table
df = cutblocks_gdf.merge(master, on="BLOCK_ID", how="left")
#summarize terrain metrics by harvest year
yearly_summary = df.groupby("HARVEST_YEAR").agg({
    "elev_mean": ["mean", "std"],
    "slope_mean": ["mean", "std"],
    "aspect_mean": "mean",
    "curv_mean": "mean"})
#create slope class bins based on BC slope classification
bins = [0, 15, 30, 45, 90]
labels = ["Gentle (0–15°)", "Moderate (15–30°)", "Steep (30–45°)", "Very Steep (>45°)"]
#create a slope class dataframe using bins and labels described above
df["slope_class"] = pd.cut(df["slope_MEAN"], bins=bins, labels=labels, right=False)
#group by slope class and total area harvested
area_by_class = df.groupby("slope_class")["AREA_HA"].sum()
#plot graphs side by side
fig, axes = plt.subplots(1, 2, figsize=(16, 6))
#plot slope class by total area harvested
area_by_class.plot(kind="bar", ax=axes[0])
axes[0].set_ylabel("Total area harvested (ha)")
axes[0].set_xlabel("Slope Class")
axes[0].set_title("Slope classes of harvested cutblocks by total area")
#plot average slope of harvested blocks over time
(df.groupby("HARVEST_YEAR")["slope_MEAN"].mean()
.plot(kind="line", ax=axes[1]))
axes[1].set_ylabel("Average Slope (deg)")
axes[1].set_xlabel("Harvest Year")
axes[1].set_title("Average Slope of Harvested Blocks Over Time")
axes[1].set_xlim([1950, 2025])
plt.show()


Estimating Canopy Height from Spaceborne Lidar and Sentinel-2 Data Using Statistical Methods
Objective: To model wall-to-wall canopy height across Redwood National Park by integrating GEDI spaceborne LiDAR footprints with Sentinel-2 multispectral imagery, addressing GEDI’s non-continuous spatial coverage to support carbon inventory estimation.
Language: R
Project Overview:
- Downloaded and preprocessed GEDI LiDAR footprints and Sentinel-2 multispectral imagery for Redwood National Park, extracting band reflectance values for each GEDI shot location
- Calculated ten spectral indices (NDVI, NDWI, NDMI, EVI, SAVI, NDRE, NBR, GSAVI, and others) from Sentinel-2 bands and assessed correlation with GEDI canopy height via a correlogram
- Ran stepwise AIC selection to identify a six-predictor OLS model, then removed predictors after VIF testing flagged multicollinearity, yielding a stable three-predictor model
- Tested OLS residuals for spatial autocorrelation with Moran’s I, which confirmed significant autocorrelation and a violation of the independence assumption
- Fitted and compared five spatial models: Trend Surface, GLS, GWR, Spatial Lag, and Spatial Error
- Selected GLS as the final model based on its stability and interpretability
- Predicted wall-to-wall canopy height across the park using the GLS model and Sentinel-2 imagery, producing a continuous CHM to support carbon inventory estimation (a sketch of this step follows the code snippet below)
Packages and Functions:
- corrplot — corrplot
- MASS — stepAIC
- car — vif
- spdep — knn2nb, knearneigh, nb2listw, moran.test, moran.plot, sp.correlogram
- gstat — variogram, fit.variogram, vgm
- nlme — gls, corSpher, residuals
- spgwr — gwr.sel, gwr
- spatialreg — lagsarlm, errorsarlm
- stats — lm, rstandard, residuals, hist, shapiro.test, AIC, BIC, cor
Code snippet
# Read satellite imagery
s2_ras <- rast("data/redwood_aoi_sentinel_2_imagery.tif")

# Read AOI dataset
aoi <- st_read("data/redwood_aoi.gpkg")

# Read our GEDI data
gedi_raw <- st_read("data/redwood_gedi.gpkg") %>%               # Read GEDI
  rename(canopy_height = Canopy.Height..rh100.)                 # Simplify name

# Project down to CH & geometry
gedi_shots <- gedi_raw %>% dplyr::select(canopy_height)

# Preprocessing -----------------------------------------------------------
# Extract band information from the Sentinel-2 imagery & convert to DF
df <- terra::extract(s2_ras, vect(gedi_shots), bind = TRUE) %>% st_as_sf()

# Output some data
df %>% st_drop_geometry() %>% head()

# Add vegetative indices to the data frame
df_veg <- df %>%
  mutate(ndvi = (B8 - B4) / (B8 + B4),
         ndwi = (B3 - B8) / (B3 + B8),
         ndmi = (B8 - B11) / (B8 + B11),
         evi = 2.5 * ((B8 - B4) / (B8 + 6 * B4 - 7.5 * B2 + 1)),
         savi = ((B8 - B4) / (B8 + B4 + 0.5)) * 1.5,
         gndi = (B8 - B3) / (B8 + B3),
         gsavi = ((B8 - B3) / (B8 + B3 + 0.5)) * 1.5,
         nir_green = B8 / B3,
         ndre = (B8 - B5) / (B8 + B5),
         nbr = (B8 - B12) / (B8 + B12))

# View the data
df_veg %>% st_drop_geometry() %>% head()

# Aspatial Modelling ------------------------------------------------------
# Drop the geometry to investigate correlation between different bands/indices
df_nogeo <- st_drop_geometry(df_veg)

# Calculate & plot correlation
df_cor <- cor(df_nogeo)
corrplot(df_cor, method = "color", type = "lower",
         tl.cex = 0.6, tl.col = "black",
         addCoef.col = "black", number.cex = 0.4, diag = FALSE)

# Use stepwise AIC to create an MLR
# Null model
model.null <- lm(canopy_height ~ 1, data = df_nogeo)
# Full model
model.full <- lm(canopy_height ~ ., data = df_nogeo)

# Forward selection w/ variables we're interested in
model.step <- stepAIC(model.null, direction = "forward",
                      scope = list(upper = ~ B3 + B4 + B5 + B7 + B9 + B11 + gsavi, lower = ~1),
                      trace = TRUE)

# Check outputs
summary(model.step)

# Check variable inflation & for co-linearity issues
vif(model.step)  # co-linearity issues found for B4

# Remove B4 and re-run stepwise AIC
model_final <- stepAIC(model.null, direction = "forward",
                       scope = list(upper = ~ B3 + B5 + B9 + B11 + gsavi, lower = ~1),
                       trace = TRUE)

# Check model
summary(model_final)

# Check variable inflation & for co-linearity issues
vif(model_final)

# Add residuals to check assumptions
df$resid <- rstandard(model_final)

# Check for normality
hist(df$resid)
shapiro.test(df$resid)

# Check for linearity and independence
par(mfrow = c(2, 2))
plot(model_final)

# Potential independence issues ->
# Plot the residuals to investigate for spatial autocorrelation
ggplot(data = df) +
  geom_sf(aes(size = resid, color = resid), alpha = 0.5) +
  scale_color_viridis_c(option = "magma", name = "Residual Value") +
  theme_minimal() +
  labs(title = "GEDI Canopy Height Bubbles", size = "Residual", color = "Residual")

# Investigate Spatial Autocorrelation -------------------------------------
# Use 12 closest neighbours to define neighbourhood structure
nb <- knn2nb(knearneigh(df, k = 12))
listw <- nb2listw(nb, style = "W")

# Test the residuals for spatial autocorrelation
moran.test(df$resid, listw)

# Test the canopy height for spatial autocorrelation
moran.test(df$canopy_height, listw)

# Create a correlogram to gauge spatial autocorrelation
par(mfrow = c(1, 1))
plot(sp.correlogram(nb, var = df$canopy_height, order = 10, method = "I",
                    style = "W", randomisation = TRUE, zero.policy = TRUE),
     main = "Moran Correlogram for Canopy Height")

# Plot the Moran's I for Canopy Height
gedi_mp <- moran.plot(df$canopy_height, listw)
title(main = "Moran Scatterplot for Canopy Height")

# Create a bubble plot to investigate spatial autocorrelation of canopy height
ggplot(data = df) +
  geom_sf(aes(size = canopy_height, color = canopy_height), alpha = 0.5) +
  scale_color_viridis_c() +
  theme_minimal() +
  labs(title = "GEDI Canopy Height Bubbles", size = "Height (m)", color = "Height (m)")

# Create a variogram for the GEDI canopy height
gedi_vgm <- variogram(canopy_height ~ 1, df, cutoff = 2, width = 0.1)

# Estimate some initial values for our variogram for fitting
psill <- var(df$canopy_height) * 0.7
range <- max(gedi_vgm$dist)
nugget <- var(df$canopy_height) * 0.3

# Fit the variogram & plot
gedi_fit <- fit.variogram(gedi_vgm, model = vgm(psill, "Sph", range, nugget))
plot(gedi_vgm, gedi_fit)

# Spatial Modelling -------------------------------------------------------
# Add our spatial component as columns
df_spatial <- df_veg %>%
  mutate(X = st_coordinates(.)[, 1],
         Y = st_coordinates(.)[, 2])

# Trend Surface Analysis
model_trend_surface <- lm(canopy_height ~ B3 + B5 + gsavi + X + Y, data = df_spatial)

# GLS
model_gls <- gls(canopy_height ~ B3 + B5 + gsavi,
                 correlation = corSpher(form = ~ X + Y, nugget = TRUE),
                 data = df_spatial)
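The Spatial Lag / Spatial Error comparisons and the wall-to-wall prediction described in the overview are not shown above. The lines below sketch how they could be completed, reusing listw, df_spatial, model_trend_surface, model_gls, and s2_ras from the snippet; the Sentinel-2 layer names (B3, B5, B8) and the output path are assumptions, and predict.gls uses only the fixed effects of the GLS model.

# Sketch only: spatial lag and spatial error fits for comparison with the GLS
model_lag <- spatialreg::lagsarlm(canopy_height ~ B3 + B5 + gsavi,
                                  data = st_drop_geometry(df_spatial), listw = listw)
model_error <- spatialreg::errorsarlm(canopy_height ~ B3 + B5 + gsavi,
                                      data = st_drop_geometry(df_spatial), listw = listw)
AIC(model_trend_surface, model_gls, model_lag, model_error)  # compare model fits

# Sketch only: wall-to-wall canopy height prediction from the final GLS model
s2_pred <- s2_ras
s2_pred$gsavi <- ((s2_pred$B8 - s2_pred$B3) / (s2_pred$B8 + s2_pred$B3 + 0.5)) * 1.5  # mirror the GEDI-side gsavi
chm_gls <- terra::predict(s2_pred, model_gls, na.rm = TRUE)  # continuous canopy height surface
terra::writeRaster(chm_gls, "data/redwood_chm_gls.tif", overwrite = TRUE)  # output path assumed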