5  Weather Data: Metrics

Objectives of this page

This chapter uses the gridded weather data from Chapter 4 to construct various metrics. Please refer to the 27 core ETCCDI Climate Change Indices (https://etccdi.pacificclimate.org/indices.shtml) for additional metrics that can be constructed.

All these metrics will be computed at the grid cell level, on the finest temporal scale. Aggregation at various levels (monthly, quarterly) to match with macroeconomic data will be performed once the indicators are created. Spatial aggregation will not be performed in this chapter.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(zoo)

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(lubridate)

Let us load the graph theme functions (see Chapter 1):

source("../scripts/functions/utils.R")

Let us also load the maps for the countries of interest (see Chapter 2):

load("../Data/Maps/GADM-4.1/maps_level_0.RData")
load("../data/Maps/GADM-4.1/maps_level_1.RData")

We focus on the following countries defined in Chapter 2:

load("../output/tb_countries.rda")

Lastly, we load the weather data (see Chapter 4):

load("../data/Weather/CPC/NZL_daily_weather_corrected.rda")

We rename the max and min temperature columns:

country_weather_daily <- 
  country_weather_daily |> 
  rename(
    temp_max = tmax,
    temp_min = tmin
  )

Let us add the year, month, month name and day of year to the rows of the dataset:

country_weather_daily <- 
  country_weather_daily |> 
    dplyr::mutate(
      year = lubridate::year(date),
      month = lubridate::month(date),
      month_name = lubridate::month(
        date, abbr = FALSE, label = TRUE, locale = "en_US"
      ),
      day_of_year = lubridate::yday(date)
    )

5.1 Potential Evapotranspiration

We have daily precipitation \(P_t\), minimum temperatures \(\text{Tmin}_{t}\) and maximum temperatures \(\text{Tmax}_{t}\) at the grid cell level. We are interested in building a soil moisture index, which requires computing evapotranspiration in a prior step. We follow the approach presented in Dingman (2015) (pp. 299–300, Box 6.8: Thornthwaite-Type Monthly Water-Balance Model) and recalled in Appendix S1 of Lutz, Wagtendonk, and Franklin (2010).

5.1.1 Potential Evapotranspiration

The daily PET (in mm/day) writes (assumed to be from Hamon (1964), although the original article could not be located; Dingman (2015), Eq. 6.68, p. 294): \[ \text{PET}_t = \begin{cases} 0, & \text{if }\mathrm{Tmean}_t \le 0^\circ\mathrm{C},\\ 29.8 \times \text{DL}_t \times \dfrac{e^\star(\mathrm{Tmean}_t)}{\mathrm{Tmean}_t + 273.2}, & \mathrm{otherwise}, \end{cases} \tag{5.1}\]

where \(\text{DL}_t\) is the day length of day \(t\), expressed in hours and \(e^\star(\cdot)\) gives the saturation vapour pressure.

\(\text{Tmean}_t\) is the daily average temperature: \[ \mathrm{Tmean}_t = \frac{\mathrm{Tmin}_t + \mathrm{Tmax}_t}{2}. \tag{5.2}\]

The methodology to compute day length is given in Appendix D of Dingman (2015). It writes:

\[ \text{DL}_t = \frac{24}{\pi}\omega_s, \tag{5.3}\]

where \(\omega_s\) (in radians) is the sunrise hour angle. At sunrise and sunset, the solar zenith angle \(\theta_z\) equals \(90^\circ + h_0\), where \(h_0\) is the apparent altitude of the Sun’s center at sunrise. The sunrise hour angle \(\omega_s\) satisfies: \[ \cos(\omega_s) = \dfrac{\sin(h_0) - \sin(\Lambda) \sin(\delta)}{\cos(\Lambda) \cos(\delta)}, \tag{5.4}\] with \(\Lambda\) the latitude (in radians), \(\delta\) the solar declination (in radians) and \(h_0 \approx -0.833^\circ\).

The solar declination \(\delta_t\) (in radians) writes (Campbell and Norman (1998)): \[ \delta_t = \sin^{-1}\left(0.39795 \times \cos\left(0.2163108 + 2 \tan^{-1}\left(0.9671396 \times \tan(0.0086 \times (\text{doy}_t - 186))\right) \right)\right) \tag{5.5}\]

The saturation vapour pressure \(e^\star(\cdot)\) (in kilopascals, kPa) is given, for a temperature \(T\) (in °C), by (Dingman (2015), Box 2.2 p.99): \[ e^\star(T) = 0.611 \exp\left(\frac{17.27\times T}{T + 237.3}\right) \tag{5.6}\]
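As a quick numeric check of Equation 5.6 (an illustrative computation only), the saturation vapour pressure at 20°C should be close to 2.34 kPa:

# Saturation vapour pressure at 20 °C (Equation 5.6); roughly 2.34 kPa
0.611 * exp(17.27 * 20 / (20 + 237.3))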

Let us compute the daily potential evapotranspiration in R. We define two functions:

  • decl_angle(), which computes the solar declination in radians for a given day-of-year (the code is from {TrenchR}),
  • daylength_hours(), which computes the day length (in hours) for a given day-of-year, at a given latitude.
The decl_angle() function.
#' @title Solar Declination in Radians (from {TrenchR})
#' 
#' @description The function calculates solar declination, which is the angular 
#'  distance of the sun north or south of the earth's equator, based on the day 
#'   of year (Campbell and Norman, 1998)
#' 
#' @param doy Day of year (1-366).
#' 
#' @returns The declination angle (in radians).
#'
#' @references
#' Campbell GS, Norman JM (1998). Introduction to environmental biophysics, 
#'  2nd ed. edition. Springer, New York. ISBN 0387949372.
decl_angle <- function (doy) {
  
  doy <- (doy - 1) %% 365 + 1
  
  rev_ang <- 0.2163108 + 2 * atan(0.9671396 * tan(0.0086 * (doy -186))) 
  asin(0.39795 * cos(rev_ang)) 
  
}
The daylength_hours() function.
#' Day length (in hours)
#' 
#' @param lat_deg Latitude (in degrees)
#' @param doy Day of year (1-366).
#' 
daylength_hours <- function(lat_deg, doy) {
  
  lambda <- lat_deg * pi / 180
  delta <- decl_angle(doy)
  h0 <- -0.833 * pi / 180 # apparent sunrise altitude
  
  # General sunrise equation with altitude:
  # cos(ws) = (sin(h0) - sin(lambda) sin(delta)) / (cos(lambda) cos(delta))
  num <- sin(h0) - sin(lambda) * sin(delta)
  den <- cos(lambda) * cos(delta)
  cos_ws <- num / den
  cos_ws <- pmin(pmax(cos_ws, -1), 1) # clamp to [-1,1]
  ws <- acos(cos_ws)
  
  (24/pi) * ws
}
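Before applying these functions to the data, a quick illustration (the latitude below is arbitrary and only used for this example):

# Day length at 45°S around the June and December solstices
# (days of year 172 and 355); the December value should be much longer
# in the Southern Hemisphere
daylength_hours(lat_deg = -45, doy = c(172, 355))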
country_weather_daily <- 
  country_weather_daily |> 
  mutate(
    temp_mean = (temp_min + temp_max) / 2,
    dl_hours  = daylength_hours(latitude, day_of_year)
  ) |> 
  arrange(cell_id, date) |> 
  # Daily PET (Hamon)
  mutate(
    esat = 0.611 * exp(17.27 * temp_mean / (temp_mean + 237.3)), # kPa
    PET_daily = 29.8 * dl_hours * (esat / (temp_mean + 273.2)), # mm/day
    PET_daily = if_else(temp_mean <= 0, 0, PET_daily) # Hamon convention
  )

5.2 Water Deficit (Water Balance)

The equation for soil water balance depends on whether water input (\(W_t\)) exceeds potential evapotranspiration (Lutz, Wagtendonk, and Franklin (2010)):

\[ \text{SW}_t = \begin{cases} \min \left\{S_\text{max}, (W_t - \text{PET}_t) + \text{SW}_{t-1}\right\},& \text{if } W_t \geq \text{PET}_{t},\\ \text{SW}_{t-1} - \Delta_{\text{soil},t}, & \text{otherwise}, \end{cases} \tag{5.7}\]

where \(\Delta_{\text{soil},t}\) is the fraction removed from storage: \[ \Delta_{\text{soil},t} = \text{SW}_{t-1} \times\left( 1 - \exp\left(-\dfrac{\text{PET}_t - W_t}{S_\text{max}}\right)\right), \tag{5.8}\]

\(S_{\text{max}}\) is the soil water-holding capacity in the top 200 cm of the soil profile. Ideally, this value should come from recorded soil data. We do not have such data here, so we use a value of 150 mm:

The current NIWA water balance model uses a fixed soil moisture capacity of 150 mm of water, based on a typical loam soil. https://niwa.co.nz/sites/default/files/NZDI_more_info.pdf (New Zealand Drought Index and Drought Monitor Framework).

The actual evapotranspiration, \(\text{AET}_t\) writes: \[ \text{AET}_t = \begin{cases} \text{PET}_t, & \text{if } W_t \geq \text{PET}_{t},\\ W_t + \Delta_{\text{soil},t}, & \text{otherwise}. \end{cases} \]

The deficit writes: \[ D_t = \text{PET}_t - \text{AET}_t. \]

Daily snowmelt is often estimated using a degree-day approach, which assumes that melt is proportional to the number of degrees by which air temperature exceeds a threshold (typically 0°C). This formulation applies only to the existing snowpack, ensuring that snow deposited on a given day cannot melt immediately.

Following the standard degree-day formulation (Dingman (2015), Eq. 5.71), the daily snowmelt is written as a linear function of air temperature: \[ \text{Melt}_t = \min\left\{\text{DDF} \times \max(\mathrm{Tmean}_t - T_0,\, 0),\, \text{Pack}_{t-1}\right\} \tag{5.9}\]

where \(\text{DDF}\) is the degree-day factor (mm day\(^{-1}\)°C\(^{-1}\)), typically in the range 2–5 mm day\(^{-1}\)°C\(^{-1}\) depending on snow properties and surface conditions. The parameter \(T_0\) is the threshold temperature for melting (usually set to 0°C). \(\mathrm{Tmean}_t\) is the daily mean air temperature (see Equation 5.2). \(\text{Pack}_{t-1}\) is the snow water equivalent remaining from the previous day.

The snowpack evolves according to the mass balance: \[ \text{Pack}_t = \text{Pack}_{t-1} + \text{Snow}_t - \text{Melt}_t. \]

The snow pack equation is recursive. We simply set the start value at 0 for the first date of the sequence of values within a cell.

\(\text{Snow}_t\) is the amount of snow, which is the amount of precipitation if the average daily temperature is lower than or equal to \(T_0\), and 0 otherwise: \[ \text{Snow}_t = \begin{cases} P_t, & \text{if } \mathrm{Tmean}_t \leq T_0, \\ 0, & \text{otherwise}. \end{cases} \tag{5.10}\]

Conversely, the amount of rain is defined as: \[ \text{Rain}_t = \begin{cases} P_t, & \text{if } \mathrm{Tmean}_t > T_0, \\ 0, & \text{otherwise}. \end{cases} \tag{5.11}\]

The total water input to the soil becomes: \[ W_t = \text{Rain}_t + \text{Melt}_t. \]

Melt Factor and Rain/Snow Partition

The monthly precipitation can be divided into a rain fraction \(\text{Rain}_t\) and a snow fraction \(\text{Snow}_t\). To do so, we define the melt factor \(F_t\): \[ F_t = \begin{cases} 0, & \text{if }\text{Tmean}_t \leq 0^\circ,\\ 0.167 \times \text{Tmean}_t, & \text{if } 0^\circ <\text{Tmean}_t \leq 6^\circ,\\ 1, & \text{if }\text{Tmean}_t > 6^\circ. \end{cases} \tag{5.12}\]

where \(\text{Tmean}_t\) is the daily average temperature: \[ \mathrm{Tmean}_t = \frac{\mathrm{Tmin}_t + \mathrm{Tmax}_t}{2}. \tag{5.13}\]

The rain and snow fractions can then be computed as follows: \[ \begin{align} \text{Rain}_t & = F_t \times P_t\\ \text{Snow}_t & = (1 - F_t) \times P_t \end{align} \tag{5.14}\]

Recursive Snowpack and Melt

The melt factor \(F_t\) is also used to define snowmelt \(\text{Melt}_t\): \[ \text{Melt}_t = F_t \times (\text{Snow}_t + \text{Pack}_{t-1}), \tag{5.15}\]

where snow pack for a given day is given by: \[ \text{Pack}_t = (1 - F_t)^2 \times P_t + (1 - F_t) \times \text{Pack}_{t-1} \tag{5.16}\]

The snow pack equation is recursive. We simply set the start value at 0 for the first date of the sequence of values within a cell.

The monthly water input to the soil is obtained as: \[ W_t = \text{Rain}_t + \text{Melt}_t \tag{5.17}\]

Let us first compute water input \(W_t\).

country_weather_daily <- 
  country_weather_daily |> 
  mutate(
    T0 = 0,# freezing temperature
    rain  = if_else(temp_mean > T0, precip, 0),
    snow  = if_else(temp_mean <= T0, precip, 0),
    # Degree-day melt from existing pack only
    DDF = 3, # mm/day/°C
    pot_melt = pmax(temp_mean - T0, 0) * DDF
  ) |> 
  arrange(cell_id, date) |> 
  group_by(cell_id) |>
  # Daily snowpack recursion and melt
  mutate(
    pack = {
      n <- n()
      out <- numeric(n); prev <- 0
      for (i in seq_len(n)) {
        melt_i <- min(pot_melt[i], prev)# melt from previous pack only
        out[i] <- prev + snow[i] - melt_i
        prev <- out[i]
      }
      out
    },
    melt = pmin(lag(pack, default = 0), pot_melt),
    water_input = rain + melt
  ) |> 
  ungroup() |>
  select(-DDF, -T0, -pot_melt)
Code to use if the data were monthly rather than daily
# This code is not evaluated here since we use daily data
country_weather_daily <- 
  country_weather_daily |> 
  mutate(
    # Monthly melt factor and rain/snow split
    melt_factor = case_when(
      temp_mean <= 0 ~ 0,
      temp_mean >= 6 ~ 1,
      TRUE           ~ 0.167 * temp_mean
    ),
    rain = melt_factor * precip,
    snow = (1 - melt_factor) * precip
  ) |> 
  arrange(cell_id, date) |> 
  group_by(cell_id) |>
  # Monthly snowpack recursion and melt
  mutate(
    a = 1 - melt_factor,
    b = (a^2) * precip, # term added each day
    c = a, # multiplier on previous pack
    pack = {
      init <- 0 # initial PACK for this cell_id
      # pack_t = b_t + c_t * pack_{t-1}
      out <- purrr::accumulate2(
        b, c,
        .init = init,
        # .f receives the accumulated value first, then the next elements
        # of b and c: pack_t = b_t + c_t * pack_{t-1}
        .f = function(prev, bi, ci) bi + ci * prev
      )
      as.numeric(tail(out, -1)) # drop the initial value
    },
    pack_lag = lag(pack, default = 0),
    melt = melt_factor * (snow + pack_lag),
    water_input = rain + melt
  ) |>
  ungroup() |>
  select(-a, -b, -c)

Then, we can compute soil water balance (\(\text{SW}_t\)), actual evapotranspiration (\(\text{AET}_t\)) and soil water deficit \(D_t\).

We define the update_soil_water_deficit() function that computes those values for a subset of observations corresponding to a cell, using the following input variables: \(W_t\), \(\text{PET}_t\), \(\text{SW}_{t-1}\) and \(S_{\text{max}}\).

The update_soil_water_deficit() function.
#' Compute Soil Water Deficit
#' 
#' @param W Water input to the system (in mm).
#' @param PET Potential evapotranspiration (in mm).
#' @param S_prev Soil water balance in previous period (in mm).
#' @param S_max Soil water-holding capacity in the top 200cm of the soil 
#'  profile (in mm).
#' 
#' @returns A list with the following elements:
#'  - `S_new`: soil water balance,
#'  - `AET`: evapotranspiration,
#'  - `surplus`: water surplus,
#'  - `deficit`: water deficit
#'  
update_soil_water_deficit <- function(W, PET, S_prev, S_max) {
  
  if (W >= PET) {
    # Water-abundant day: recharge first, overflow = surplus
    S_star <- (W - PET) + S_prev
    # New value for soil water balance
    S_new <- min(S_star, S_max)
    
    AET <- PET # Evapotranspiration
    surplus <- max(0, S_star - S_max)
    deficit <- 0
  } else {
    # Water-limited day: exponential draw from storage (Dingman/Lutz Eq. 13)
    D <- PET - W # unmet demand by inputs
    dSOIL <- S_prev * (1 - exp(-D / S_max))  # fraction removed from storage
    # New value for soil water balance
    S_new <- S_prev - dSOIL
    
    AET <- W + dSOIL # Evapotranspiration
    surplus <- 0
    deficit <- PET - AET
  }
  
  list(
    S_new = S_new, # Soil water balance
    AET = AET, # Evapotranspiration
    surplus = surplus, # Water surplus
    deficit = deficit # Water deficit
  )
}
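To illustrate how the function behaves, here is a small example with made-up values (in mm), using the 150 mm soil water-holding capacity mentioned above: a water-abundant day followed by a water-limited day.

# Illustrative values only
# Water-abundant day: AET equals PET, storage is recharged, no deficit
update_soil_water_deficit(W = 10, PET = 3, S_prev = 75, S_max = 150)
# Water-limited day: storage is drawn down and a deficit appears
update_soil_water_deficit(W = 1, PET = 5, S_prev = 75, S_max = 150)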

We use that function on subsets of the dataset where each subset corresponds to a cell.

# Note: this chunk takes about 3 minutes to run.
# It is not evaluated here during compilation.
if (!file.exists("NZL_temprary_water_deficit.rda")) {
  # Ideally, we would need to use a value at the cell level.
  Smax_default <- 150
  
  country_weather_daily <- 
    country_weather_daily |>
    arrange(cell_id, date) |>
    group_by(cell_id) |>
    # Prepare new columns
    mutate(
      AET = NA_real_, # Evapotranspiration
      soil_moisture = NA_real_, # Water storage
      soil_surplus = NA_real_, # Water surplus
      soil_deficit = NA_real_ # Water deficit
    ) |>
    # For each cell, compute soil water deficit recursively
    group_modify(\(tb, key){
      n <- nrow(tb)
      S  <- 0.5 * Smax_default
      for (i in seq_len(n)) {
        u <- update_soil_water_deficit(
          W = tb$water_input[i],
          PET = tb$PET_daily[i],
          S_prev = S,
          S_max = Smax_default
        )
        tb$AET[i] <- u$AET
        tb$soil_moisture[i] <- u$S_new
        tb$soil_surplus[i] <- u$surplus
        tb$soil_deficit[i] <- u$deficit
        S <- u$S_new
      }
      tb
    }) |>
    ungroup()
  
  save(
    country_weather_daily, 
    file = "NZL_temprary_water_deficit.rda"
  )
} else {
  load("NZL_temprary_water_deficit.rda")
}

country_weather_daily
# A tibble: 2,975,097 × 25
   cell_id longitude latitude country_code date        precip temp_min temp_max
     <int>     <dbl>    <dbl> <chr>        <date>       <dbl>    <dbl>    <dbl>
 1       9      166.    -45.8 NZL          1980-01-01 19.5        5.38     18.1
 2       9      166.    -45.8 NZL          1980-01-02  6.58      10.7      12.3
 3       9      166.    -45.8 NZL          1980-01-03  2.47       9.22     13.6
 4       9      166.    -45.8 NZL          1980-01-04  0          6.35     18.1
 5       9      166.    -45.8 NZL          1980-01-05  0          6.42     19.2
 6       9      166.    -45.8 NZL          1980-01-06  0          8.80     19.1
 7       9      166.    -45.8 NZL          1980-01-07  0.0987    12.0      18.2
 8       9      166.    -45.8 NZL          1980-01-08  0.105     11.0      14.6
 9       9      166.    -45.8 NZL          1980-01-09  0          5.38     16.0
10       9      166.    -45.8 NZL          1980-01-10  0          7.00     17.8
# ℹ 2,975,087 more rows
# ℹ 17 more variables: year <dbl>, month <dbl>, month_name <ord>,
#   day_of_year <dbl>, temp_mean <dbl>, dl_hours <dbl>, esat <dbl>,
#   PET_daily <dbl>, rain <dbl>, snow <dbl>, pack <dbl>, melt <dbl>,
#   water_input <dbl>, AET <dbl>, soil_moisture <dbl>, soil_surplus <dbl>,
#   soil_deficit <dbl>

5.3 Soil Moisture Deficit Index (SMDI)

The Soil Moisture Deficit Index (SMDI, see Narasimhan and Srinivasan (2005)) turns daily soil water storage into a weekly drought/wetness index that takes values in \([-4,4]\) and is comparable across locations and seasons. Negative values indicate dry conditions, whereas positive values indicate wet conditions.

Since we have daily observations, we need to compute weekly values for soil moisture. We assign each day to one of 52 fixed 7-day blocks starting on January 1: \[ \text{week} = \min\left(\left\lfloor\frac{\text{yday}-1}{7}\right\rfloor + 1,\ 52\right). \] Note that we do not use the week() function from {lubridate}, which can return a 53rd week.

Then, for each grid cell (i), year (y), and week (w), compute the weekly mean available soil water: \[ \mathrm{SW}_{i,y,w} = \frac{1}{n_{i,y,w}} \sum_{t \in (i,y,w)} \text{SW}_t, \tag{5.18}\]

where \(\text{SW}_t\) is the soil water balance (in mm), previously computed (see Equation 5.7).

We define a function, find_wday(), which assigns a fixed 7-day “week” index (1 to 52) to calendar dates.

#' Assign fixed 7-day "week" indices (1–52) to calendar dates
#'
#' @description
#' Divides each year into 52 fixed 7-day blocks, starting on January 1
#' (block 1 = days 1–7, block 2 = days 8–14, ...).
#' Any remaining day(s) beyond day 364 (e.g., Dec 31 in common years,
#' or Dec 30–31 in leap years) are assigned to week 52.
#'
#' @param x A `Date` vector.
#' @returns
#' An integer vector of the same length as `x`, giving week indices in `1:52`.
#' 
find_wday <- function(x) {
  pmin(((yday(x) - 1) %/% 7) + 1, 52)
}

Example:

find_wday(c(make_date(2020,12,31), make_date(2021,01,01)))
[1] 52  1
# First assign each day to one of the 52 weeks
country_weather_daily <- 
  country_weather_daily |>
  mutate(
    # week = lubridate::week(date) # includes 53...
    week = find_wday(date)
  )

# Then compute the average available soil water at the cell level
sw_weekly <- 
  country_weather_daily |> 
  group_by(cell_id, year, week) |>
  summarise(SW = mean(soil_moisture, na.rm = TRUE), .groups = "drop")

The long-term weekly statistics at the cell level then need to be computed:

  • \(\text{MSW}_{i,w}\): the median of \(\text{SW}_{i,y,w}\) over a long period (the entire sample, here),
  • \(\text{SW}_{\text{min},i,w}\): the minimum of \(\text{SW}_{i,y,w}\) over the same long period,
  • \(\text{SW}_{\text{max},i,w}\): the maximum of \(\text{SW}_{i,y,w}\) over the same long period.
# Long-term weekly statistics
sw_lt <- 
  sw_weekly |>
  group_by(cell_id, week) |>
  summarise(
    MSW = median(SW, na.rm = TRUE),
    SW_min = min(SW, na.rm = TRUE),
    SW_max = max(SW, na.rm = TRUE),
    .groups = "drop"
  )

The weekly soil-water anomalies are then computed, in percent. For each (cell, year, week), a piecewise, range-normalized anomaly is computed as follows: \[ \text{SD}_{i,y,w} = \begin{cases} 100 \dfrac{\text{SW}_{i,y,w} - \text{MSW}_{i,w}}{\text{MSW}_{i,w}-\text{SW}_{\min,i,w}}, & \text{if } \text{SW}_{i,y,w} \le \text{MSW}_{i,w},\\ 100 \dfrac{\text{SW}_{i,y,w} - \text{MSW}_{i,w}}{\text{SW}_{\max,i,w}-\mathrm{MSW}_{i,w}}, & \text{otherwise} \end{cases} \tag{5.19}\]

Weekly soil-water anomalies \(\text{SD}_{i,y,w}\) are negative when conditions are drier than the median for that week, positive when wetter, and naturally scaled by the local weekly climatological range.

# Soil water deficit (%)
sw_anom <- 
  sw_weekly |>
  left_join(sw_lt, by = c("cell_id", "week")) |>
  mutate(
    SD = if_else(
      SW <= MSW,
      100 * (SW - MSW) / pmax(MSW - SW_min, 1e-9),
      100 * (SW - MSW) / pmax(SW_max - MSW, 1e-9)
    )
  )

You may have noticed that, in the previous code, the denominator is ‘protected’ with a tiny value (\(10^{-9}\)) to avoid division by zero in flat climates.

The last step consists in computing the SMDI in a recursive manner: the SMDI carries persistence via a first-order recursion within each calendar year: \[ \begin{align*} \text{SMDI}_{i,y,1} & = \frac{\text{SD}_{i,y,1}}{50},\\ \text{SMDI}_{i,y,w} & = 0.5 \times \text{SMDI}_{i,y,w-1} + \frac{\text{SD}_{i,y,w}}{50}\quad (w\ge 2). \end{align*} \tag{5.20}\]

# SMDI computed recursively, by cell.
smdi_weekly <- 
  sw_anom |>
  arrange(cell_id, year, week) |>
  group_by(cell_id, year) |>
  group_modify(\(tb, key) {
    n <- nrow(tb)
    smdi <- numeric(n)
    for (i in seq_len(n)) {
      if (i == 1 || is.na(smdi[i-1])) {
        # Initial value
        smdi[i] <- tb$SD[i] / 50
      } else {
        smdi[i] <- 0.5 * smdi[i-1] + tb$SD[i] / 50
      }
    }
    tb$SMDI <- smdi
    tb
  }) |>
  ungroup()

Let us have a look at the values for a cell:

ggplot(
 data = smdi_weekly |> filter(cell_id == 19) |> mutate(x = year + week/52),
 mapping = aes(x = x, y = SMDI)
) +
  geom_line() +
  labs(x = NULL)
Figure 5.1: SMDI for a cell in New Zealand. Weekly values range from -4 to +4 indicating very dry to very wet conditions.

5.3.1 Monthly Aggregation

The SMDI values are computed on a weekly basis (52 fixed 7-day blocks per year). To align with standard reporting periods of macroeconomic data, we can aggregate these values to the monthly scale. Because some 7-day weeks span two months, we need to ensure that each month receives only the appropriate share of each week.

We proceed as follows:

  1. We build a monthly calendar of overlaps. For every combination of year and month present in the data, we compute how many days of each fixed 7-day week fall within that month. This gives us a set of weights (\(w^{(m)}_{w}\)) that indicate the fraction of the month covered by each week. The weights for a given month always sum to 1.

  2. We compute weighted monthly SMDI values. We join the weights from the previous step with the weekly SMDI observations and compute, for each grid cell, year, and month, a weighted mean of weekly SMDI values: \[ \mathrm{SMDI}_{m} = \frac{\sum_{w} \mathrm{SMDI}_{w} \times w^{(m)}_{w}}{\sum_{w} w^{(m)}_{w}}, \] where (\(w^{(m)}_{w}\)) is the proportion of the month accounted for by week (w).

We define a function, get_month_week_cal(), to get a monthly calendar of overlaps. Note that it relies on the find_wday() function previously defined.

#' Calendar of fixed 7-day "weeks" (1--52) overlapped with a given month
#'
#' @description
#' Builds the 52 fixed 7-day blocks for a given year
#' (block 1 starts on Jan 1, block k starts on Jan 1 + 7*(k-1) days),
#' computes each block's overlap (in days) with the specified month,
#' and returns per-block weights equal to overlap / days-in-month.
#' 
#' @param year Year (numeric).
#' @param month Month (numeric).
#' 
#' @returns
#' A tibble with one row per overlapping block and columns:
#' - `year`: the requested year,
#' - `month`: the requested month,
#' - `week`: fixed 7-day block index in 1...52,
#' - `weight_ndays`: overlap days over days in the month.
#' 
get_month_week_cal <- function(year, month) {
  
  m_start <- lubridate::make_date(year, month, 1)
  m_end <- (m_start %m+% months(1)) - lubridate::days(1)
  # days in the month
  dates <- seq(m_start, m_end, by = "day")
  weeks <- find_wday(dates)
  ndays_weeks <- tapply(weeks, weeks, length)
  
  dplyr::tibble(
    week = as.integer(names(ndays_weeks)),
    nb_days = as.integer(ndays_weeks),
    weight_ndays = nb_days / sum(nb_days)
  ) |> 
    dplyr::mutate(
      year = year,
      month = month,
      .before = 1L
    )
}

For example:

get_month_week_cal(2020, 12)
# A tibble: 5 × 5
   year month  week nb_days weight_ndays
  <dbl> <dbl> <int>   <int>        <dbl>
1  2020    12    48       1       0.0323
2  2020    12    49       7       0.226 
3  2020    12    50       7       0.226 
4  2020    12    51       7       0.226 
5  2020    12    52       9       0.290 

The years in the data:

years <- sort(unique(smdi_weekly$year))

We get the calendar of months for monthly aggregations by applying the get_month_week_cal() function to all the (year, month) covering the sample period:

months_cal <- tidyr::crossing(year = years, month = 1:12) |>
  purrr::pmap(\(year, month) get_month_week_cal(year, month)) |> 
  list_rbind()
months_cal
# A tibble: 2,778 × 5
    year month  week nb_days weight_ndays
   <dbl> <int> <int>   <int>        <dbl>
 1  1980     1     1       7       0.226 
 2  1980     1     2       7       0.226 
 3  1980     1     3       7       0.226 
 4  1980     1     4       7       0.226 
 5  1980     1     5       3       0.0968
 6  1980     2     5       4       0.138 
 7  1980     2     6       7       0.241 
 8  1980     2     7       7       0.241 
 9  1980     2     8       7       0.241 
10  1980     2     9       4       0.138 
# ℹ 2,768 more rows

We then proceed to the second step, where we compute the monthly aggregated values for SMDI:

smdi_monthly <- 
  smdi_weekly |> 
  left_join(
    months_cal, by = c("year", "week"),
    relationship = "many-to-many"
  ) |> 
  group_by(cell_id, year, month) |>
  summarise(
    # Any missing value in SMDI will lead to NAs
    SMDI = weighted.mean(SMDI, w = weight_ndays),
    .groups = "drop"
  )

Let us have a look at the values for a cell:

ggplot(
  data = smdi_monthly |> mutate(date = year + month / 12) |> 
    filter(cell_id == 19),
  mapping = aes(x = date, y = SMDI)
  ) +
  geom_line() +
  labs(x = NULL, y = "SMDI")
Figure 5.2: Monthly SMDI values for a cell in New Zealand. Monthly values range from -4 to +4, indicating very dry to very wet conditions.

5.3.2 Quarterly Aggregation

We may want to aggregate the SMDI values at the quarterly level (Q1–Q4). We follow the same logic as for monthly aggregation, adapting it to quarters:

  1. We first build a quarterly calendar of overlaps. For each year and quarter, we determine how many days of each 7-day week fall within the quarter. This yields a set of weights (\(w^{(q)}_w\)) expressing the fraction of the quarter represented by each week. The weights for a given quarter always sum to 1.

  2. We then compute weighted quarterly SMDI values. Using these weights, we aggregate the weekly SMDI values within each quarter by taking a weighted average, where each week’s contribution is proportional to the number of days it contributes to that quarter: \[ \mathrm{SMDI}_{q} = \frac{\sum_{w} \mathrm{SMDI}_{w} \times w^{(q)}_{w}}{\sum_{w} w^{(q)}_{w}}, \] where (\(w^{(q)}_{w}\)) denotes the proportion of the quarter accounted for by week (w).

The get_quarter_week_cal() function creates the quarterly calendar of overlaps. Again, note that it relies on the find_wday() function previously defined.

#' Calendar weights for fixed 7-day "weeks" (1–52) within a given quarter
#'
#' @description
#' For a given `year` and `quarter`, this function computes the number of days
#' from each fixed 7-day block (weeks 1–52, defined from January 1 in 7-day
#' increments) that fall within that quarter, along with their normalized
#' weights. Any remaining days beyond day 364 (e.g., Dec 31 in common years,
#' or Dec 30–31 in leap years) are assigned to week 52.
#' 
#' @param year Year (numeric).
#' @param quarter Quarter (numeric).
#' 
#' @returns
#' A tibble with one row per overlapping block and columns:
#' - `year`: the requested year,
#' - `quarter`: the requested quarter,
#' - `week`: fixed 7-day block index in 1...52,
#' - `weight_ndays`: overlap days over days in the quarter.
#' 
get_quarter_week_cal <- function(year, quarter) {
  q_start <- lubridate::make_date(year, (quarter - 1) * 3 + 1, 1)
  q_end <- (q_start %m+% months(3)) - lubridate::days(1)
  dates <- seq(q_start, q_end, by = "day")
  weeks <- find_wday(dates)
  ndays_quarter <- tapply(weeks, weeks, length)
  
  dplyr::tibble(
    week = as.integer(names(ndays_quarter)),
    nb_days = as.integer(ndays_quarter),
    weight_ndays = nb_days / sum(nb_days)
  ) |> 
    dplyr::mutate(
      year = year,
      quarter = quarter,
      .before = 1L
    )
}

The calendar of quarters for quarterly aggregations:

quarters_cal <- tidyr::crossing(year = years, quarter = 1:4) |>
  purrr::pmap(\(year, quarter) get_quarter_week_cal(year, quarter)) |> 
  list_rbind()
quarters_cal
# A tibble: 2,418 × 5
    year quarter  week nb_days weight_ndays
   <dbl>   <int> <int>   <int>        <dbl>
 1  1980       1     1       7       0.0769
 2  1980       1     2       7       0.0769
 3  1980       1     3       7       0.0769
 4  1980       1     4       7       0.0769
 5  1980       1     5       7       0.0769
 6  1980       1     6       7       0.0769
 7  1980       1     7       7       0.0769
 8  1980       1     8       7       0.0769
 9  1980       1     9       7       0.0769
10  1980       1    10       7       0.0769
# ℹ 2,408 more rows

We then proceed to the second step, where we compute the quarterly aggregated values for SMDI:

smdi_quarterly <- 
  smdi_weekly |> 
  left_join(
    quarters_cal, by = c("year", "week"),
    relationship = "many-to-many"
  ) |> 
  group_by(cell_id, year, quarter) |>
  summarise(
    # Any missing value in SMDI will lead to NAs
    SMDI = weighted.mean(SMDI, w = weight_ndays),
    .groups = "drop"
  )

Let us have a look at the values for a cell:

ggplot(
  data = smdi_quarterly |> mutate(date = year + quarter / 4) |> 
    filter(cell_id == 19),
  mapping = aes(x = date, y = SMDI)
  ) +
  geom_line() +
  labs(x = NULL, y = "SMDI")
Figure 5.3: Quarterly SMDI values for a cell in New Zealand. Quarterly values range from -4 to +4 indicating very dry to very wet conditions.

5.4 Standardized Precipitation-Evapotranspiration Index (SPEI)

We will compute the Standardized Precipitation-Evapotranspiration Index (SPEI), a versatile drought index that uses climatic data to assess the onset, duration, and severity of drought conditions compared to normal conditions (see Vicente-Serrano, Beguería, and López-Moreno (2010)). The SPEI relies on precipitation levels and potential evapotranspiration (estimated in Section 5.1).

The SPEI requires:

  • Monthly precipitation \(P_m\) and monthly potential evapotranspiration \(\text{PET}_m\) as inputs,
  • Water balance: \(D_m = P_m - \text{PET}_m\),
  • A scale \(k\), usually in \(\{1,2,3,6,12,24\}\), which defines the width (in months) of the rolling accumulations and thus controls the length of the memory (a short sketch of this accumulation step is given after Table 5.1),
  • A calibration window (we will use 1981–2010).
Table 5.1: SPEI values and corresponding climate conditions.
SPEI range                        Climate condition
SPEI \(\geq\) 2.0                 Extremely wet
1.5 \(\leq\) SPEI < 2.0           Severely wet
1.0 \(\leq\) SPEI < 1.5           Moderately wet
0.5 < SPEI < 1.0                  Mildly wet
−0.5 \(\leq\) SPEI \(\leq\) 0.5   Normal
−1.0 < SPEI < −0.5                Mildly dry
−1.5 < SPEI \(\leq\) −1.0         Moderately dry
−2.0 < SPEI \(\leq\) −1.5         Severely dry
SPEI \(\leq\) −2.0                Extremely dry
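To build intuition for the role of the scale \(k\): the SPEI at scale \(k\) standardizes the \(k\)-month accumulated water balance, i.e., the sum of \(D_m\) over the \(k\) months ending in month \(m\). Below is a minimal sketch of that accumulation step only, on an illustrative vector of monthly balances (the distribution fitting and standardization are left to the {SPEI} package used further down):

# Illustrative monthly balances (mm); in practice, the `balance` column of
# `weather_monthly` for a single cell, ordered by month
bal_example <- c(135, 94, 96, 28, 168, 139)
# 3-month accumulation: rolling sum over the 3 months ending in each month;
# the first two values are undefined (NA)
zoo::rollsumr(bal_example, k = 3, fill = NA)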

First, we need to compute monthly aggregates of the required weather variables, at the grid cell level.

weather_monthly <- 
  country_weather_daily |> 
  mutate(ym = floor_date(date, "month")) |> 
  group_by(cell_id, ym, year, month) |> 
  summarise(
    P = sum(precip, na.rm = TRUE), # in mm/month
    PET = sum(PET_daily, na.rm = TRUE), # in mm/month
    .groups = "drop"
  ) |> 
  arrange(cell_id, year, month) |> 
  mutate(balance = P - PET)
  
weather_monthly
# A tibble: 97,740 × 7
   cell_id ym          year month     P   PET balance
     <int> <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>
 1       9 1980-01-01  1980     1 213.   77.9   135. 
 2       9 1980-02-01  1980     2 160.   66.3    93.9
 3       9 1980-03-01  1980     3 152.   56.7    95.8
 4       9 1980-04-01  1980     4  69.0  41.5    27.6
 5       9 1980-05-01  1980     5 202.   35.0   168. 
 6       9 1980-06-01  1980     6 165.   25.4   139. 
 7       9 1980-07-01  1980     7 166.   26.5   139. 
 8       9 1980-08-01  1980     8 182.   33.3   149. 
 9       9 1980-09-01  1980     9 183.   41.1   142. 
10       9 1980-10-01  1980    10 129.   55.0    73.5
# ℹ 97,730 more rows
#' Compute Standardized Precipitation-Evapotranspiration Index (SPEI)
#' 
#' @description
#' Computes the Standardized Precipitation–Evapotranspiration Index (SPEI)
#' at multiple temporal scales from a monthly climatic water balance time series.
#'
#' @param df A data frame containing at least the following columns:
#' - `ym`: a `Date` or `yearmon`-like variable indicating the month,
#' - `year`: numeric year of each observation,
#' - `balance`: the monthly climatic water balance, typically  
#'   P - PET (precipitation minus potential evapotranspiration), in millimetres (mm).
#' @param scales A numeric vector giving the accumulation periods (in months) 
#' for which the SPEI is computed. Default to `c(1, 3, 6, 12)`.
#' @param ref_start Optional numeric vector of length 2 giving the start year 
#' and month (e.g., `c(1981, 1)`) of the reference calibration period.  
#' If `NULL`, the full series is used.
#' @param ref_end Optional numeric vector of length 2 giving the end year 
#' and month (e.g., `c(2010, 12)`) of the reference calibration period.  
#' If `NULL`, the full series is used.
#'
#' @details
#' For each accumulation period `k` in `scales`, the function:
#' 1. Builds a monthly time series (`ts`) of the climatic water balance with 
#'    frequency 12.  
#' 2. Fits the [SPEI::spei()] model for the given scale `k`.  
#' 3. Extracts the standardized fitted values.  
#'
#' The output includes one column per computed scale, named `SPEI_k`,  
#' where `k` is the accumulation period (e.g., `SPEI_3` for 3-month SPEI).  
#'
#' @returns
#' A tibble with one row per input observation and the following columns:
#' - `ym`: corresponding month.  
#' - `SPEI_k`: standardized SPEI values for each scale `k`.  
#' 
compute_spei <- function(df, 
                         scales = c(1, 3, 6, 12),
                         ref_start = NULL, 
                         ref_end = NULL) {
  
  # Build a ts object for water balance with frequency = 12
  start_year  <- min(df$year, na.rm = TRUE)
  start_month <- month(min(df$ym, na.rm = TRUE))
  bal_ts <- ts(df$balance, start = c(start_year, start_month), frequency = 12)
  
  # Run SPEI for each scale
  spei_list <- lapply(scales, function(k) {
    if (is.null(ref_start) || is.null(ref_end)) {
      fit <- SPEI::spei(bal_ts, scale = k, verbose = FALSE)
    } else {
      fit <- SPEI::spei(
        bal_ts, scale = k, 
        ref.start = ref_start, ref.end = ref_end,
        verbose = FALSE
      )
    }
    # Extract the fitted standardized values as a numeric vector
    spei_values <- as.numeric(fit$fitted)
    tibble(
      !!paste("SPEI", k, sep = "_") := spei_values
    )
  })
  bind_cols(spei_list) |> 
    mutate(ym = df$ym, .before = 1L)
}

The reference period will be Jan. 1981 to Dec. 2010.

ref_start <- c(1981, 1)
ref_end <- c(2010, 12)

For example, the SPEI for cell 19 gives:

compute_spei(
  df = weather_monthly |> filter(cell_id == 19),
  ref_start = ref_start,
  ref_end = ref_end
)
# A tibble: 540 × 5
   ym          SPEI_1  SPEI_3  SPEI_6 SPEI_12
   <date>       <dbl>   <dbl>   <dbl>   <dbl>
 1 1980-01-01  0.457  NA      NA           NA
 2 1980-02-01 -0.0726 NA      NA           NA
 3 1980-03-01 -0.0889  0.0265 NA           NA
 4 1980-04-01 -1.74   -0.917  NA           NA
 5 1980-05-01  0.271  -0.820  NA           NA
 6 1980-06-01 -0.145  -0.836  -0.582       NA
 7 1980-07-01  0.351   0.142  -0.567       NA
 8 1980-08-01  0.791   0.333  -0.351       NA
 9 1980-09-01  0.310   0.579  -0.227       NA
10 1980-10-01 -0.810   0.117   0.0809      NA
# ℹ 530 more rows

Check whether there are infinite values in the result. If that is the case, consider a wider reference period.
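A minimal way to perform this check for the cell shown above, for instance:

# Count infinite SPEI values per scale for cell 19
compute_spei(
  df = weather_monthly |> filter(cell_id == 19),
  ref_start = ref_start,
  ref_end = ref_end
) |> 
  summarise(across(starts_with("SPEI_"), ~ sum(is.infinite(.x))))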

The compute_spei() function is then applied to each cell.

spei_by_cell <- 
  weather_monthly |> 
  group_by(cell_id) |> 
  group_modify(
    ~ compute_spei(
      .x, scales = c(1, 3, 6, 12),
      ref_start = ref_start,
      ref_end = ref_end
    )
  ) |> 
  ungroup()

To finish, we recreate the year and month columns and remove the ym column:

spei_by_cell <- 
  spei_by_cell |> 
  mutate(
    year = year(ym),
    month = month(ym),
    .before = "ym"
  ) |> 
  select(-ym)

There are a few infinite values. If we widen the reference period, the number of infinite values decreases.

spei_by_cell |> 
  filter(is.infinite(SPEI_1))
# A tibble: 14 × 7
   cell_id  year month SPEI_1  SPEI_3  SPEI_6 SPEI_12
     <int> <dbl> <dbl>  <dbl>   <dbl>   <dbl>   <dbl>
 1      27  2018     7   -Inf -3.26   -2.66   -1.26  
 2      90  2011     8   -Inf -2.39   -0.894  -0.536 
 3      91  2011     8   -Inf -2.86   -0.534   0.104 
 4      97  2018    11    Inf  1.25   -0.0804  0.198 
 5      99  2011     8   -Inf -2.30   -0.613  -0.276 
 6     100  2011     8   -Inf -3.19   -0.484   0.0936
 7     101  2011     8   -Inf -2.14   -0.227  -0.0732
 8     107  1993     8   -Inf -0.490   0.326  -0.358 
 9     107  2011     8   -Inf -3.28   -0.501  -0.245 
10     108  2011     8   -Inf -2.00   -0.290  -0.393 
11     116  2011     8   -Inf -1.86   -0.371  -0.613 
12     191  2021     4   -Inf -1.14   -1.55   -1.61  
13     201  2016     5    Inf  1.25   -0.0123  0.583 
14     237  1984    10   -Inf  0.0822  0.0930  0.275 
spei_by_cell |> 
  filter(is.infinite(SPEI_3))
# A tibble: 30 × 7
   cell_id  year month SPEI_1 SPEI_3 SPEI_6 SPEI_12
     <int> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1      32  2022     7 -1.68    -Inf -3.87   -1.37 
 2      54  2024     7 -1.04    -Inf -0.785   0.145
 3      64  2024     7 -1.16    -Inf -1.13   -0.255
 4      73  2024     7 -1.23    -Inf -1.73   -1.30 
 5      74  2017    12 -1.71    -Inf -1.49   -0.807
 6      74  2024     7 -1.02    -Inf -1.36   -0.764
 7     100  2011    10 -0.539   -Inf -1.25   -0.515
 8     101  2011    10 -0.826   -Inf -1.25   -0.666
 9     107  2011    10 -0.500   -Inf -1.46   -0.694
10     108  1997    10 -1.28    -Inf -3.00   -1.66 
# ℹ 20 more rows
spei_by_cell |> 
  filter(is.infinite(SPEI_6))
# A tibble: 34 × 7
   cell_id  year month SPEI_1 SPEI_3 SPEI_6 SPEI_12
     <int> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1      27  2018     8 -2.68   -6.03   -Inf  -2.12 
 2      27  2019     1 -3.29   -4.12   -Inf  -4.01 
 3      32  2022     6 -1.36   -3.82   -Inf  -0.802
 4      32  2022     8 -0.613  -2.23   -Inf  -1.73 
 5      32  2022     9 -1.81   -2.11   -Inf  -2.42 
 6      32  2022    10 -2.30   -2.34   -Inf  -2.94 
 7     164  2023     4  2.04    4.23    Inf Inf    
 8     164  2023     6  1.56  Inf       Inf Inf    
 9     165  2023     4  1.65    3.10    Inf   3.25 
10     180  2023     4  2.18    2.68    Inf Inf    
# ℹ 24 more rows
spei_by_cell |> 
  filter(is.infinite(SPEI_12))
# A tibble: 117 × 7
   cell_id  year month SPEI_1 SPEI_3 SPEI_6 SPEI_12
     <int> <dbl> <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
 1      32  2022    12 -2.49   -3.32  -3.30    -Inf
 2      32  2023     4 -0.848  -1.29  -3.03    -Inf
 3     164  2023     4  2.04    4.23 Inf        Inf
 4     164  2023     5  2.61    2.22   5.07     Inf
 5     164  2023     6  1.56  Inf    Inf        Inf
 6     164  2023     7 -0.441   2.40   3.75     Inf
 7     179  2023     1  3.02    3.35   5.10     Inf
 8     179  2023     2  2.16    2.68   3.71     Inf
 9     179  2023     3 -0.136   2.65   3.88     Inf
10     179  2023     4  1.21    1.85   4.81     Inf
# ℹ 107 more rows

Here, we replace the few infinite values by NAs:

spei_by_cell <- 
  spei_by_cell |> 
  mutate(
    across(starts_with("SPEI_"), ~if_else(is.infinite(.x), NA_real_, .x))
  )

Let us have a look at the values for a cell:

Codes to create the Figure.
spei_bands <- tibble(
  ymin = c(-Inf, -2.0, -1.5, -1.0, -0.5,  0.5,  1.0,  1.5,  2.0),
  ymax = c(-2.0, -1.5, -1.0, -0.5,  0.5,  1.0,  1.5,  2.0,  Inf),
  label = c(
    "Extremely dry", "Severely dry", "Moderately dry", "Mildly dry",
    "Normal",
    "Mildly wet", "Moderately wet", "Severely wet", "Extremely wet"
  ),
  fill = c(
    "#67001f", "#b2182b", "#d6604d", "#f4a582", # dry shades
    "#f7f7f7",
    "#92c5de", "#4393c3", "#2166ac", "#053061" # wet shades
  )
) |> 
  mutate(label = factor(label, levels = label))

ggplot(
  data = spei_by_cell |> 
    filter(cell_id == 19) |> 
    mutate(x = year + month/12) |> 
    pivot_longer(
      cols = starts_with("SPEI_"), values_to = "SPEI", names_to = "scale"
    ) |> 
    mutate(
      scale = str_remove(scale, "SPEI_"),
      scale = factor(
        scale, levels = c(1, 3, 6, 12),
        labels = paste0("k=", c(1, 3, 6, 12))
      )),
  mapping = aes(x = x, y = SPEI)
) +
  geom_rect(
    data = spei_bands,
    aes(
      xmin = -Inf, xmax = Inf,
      ymin = ymin, ymax = ymax,
      fill = label
    ),
    inherit.aes = FALSE,
    alpha = 0.25
  ) +
  geom_line() +
  scale_fill_manual(
    name = "Climate condition",
    values = setNames(spei_bands$fill, spei_bands$label)
  ) +
  labs(x = NULL) +
  facet_wrap(~ scale) +
  theme_minimal()
Figure 5.4: SPEI for a cell in New Zealand, depending on the accumulation periods (\(k\)). Shaded zones represent climatic conditions by SPEI range

For quarterly aggregation, we simply compute the mean of the SPEI values within each quarter, in each cell, for each year.

spei_quarterly_mean <-
  spei_by_cell |>
  mutate(
    month_date = make_date(year, month, 1),
    q_start = floor_date(month_date, "quarter"),
    quarter = quarter(q_start)
  ) |>
  group_by(cell_id, year, quarter) |>
  summarise(
    across(starts_with("SPEI_"), ~ mean(.x, na.rm = TRUE)
    ),
    .groups = "drop"
  )
Codes to create the Figure.
ggplot(
  data = spei_quarterly_mean |> 
    filter(cell_id == 19) |> 
    mutate(x = year + quarter/4) |> 
    pivot_longer(
      cols = starts_with("SPEI_"), values_to = "SPEI", names_to = "scale"
    ) |> 
    mutate(
      scale = str_remove(scale, "SPEI_"),
      scale = factor(
        scale, levels = c(1, 3, 6, 12),
        labels = paste0("k=", c(1, 3, 6, 12))
      )),
  mapping = aes(x = x, y = SPEI)
) +
  geom_rect(
    data = spei_bands,
    aes(
      xmin = -Inf, xmax = Inf,
      ymin = ymin, ymax = ymax,
      fill = label
    ),
    inherit.aes = FALSE,
    alpha = 0.25
  ) +
  geom_line() +
  scale_fill_manual(
    name = "Climate condition",
    values = setNames(spei_bands$fill, spei_bands$label)
  ) +
  labs(x = NULL) +
  facet_wrap(~ scale) +
  theme_minimal()
Figure 5.5: Quarterly mean of SPEI for a cell in New Zealand, depending on the accumulation periods (\(k\)). Shaded zones represent climatic conditions by SPEI range

5.5 Temporal Aggregation

We now need to gather the different datasets together. As for the SMDI, we will consider a monthly and a quarterly aggregation.

5.5.1 Monthly Data

First, we compute the total amount of precipitation, the average minimum, maximum, and mean temperatures, at the cell-level, on a monthly basis.

country_weather_monthly <- 
  country_weather_daily |> 
  group_by(
    cell_id, longitude, latitude, country_code, 
    year, month, month_name
  ) |> 
  summarise(
    precip = sum(precip, na.rm = TRUE), # mm/month
    temp_min = mean(temp_min, na.rm = TRUE), # deg C
    temp_max = mean(temp_max, na.rm = TRUE), # deg C
    temp_mean = mean(temp_mean,   na.rm = TRUE), # deg C
    .groups = "drop"
  )

We add the SMDI and the SPEI values to that dataset:

country_weather_monthly <- 
  country_weather_monthly |> 
  left_join(
    smdi_monthly,
    by = c("cell_id", "year", "month")
  ) |> 
  left_join(
    spei_by_cell,
    by = c("cell_id", "year", "month")
  )

5.5.2 Quarterly Data

We compute the total amount of precipitation, the average minimum, maximum, and mean temperatures, at the cell-level, on a quarterly basis.

country_weather_quarterly <- 
  country_weather_daily |> 
  mutate(
    q_start = floor_date(date, "quarter"),
    quarter = quarter(q_start),
    .after = year
    ) |> 
  select(-q_start) |> 
  group_by(
    cell_id, longitude, latitude, country_code, 
    year, quarter
  ) |> 
  summarise(
    precip = sum(precip, na.rm = TRUE), # mm/quarter
    temp_min = mean(temp_min, na.rm = TRUE), # deg C
    temp_max = mean(temp_max, na.rm = TRUE), # deg C
    temp_mean = mean(temp_mean,   na.rm = TRUE), # deg C
    .groups = "drop"
  )

We add the SMDI and the SPEI values to that dataset:

country_weather_quarterly <- 
  country_weather_quarterly |> 
  left_join(
    smdi_quarterly,
    by = c("cell_id", "year", "quarter")
  ) |> 
  left_join(
    spei_quarterly_mean,
    by = c("cell_id", "year", "quarter")
  )

5.6 Spatial Aggregation

We now have monthly and quarterly weather observations computed at the grid cell level. Our next step is to aggregate these data spatially to match the level of available macroeconomic indicators, which are typically reported at the regional or national scale.

5.6.1 Regional Level

For the regional aggregation, we compute a weighted average of the cell-level values, using as weights the proportion of each cell’s area lying within the corresponding region.

We need the maps of the country of interest. We use st_shift_longitude() here because New Zealand’s boundaries straddle the 180° meridian.

maps_level_0_NZ <- st_shift_longitude(maps_level_0$NZL)
maps_level_1_NZ <- st_shift_longitude(maps_level_1$NZL)

The coordinates of the cells were obtained in Chapter 4.

load("../data/Weather/CPC/NZL_cells.RData")
cells_sf <- st_as_sf(cells)

We compute the intersection of each cell with each region:

inter <- st_intersection(
  maps_level_1_NZ |> select(GID_1, region = NAME_1), 
  cells_sf |> select(cell_id, longitude, latitude)
)
Warning: attribute variables are assumed to be spatially constant throughout
all geometries

The proportion of each cell’s area lying within a region is used as weight for the regional aggregation. Let us compute the weights.

inter_tbl <- inter |>
  mutate(
    area_km2 = as.numeric(units::set_units(st_area(geometry), km^2))
  ) |>
  st_drop_geometry() |>
  arrange(region, cell_id) |> 
  as_tibble() |> 
  group_by(region) |> 
  mutate(w_cell_region = area_km2 / sum(area_km2)) |> 
  ungroup()
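As a quick sanity check, the weights should sum to one within each region:

# Cell weights sum to 1 within each region
inter_tbl |> 
  group_by(region) |> 
  summarise(total_weight = sum(w_cell_region))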
Note

Note that a cell can lie in multiple regions. In that case, its weight generally differs from one region to another.

inter_tbl |> count(cell_id) |> arrange(desc(n))
# A tibble: 181 × 2
   cell_id     n
     <int> <int>
 1      43     3
 2      72     3
 3      73     3
 4     116     3
 5     127     3
 6     142     3
 7     190     3
 8     227     3
 9     235     3
10     241     3
# ℹ 171 more rows
# Example of a cell in multiple regions:
inter_tbl |> filter(
  cell_id == inter_tbl |> count(cell_id) |> 
    arrange(desc(n)) |> pull("cell_id") |> pluck(1)
)
# A tibble: 3 × 7
  GID_1    region     cell_id longitude latitude area_km2 w_cell_region
  <chr>    <chr>        <int>     <dbl>    <dbl>    <dbl>         <dbl>
1 NZL.12_1 Otago           43      168.    -44.2     11.9      0.000410
2 NZL.14_1 Southland       43      168.    -44.2    653.       0.0207  
3 NZL.19_1 West Coast      43      168.    -44.2    879.       0.0377  

We will compute the weighted mean of the following weather variables (this needs to be updated if more weather metrics are computed):

# The names of the SPEI variables
spei_names <- colnames(country_weather_monthly)[str_detect(colnames(country_weather_monthly), "^SPEI_")]

# All the weather variables previously defined:
weather_vars <- c(
  "precip", 
  "temp_min", "temp_max", "temp_mean",
  "SMDI", 
  spei_names
)

5.6.1.1 Monthly Data

We merge the regional weights with the monthly weather data available at the cell level. During this join, cells intersecting multiple regions appear multiple times in the resulting dataset (once per overlapping region) with their contribution differentiated by the corresponding intersection weight.

region_weather_monthly <- 
  country_weather_monthly |> 
  left_join(
    inter_tbl |> select(GID_1, region, cell_id, w_cell_region),
    by = c("cell_id"),
    relationship ="many-to-many" # a cell can be in multiple regions
  ) |> 
  group_by(GID_1, region, country_code, year, month, month_name) |> 
  summarise(
    across(
      all_of(weather_vars),
      ~ weighted.mean(.x, w = w_cell_region)
    ),
    .groups = "drop"
  )

5.6.1.2 Quarterly Data

We perform the same operation with the quarterly data.

region_weather_quarterly <- 
  country_weather_quarterly |> 
  left_join(
    inter_tbl |> select(GID_1, region, cell_id, w_cell_region),
    by = c("cell_id"),
    relationship ="many-to-many" # a cell can be in multiple regions
  ) |> 
  group_by(GID_1, region, country_code, year, quarter) |> 
  summarise(
    across(
      all_of(weather_vars),
      ~ weighted.mean(.x, w = w_cell_region)
    ),
    .groups = "drop"
  )

5.6.1.3 Maps for Monthly Data

We visualize the aggregated weather by plotting choropleth maps for year 2013. For each month (12 panels), we produce:

  1. Maps at the regional level using area-weighted regional means,
  2. Maps at the cell level using cell means.

To be able to compare the values across months, we construct one map layer per month and then bind them into a single sf object for faceting.

# For region layers
data_map <- vector(mode = "list", length = 12)
for (i in 1:12) {
  data_map[[i]] <- 
    maps_level_1_NZ |> 
    inner_join(
      region_weather_monthly |> filter(year == 2013, month == !!i),
      by = c("GID_1", "NAME_1" = "region")
    )
}
data_map <- list_rbind(data_map) |> st_as_sf()

# For cell layers
data_map_cells <- vector(mode = "list", length = 12)
for (i in 1:12) {
  data_map_cells[[i]] <- 
    cells_sf |> 
    inner_join(
      country_weather_monthly |> filter(year == 2013, month == !!i),
      by = c("cell_id", "longitude", "latitude")
    )
}
data_map_cells <- list_rbind(data_map_cells) |> st_as_sf()

To ensure visual comparability across maps, we use a common color scale for all temperature variables. The code below computes the global minimum and maximum values across both minimum and maximum temperatures and defines a shared color palette that will be applied to every temperature map.

vmin <- min(data_map$temp_min, data_map$temp_max, na.rm = TRUE)
vmax <- max(data_map$temp_min, data_map$temp_max, na.rm = TRUE)
cols <- viridis::magma(6) |> rev()
cols <- cols[-length(cols)]

For the Soil Moisture Deficit Index, we define a diverging palette, with warm colors indicating dry conditions and cool colors indicating wet conditions. Neutral shades around white correspond to near-normal moisture.

cols_smdi <- c(
  "#67001f", "#b2182b", "#d6604d", "#f4a582", # dry shades
  "#f7f7f7",
  "#92c5de", "#4393c3", "#2166ac", "#053061" # wet shades
)

For the maps displaying cell-level data, we ensure spatial alignment with the regional maps by applying the same bounding box.

bb <- sf::st_bbox(data_map)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = precip)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradient2(
    "Precip. (mm)", low = "red", high = "#005A8B", midpoint = 0, mid = "white"
  ) +
  theme(legend.position = "bottom")
Figure 5.6: Monthly precipitation in 2013 (regional averages)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = precip)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradient2(
    "Precip. (mm)", low = "red", high = "#005A8B", midpoint = 0, mid = "white"
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.7: Monthly precipitation in 2013 (cell-level values)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = temp_mean)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Mean (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  theme(legend.position = "bottom")
Figure 5.8: Monthly mean temperature in 2013 (regional averages)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = temp_mean)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Mean (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.9: Monthly mean temperature in 2013 (cell-level values)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = temp_min)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Min (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  theme(legend.position = "bottom")
Figure 5.10: Monthly minimum temperature in 2013 (regional averages)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = temp_min)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Min (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.11: Monthly minimum temperature in 2013 (cell-level values)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = temp_max)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Max (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  theme(legend.position = "bottom")
Figure 5.12: Monthly maximum temperature in 2013 (regional averages)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = temp_max)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "Temperature Max (°C)",
    colours = cols,
    limits = c(vmin, vmax),
    oob = scales::squish
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.13: Monthly maximum temperature in 2013 (cell-level values)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = SMDI)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "SMDI",
    colours = cols_smdi,
    limits = c(-4, 4),
    oob = scales::squish
  ) +
  theme(legend.position = "bottom")
Figure 5.14: Monthly SMDI in 2013 (regional averages)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = SMDI)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "SMDI",
    colours = cols_smdi,
    limits = c(-4, 4),
    oob = scales::squish
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.15: Monthly SMDI in 2013 (grid cells)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map,
    mapping = aes(fill = SPEI_3)
  ) +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "SPEI-3",
    colours = cols_smdi,
    limits = c(-4, 4),
    oob = scales::squish
  ) +
  theme(legend.position = "bottom")
Figure 5.16: Monthly SPEI-3 in 2013 (regions)
Codes to create the Figure
ggplot() +
  geom_sf(
    data = data_map_cells,
    mapping = aes(fill = SPEI_3)
  ) +
  geom_sf(data = maps_level_1_NZ, fill = "NA") +
  facet_wrap(~ month_name, ncol = 6) +
  theme_map_paper() +
  scale_fill_gradientn(
    name = "SPEI-3",
    colours = cols_smdi,
    limits = c(-4, 4),
    oob = scales::squish
  ) +
  coord_sf(
    xlim = c(bb["xmin"], bb["xmax"]),
    ylim = c(bb["ymin"], bb["ymax"]),
    expand = FALSE
  ) +
  theme(legend.position = "bottom")
Figure 5.17: Monthly SPEI-3 in 2013 (grid cells)

5.6.1.4 Maps for Quarterly Data

We can also do the same with the quarterly aggregated values.

# For region layers
data_map_q <- vector(mode = "list", length = 4)
for (i in 1:4) {
  data_map_q[[i]] <- 
    maps_level_1_NZ |> 
    inner_join(
      region_weather_quarterly |> filter(year == 2013, quarter == !!i),
      by = c("GID_1", "NAME_1" = "region")
    )
}
data_map_q <- list_rbind(data_map_q) |> st_as_sf()

# For cell layers
data_map_q_cells <- vector(mode = "list", length = 4)
for (i in 1:4) {
  data_map_q_cells[[i]] <- 
    cells_sf |> 
    inner_join(
      country_weather_quarterly |> filter(year == 2013, quarter == !!i),
      by = c("cell_id", "longitude", "latitude")
    )
}
data_map_q_cells <- list_rbind(data_map_q_cells) |> st_as_sf()

This time, we define a small function, plot_maps_q(), to create the plots. This is not strictly necessary; we simply want to show an alternative way of producing the maps.

The plot_maps_q() function.
#' Quarterly maps of a given weather variable, at the grid-cell or regional level
plot_maps_q <- function(var_temp_name, 
                        legend_title,
                        geo = c("cell", "region"),
                        legend_type = c("precip", "temp", "drought")) {
  geo <- match.arg(geo)
  legend_type <- match.arg(legend_type)
  df_plot <- if (geo == "cell") data_map_q_cells else data_map_q
  
  p <- ggplot() +
    geom_sf(
      data = df_plot,
      mapping = aes(fill = !!sym(var_temp_name))
    ) +
    facet_wrap(~ quarter, ncol = 4) +
    theme_map_paper() +
    theme(legend.position = "bottom", legend.key.height = unit(1, "line"))
  
  p <- switch(
    legend_type,
    "temp" = p + scale_fill_gradientn(
      name   = legend_title,
      colours = cols,
      limits  = c(vmin, vmax),
      oob     = scales::squish
    ),
    "precip" = p + scale_fill_gradient2(
      name = legend_title,
      low = "red", high = "#005A8B", midpoint = 0, mid = "white"
    ),
    "drought" = p + scale_fill_gradientn(
      name   = legend_title,
      colours = cols_smdi,
      limits  = c(-4, 4),
      oob     = scales::squish
    )
  )
  
  if (geo == "cell") {
    p <- p +
      geom_sf(data = maps_level_1_NZ, fill = "NA") +
      coord_sf(
        xlim = c(bb["xmin"], bb["xmax"]),
        ylim = c(bb["ymin"], bb["ymax"]),
        expand = FALSE
      )
  }
  
  p
}

This function allows us to loop quickly over the different variables and plot the corresponding maps. We do not do so in this notebook.

# This chunk is not evaluated

configs_maps <- tribble(
  ~variable, ~title, ~type,
  "precip", "Precip. (mm)", "precip",
  "temp_mean", "Temperature Mean (°C)", "temp",
  "temp_min", "Temperature Min (°C)", "temp",
  "temp_max", "Temperature Max (°C)", "temp",
  "SMDI_month", "SMDI", "drought",
  "SPEI_3", "SPEI-3", "drought",
) |> 
  expand_grid(geo = c("region", "cell"))


for (i in 1:nrow(configs_maps)) {
  p <- plot_maps_q(
    var_temp_name = configs_maps$variable[i], 
    legend_title = configs_maps$title[i], 
    geo = configs_maps$geo[i], 
    legend_type = configs_maps$type[i]
  )
  # Inside a for loop, the plot must be printed explicitly to be displayed
  print(p)
}
Codes to create the Figure
plot_maps_q(
    var_temp_name = "precip",
    legend_title = "Precip. (mm)",
    geo = "region", 
    legend_type = "precip"
)
Figure 5.18: Quarterly averages of precipitation in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "precip",
    legend_title = "Precip. (mm)",
    geo = "cell", 
    legend_type = "precip"
)
Figure 5.19: Quarterly averages of precipitation in 2013 (grid cells)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_mean",
    legend_title = "Temp Mean (°C)",
    geo = "region", 
    legend_type = "temp"
)
Figure 5.20: Quarterly averages of mean temperature in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_mean",
    legend_title = "Temp Mean (°C)",
    geo = "cell", 
    legend_type = "temp"
)
Figure 5.21: Quarterly averages of mean temperature in 2013 (grid cells)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_min",
    legend_title = "Temp Min (°C)",
    geo = "region", 
    legend_type = "temp"
)
Figure 5.22: Quarterly averages of minimum temperature in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_min",
    legend_title = "Temp Min (°C)",
    geo = "cell", 
    legend_type = "temp"
)
Figure 5.23: Quarterly averages of minimum temperature in 2013 (grid cells)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_max",
    legend_title = "Temp Max (°C)",
    geo = "region", 
    legend_type = "temp"
)
Figure 5.24: Quarterly averages of maximum temperature in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "temp_max",
    legend_title = "Temp Max (°C)",
    geo = "cell", 
    legend_type = "temp"
)
Figure 5.25: Quarterly averages of maximum temperature in 2013 (grid cells)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "SMDI",
    legend_title = "SMDI",
    geo = "region", 
    legend_type = "drought"
)
Figure 5.26: Quarterly SMDI in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "SMDI",
    legend_title = "SMDI",
    geo = "cell", 
    legend_type = "drought"
)
Figure 5.27: Quarterly SMDI in 2013 (grid cells)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "SPEI_3",
    legend_title = "SPEI-3",
    geo = "region", 
    legend_type = "drought"
)
Figure 5.28: Quarterly SPEI-3 in 2013 (regions)
Codes to create the Figure
plot_maps_q(
    var_temp_name = "SPEI_3",
    legend_title = "SPEI-3",
    geo = "cell", 
    legend_type = "drought"
)
Figure 5.29: Quarterly SPEI-3 in 2013 (grid cells)

5.6.2 National Level

In our context, we assume that weather shocks primarily affect the agricultural sector, before propagating to the broader economy. Consequently, when aggregating weather variables at the national level, we apply weights that capture the relative agricultural intensity of each region. By doing so, areas where agriculture plays a larger economic role contribute more to the national weather indices.

Important

If one wishes to perform the aggregation in a different context, these weights should be adapted accordingly. Setting all weights to 1 would correspond to a simple unweighted average, implicitly assuming that each grid cell contributes equally to national exposure. Such an assumption may overlook important spatial heterogeneity in sectoral relevance.

5.6.2.1 Regional Agricultural Intensity

To aggregate grid-cell weather measures to the national level in a way that reflects sectoral exposure, we build regional agricultural weights from regional accounts.

Statistics New Zealand’s regional GDP by industry (series RNA001AA) is downloaded from Infoshare (CSV export; the Excel export uses locale-dependent thousand separators that break on our French setup). Our extract covers 2000–2023 by region. We target a working sample of 1987–2024, so we must extrapolate outside the extract window: 1987–1999 (backward) and 2024 (forward).

5.6.2.1.1 Extrapolation strategy

Let \(Y^{(a)}_{r,t}\) denote agricultural GDP for region \(r\) and year \(t\).

We compute the inverse growth rate: \[ \Delta_{r,t}^{-1} = \frac{Y^{(a)}_{r,t}}{Y^{(a)}_{r,t+1}}, \tag{5.21}\] defined wherever both years \(t\) and \(t+1\) are observed.

Over a reference window whose width equals the number of observed years minus one (computing \(\Delta_{r,t}^{-1}\) makes us lose one observation), i.e. covering the observed period, we estimate a linear trend in \(\Delta_{r,t}^{-1}\): \[ \Delta_{r,t}^{-1} = \alpha_r + \beta_r \cdot t + \varepsilon_{r,t}, \tag{5.22}\] and retain \(\hat\beta_r\).

For backward fills (years preceding the extract), we form a geometrically weighted average of the windowed inverse growth rates and blend it 50/50 with the trend-implied growth: \[ \widehat{\gamma}_{r,t}^{\text{back}} = \tfrac{1}{2}\Big(\sum_{j=1}^{W} w_j \Delta_{r,t+j}^{-1}\Big) + \tfrac{1}{2}\big(1 + \hat\beta_r\big), \tag{5.23}\] where \(w_j \propto q^{j-1}\), with common ratio \(q = 1/1.35\) (the variable reason in the code below) and \(\sum_j w_j=1\).

We then impute \[ Y^{(a)}_{r,t} = \widehat{\gamma}_{r,t}^{\text{back}} \times Y^{(a)}_{r,t+1}. \tag{5.24}\]

For forward fills (years after the extract), we apply the trend growth: \[ Y^{(a)}_{r,t} = Y^{(a)}_{r,t-1} \times \big(1+\hat\beta_r\big). \tag{5.25}\]

This approach borrows strength from the local history (via the weighted window) while guarding against short-run noise (via the trend).
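To fix ideas, here is a toy illustration of the backward and forward fills with entirely hypothetical numbers (three observed years, a window of width 2, an assumed trend coefficient); the actual computation below loops over all regions and years:

# Toy illustration with hypothetical numbers (not the actual NZ data)
obs <- c(`2000` = 100, `2001` = 104, `2002` = 109)  # observed Y_{r,t}
inv_growth <- obs[1:2] / obs[2:3]                   # Delta^{-1}_{r,t} for t = 2000, 2001
beta_hat <- -0.002                                  # assumed trend coefficient
q <- 1 / 1.35                                       # common ratio of the geometric weights
w <- q^(0:(length(inv_growth) - 1))
w <- w / sum(w)                                     # weights summing to one
# Blended backward growth rate (Equation 5.23), window of width 2
gamma_back <- 0.5 * sum(w * inv_growth) + 0.5 * (1 + beta_hat)
# Backward imputation (Equation 5.24): 1999 value obtained from the 2000 value
y_1999 <- gamma_back * obs[["2000"]]
# Forward imputation (Equation 5.25): trend growth applied to the last observed year
y_2003 <- obs[["2002"]] * (1 + beta_hat)
c(y_1999 = y_1999, y_2003 = y_2003)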

5.6.2.1.2 Weight construction

Let \(Y^{(a)}_{t}\) be the national agricultural GDP. For each region \(r\) and year \(t\), we define the agricultural intensity weight as follows: \[ w_{r,t} = \frac{Y^{(a)}_{r,t}}{Y^{(a)}_{t}}. \tag{5.26}\] By construction, \(\sum_{r} w_{r,t} = 1\) for each year \(t\) (up to rounding). These weights can then be used to aggregate weather variables from regions to the national level.

Important

Other methods to estimate the regional agricultural intensity can be used. We just present this one.

5.6.2.1.3 Implementation in R

Let us compute these weights in R now. We first load the dataset:

# Source: https://infoshare.stats.govt.nz
# Ref: RNA001AA
agri <- read_csv(
  "../data/Agriculture/RNA434201_20251025_084751_31.csv", 
  skip = 4, 
  col_names = colnames(
    read_csv("../data/Agriculture/RNA434201_20251025_084751_31.csv", 
             skip = 2, n_max = 1, col_types = cols())),
  n_max = 24
) |> 
  rename(year = `...1`)
New names:
Rows: 24 Columns: 19
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," dbl
(19): ...1, Northland, Auckland, Waikato, Bay of Plenty, Gisborne, Hawke...
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
• `` -> `...1`
  # filter(year <= 2012) # same as in the paper
agri
# A tibble: 24 × 19
    year Northland Auckland Waikato `Bay of Plenty` Gisborne `Hawke's Bay`
   <dbl>     <dbl>    <dbl>   <dbl>           <dbl>    <dbl>         <dbl>
 1  2000       327      182     998             412       99           357
 2  2001       459      216    1482             525      118           489
 3  2002       510      235    1681             597      140           530
 4  2003       334      170    1083             463      117           472
 5  2004       372      175    1185             521      123           498
 6  2005       356      165    1201             500      114           479
 7  2006       276      147    1014             447      123           407
 8  2007       336      165    1207             503      137           438
 9  2008       512      246    1951             718      134           410
10  2009       368      178    1360             535      138           479
# ℹ 14 more rows
# ℹ 12 more variables: Taranaki <dbl>, `Manawatu-Whanganui` <dbl>,
#   Wellington <dbl>, `West Coast` <dbl>, Canterbury <dbl>, Otago <dbl>,
#   Southland <dbl>, Marlborough <dbl>, `Tasman/Nelson` <dbl>,
#   `Total North Island` <dbl>, `Total South Island` <dbl>, `New Zealand` <dbl>

Each column in the table gives the agricultural GDP of a region. For convenience, we pivot the table so that the GDP values are in a single column (named gdp_a); another column, region, indicates which region the values refer to. Once the table is pivoted, we compute the inverse growth rate for each region.

agri <- agri |> 
  pivot_longer(cols = -year, names_to = "region", values_to = "gdp_a") |> 
  arrange(region, year) |> 
  group_by(region) |> 
  mutate(
    inv_growth_rate = gdp_a / lead(gdp_a)
  ) |> 
  ungroup()

We set the periods with data and those for which we want to extrapolate values.

years_to_complete_back <- 1999:1987
# years_to_complete_forw <- 2013:2025 # Same as in the paper
years_to_complete_forw <- 2024:2025

The width of the window on which the trend will be learned:

width <- length(
  seq(max(years_to_complete_back) + 1, min(years_to_complete_forw) - 2)
)
width
[1] 23

The years of the reference period for the linear model:

years_reference <- seq(years_to_complete_back[1] + 1, length.out = width)
years_reference
 [1] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014
[16] 2015 2016 2017 2018 2019 2020 2021 2022

And we extend the dataset with the desired missing years:

agri <- crossing(
  year = seq(min(years_to_complete_back), max(years_to_complete_forw)), 
  region = unique(agri$region)) |> 
  full_join(
    agri, by = c("year", "region")
  )
agri
# A tibble: 702 × 4
    year region             gdp_a inv_growth_rate
   <dbl> <chr>              <dbl>           <dbl>
 1  1987 Auckland              NA              NA
 2  1987 Bay of Plenty         NA              NA
 3  1987 Canterbury            NA              NA
 4  1987 Gisborne              NA              NA
 5  1987 Hawke's Bay           NA              NA
 6  1987 Manawatu-Whanganui    NA              NA
 7  1987 Marlborough           NA              NA
 8  1987 New Zealand           NA              NA
 9  1987 Northland             NA              NA
10  1987 Otago                 NA              NA
# ℹ 692 more rows

For each region \(r\), we regress the inverse growth rate \(\Delta_{r,t}^{-1}\) on a linear trend (see Equation 5.22) and return the coefficient associated with the trend, \(\hat\beta_r\):

# Regression of agricultural production on the ref. period on a linear trend
# Extraction of the coefficient
coefs <- agri |> 
  filter(year %in% years_reference) |> 
  group_by(region) |> 
  mutate(t = row_number()) |> 
  group_modify(~ {
    model <- lm(inv_growth_rate ~ t, data = .x)
    tibble(coef_t = coef(model)[["t"]])
  })

As explained in Equation 5.23, we set some weights \(w_j\) to compute a geometrically weighted average of the inverse growth rate.

# Geometric sequence for the weighting scheme
reason <- 1 / 1.35
pond <- reason^(0:(width - 1))
pond <- pond / sum(pond)

Then, we loop over each region \(r\) to impute the missing values of agricultural GDP \(Y^{(a)}_{r,t}\).

for (region in unique(agri$region)) {
  # Focus on the current region
  coef_region <- coefs |> filter(region == !!region) |> pull(coef_t)
  
  ## Predict backward ----
  for (i in 1:length(years_to_complete_back)) {
    year <- years_to_complete_back[i]
    ind_row <- which(agri$year == year & agri$region == region)
    ind_row_next <- which(agri$year == year+1 & agri$region == region)
    window <- seq(year + 1, year + width)
    agri_region_w <- agri |> filter(region == !!region, year %in% !!window)
    
    # Agricultural GDP in the next year
    gdp_a_next <- agri$gdp_a[ind_row_next]
    
    # Complete missing values
    agri$gdp_a[ind_row] <- 
      ((.5*(pond %*% agri_region_w$inv_growth_rate)) + (.5*(1+coef_region))) *
      gdp_a_next
    
    agri$inv_growth_rate[ind_row] <- 
      agri$gdp_a[ind_row] / agri$gdp_a[ind_row_next]
  }
  
  ## Predict forward ----
  for (i in 1:length(years_to_complete_forw)) {
    year <- years_to_complete_forw[i]
    ind_row <- which(agri$year == year & agri$region == region)
    ind_row_prev <- which(agri$year == year-1 & agri$region == region)
    # Previous agricultural GDP
    gdp_a_previous <- agri$gdp_a[ind_row_prev]
    # Prediction assuming the linear trend
    agri$gdp_a[ind_row] <- gdp_a_previous * (1+coef_region)
  }
}

We can compute the regional weights \(w_{r,t}\) (Equation 5.26).

# Expressing as a share of total national value
tb_agri_shares <- agri |> 
  filter(!region == "Nez Zealand") |> 
  group_by(year) |> 
  mutate(agri_share_region = gdp_a / sum(gdp_a))
tb_agri_shares |> select(year, region, gdp_a, agri_share_region)
# A tibble: 702 × 4
# Groups:   year [39]
    year region              gdp_a agri_share_region
   <dbl> <chr>               <dbl>             <dbl>
 1  1987 Auckland            180.            0.0133 
 2  1987 Bay of Plenty       350.            0.0259 
 3  1987 Canterbury          605.            0.0448 
 4  1987 Gisborne             84.6           0.00626
 5  1987 Hawke's Bay         291.            0.0215 
 6  1987 Manawatu-Whanganui  446.            0.0330 
 7  1987 Marlborough          73.5           0.00544
 8  1987 New Zealand        4470.            0.331  
 9  1987 Northland           303.            0.0224 
10  1987 Otago               289.            0.0214 
# ℹ 692 more rows

Let us save this table of weights:

save(tb_agri_shares, file = "../data/Agriculture/tb_agri_shares.rda")
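
As mentioned in the note above, other weighting schemes could be used. For instance, a minimal, purely illustrative variant uses time-invariant weights obtained by averaging the yearly shares of each region; we do not use it in the remainder of the chapter.

# Illustrative alternative: time-invariant weights (average of the yearly shares)
tb_agri_shares_constant <- tb_agri_shares |> 
  group_by(region) |> 
  summarise(agri_share_region = mean(agri_share_region), .groups = "drop")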

5.6.2.2 Matching Regional Agricultural Intensity and Weather Data

5.6.2.2.1 Quick Fix on Region Names

Region names often differ across sources. Here, we need to align agricultural intensity (from a national source) with weather data whose regions follow GADM names.

First, we list weather regions that do not appear in the agricultural data:

unique(region_weather_monthly$region)[! unique(region_weather_monthly$region) %in% unique(tb_agri_shares$region)] |> 
  cat(sep = "\n")
Tasman
Manawatu-Wanganui
Nelson

Agricultural table region names:

sort(unique(tb_agri_shares$region))
 [1] "Auckland"           "Bay of Plenty"      "Canterbury"        
 [4] "Gisborne"           "Hawke's Bay"        "Manawatu-Whanganui"
 [7] "Marlborough"        "New Zealand"        "Northland"         
[10] "Otago"              "Southland"          "Taranaki"          
[13] "Tasman/Nelson"      "Total North Island" "Total South Island"
[16] "Waikato"            "Wellington"         "West Coast"        

We notice two types of issues here:

  1. Spelling differences (easy to correct): Manawatu-Whanganui vs. Manawatu-Wanganui
  2. Different spatial aggregation: Tasman and Nelson are merged into a single region in the agricultural data (see Figure 5.30).

For the first issue, the fix is straightforward: we just need to adjust the spelling.

tb_agri_shares <- tb_agri_shares |> 
  mutate(
    region = if_else(
      region == "Manawatu-Whanganui",
      "Manawatu-Wanganui",
      region
    )
  )

For the second issue, let us first visualize the Tasman/Nelson regions.

Codes to create the Figure.
ggplot(
  data = maps_level_1_NZ |> 
    mutate(
      issue = case_when(
        NAME_1 == "Tasman" ~ "Tasman",
        NAME_1 == "Nelson" ~ "Nelson",
        TRUE ~ "Other"
      ),
      issue = factor(issue, levels = c("Tasman", "Nelson", "Other"))
    ),
  mapping = aes(fill = issue)
) + 
  geom_sf(colour = "white") +
  scale_fill_manual(
    "Regions",
    values = c("Tasman" = "#0072B2", "Nelson" = "#D55E00", "Other" = "gray")
  )
Figure 5.30: Regions merged in the agricultural data.

Tasman is much larger than Nelson, so we split the combined agricultural share proportionally to their land areas (not 50/50). We compute areas from the map.

# Areas of each region
nz_areas <- maps_level_1_NZ |>
  filter(NAME_1 %in% c("Tasman", "Nelson")) |>
  mutate(area_m2 = st_area(geometry)) |>
  st_drop_geometry() |>
  mutate(area_m2 = as.numeric(area_m2)) |>
  select(region = NAME_1, area_m2)
nz_areas
  region    area_m2
1 Nelson  419763011
2 Tasman 9631754767

The corresponding weights:

tas_nel_weights <- nz_areas |>
  mutate(w = area_m2 / sum(area_m2)) |>
  select(region, w)
tas_nel_weights
  region          w
1 Nelson 0.04176116
2 Tasman 0.95823884

Now we duplicate each "Tasman/Nelson" row into two rows ("Tasman" and "Nelson") and allocate gdp_a and agri_share_region using the area weights:

group_label <- "Tasman/Nelson"

tb_agri_shares <-
  # keep all rows that are not the Tasman + Nelson aggregate
  tb_agri_shares |> filter(region != !!group_label) |> 
  bind_rows(
    # duplicate the grouped rows for Tasman and Nelson and split the share
    tb_agri_shares |>
      filter(region == !!group_label) |>
      # create two copies per row, one for each region
      crossing(region_2 = c("Tasman", "Nelson")) |>
      select(-region) |> 
      rename(region = region_2) |> 
      # add area weights
      left_join(tas_nel_weights, by = "region") |>
      # apportion gdp_a and agri_share_region using the area weights
      mutate(
        gdp_a = gdp_a * w,
        agri_share_region = agri_share_region * w) |>
      select(-w)
  ) |>
  arrange(region, year)

As a sanity check, we verify whether any weather regions are still missing from the agricultural table:

unique(region_weather_monthly$region)[! unique(region_weather_monthly$region) %in% unique(tb_agri_shares$region)] |> 
  cat(sep = "\n")

And we check which agricultural regions are not present in the weather data (these are country- and island-level aggregates, which is fine):

unique(tb_agri_shares$region)[! unique(tb_agri_shares$region) %in% unique(region_weather_monthly$region)] |> 
  cat(sep = "\n")
New Zealand
Total North Island
Total South Island
5.6.2.2.2 Merge

To obtain national-level indicators, we aggregate regional weather observations using agricultural intensity as weights. We merge the regional weather data with the corresponding agricultural weights (from agri_share_region) for each year and compute the weighted mean of each weather variable.
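
Formally, denoting by \(X_{r,t,m}\) the value of a weather variable for region \(r\) in month \(m\) of year \(t\) (notation introduced here for convenience), the national value is the weighted mean \[ X^{\text{nat}}_{t,m} = \frac{\sum_{r} w_{r,t}\, X_{r,t,m}}{\sum_{r} w_{r,t}}, \] which is what weighted.mean() computes; rescaling all the weights \(w_{r,t}\) by a constant therefore leaves the result unchanged.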

For monthly data, we group by year and month:

national_weather_monthly <- 
  region_weather_monthly |> 
  # We restricted the agricultural data to 1987--2024
  filter(year >= 1987) |> 
  left_join(
    tb_agri_shares |> select(year, region, agri_share_region),
    by = c("region", "year")
  ) |> 
  group_by(year, month, month_name) |> 
  summarise(
    across(
      all_of(weather_vars),
      ~ weighted.mean(.x, w = agri_share_region)
    ),
    .groups = "drop"
  )
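
As noted in the box above, setting all weights to one amounts to a simple unweighted average across regions. A minimal sketch of that alternative, reusing the same objects (the result is not used afterwards):

# Unweighted national aggregation: every region contributes equally
national_weather_monthly_unweighted <- 
  region_weather_monthly |> 
  filter(year >= 1987) |> 
  group_by(year, month, month_name) |> 
  summarise(
    across(all_of(weather_vars), mean),
    .groups = "drop"
  )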

For quarterly data, the same logic applies, except we aggregate by year and quarter:

national_weather_quarterly <- 
  region_weather_quarterly |> 
  # We restricted the agricultural data to 1987--2024
  filter(year >= 1987) |> 
  left_join(
    tb_agri_shares |> select(year, region, agri_share_region),
    by = c("region", "year")
  ) |> 
  group_by(year, quarter) |> 
  summarise(
    across(
      all_of(weather_vars),
      ~ weighted.mean(.x, w = agri_share_region)
    ),
    .groups = "drop"
  )

We can have a look at the obtained values using different types of plots.

Codes to create the Figure.
ggplot(
  data = national_weather_monthly |> 
    mutate(date = make_date(year, month)),
  mapping = aes(x = date, y = precip)) +
  geom_line() +
  labs(x = NULL, y = "Precipitation (mm)") +
  theme_paper()
Figure 5.31: Precipitation in New Zealand on a monthly basis.

With the quarterly national aggregation:

Codes to create the Figure.
ggplot(
  data = national_weather_quarterly |> 
    mutate(date = year + quarter/4),
  mapping = aes(x = date, y = precip)) +
  geom_line() +
  labs(x = NULL, y = "Precipitation (mm)") +
  theme_paper()
Figure 5.32: Precipitation in New Zealand on a quarterly basis.

If one wants to highlight a specific year, e.g., 2013:

Codes to create the Figure.
ggplot(
  data = national_weather_monthly |> 
    mutate(
      month_abb = factor(month.abb[month], levels = month.abb),
      highlight_year = case_when(
        year == 2013 ~ "2013",
        TRUE ~ "Other"
      )
    ),
  mapping = aes(x = month_abb, y = precip)) +
  geom_line(
    mapping = aes(group = year, colour = highlight_year)
  ) +
  scale_colour_manual(
    NULL, values = c("2013" = "#D55E00", "Other" = "gray")
  ) +
  labs(x = NULL, y = "Precipitation (mm)") +
  theme_paper()
Figure 5.33: Precipitation in New Zealand on a monthly basis (1987–2024).

We can also use barplots and show the ±1 standard deviation range across years with error bars.

Codes to create the Figure.
ggplot(
  data = national_weather_monthly |> 
    mutate(
      month_abb = factor(month.abb[month], levels = month.abb)
    ) |> 
    group_by(month_abb) |> 
    summarise(
      precip_mean = mean(precip),
      precip_sd = sd(precip)
    )
) +
  geom_bar(
    mapping = aes(x = month_abb, y = precip_mean),
    stat="identity",
    fill="#5482AB", alpha=0.7
  ) +
  geom_errorbar(
    mapping = aes(
      x = month_abb,
      ymin = precip_mean - precip_sd,
      ymax = precip_mean + precip_sd
    ),
    width = 0.4, colour = "#005A8B", alpha = 0.9, linewidth = 1.3
  ) +
  labs(x = NULL, y = "Precipitation (mm)") +
  theme_paper()
Figure 5.34: Barplot of precipitation in New Zealand on a monthly basis (1987–2024).

Obviously, we are not restricted to precipitation plots. Let us visualize the evolution of the SMDI between 1987 and 2024, on a monthly basis. Note that the spatial aggregation has flattened the series (recall that, before spatial aggregation, the monthly values range from -4 (very dry) to +4 (very wet)).

Codes to create the Figure.
ggplot(
  data = national_weather_monthly |> 
    mutate(date = make_date(year, month)),
  mapping = aes(x = date, y = SMDI)) +
  geom_line() +
  scale_x_date(date_breaks = "3 years", date_labels = "%Y") +
  labs(x = NULL, y = "SMDI") +
  theme_paper()
Figure 5.35: Soil Moisture Deficit Index in New Zealand, monthly basis.

And with the quarterly data:

Codes to create the Figure.
ggplot(
  data = national_weather_quarterly |> 
    mutate(date = year + quarter/4),
  mapping = aes(x = date, y = SMDI)) +
  geom_line() +
  scale_x_continuous(n.breaks = 12) +
  labs(x = NULL, y = "SMDI") +
  theme_paper()
Figure 5.36: Soil Moisture Deficit Index in New Zealand, quarterly basis.

5.7 Saving the Datasets

To finish, let us export the two datasets. If those datasets are to be used in R later on, they can be saved as RData files to save disk space and loading time.

save(
  national_weather_monthly, 
  file = "../data/Weather/national_weather_monthly.rda"
)
save(
  national_weather_quarterly, 
  file = "../data/Weather/national_weather_quarterly.rda"
)

If one wants to share the datasets for use in other software, CSV files can also be written:

readr::write_csv(
  x = national_weather_monthly, 
  file = "../data/Weather/national_weather_monthly.csv"
)
readr::write_csv(
  x = national_weather_quarterly, 
  file = "../data/Weather/national_weather_quarterly.csv"
)
Table 5.2: Dictionary of variables for the monthly national weather data (national_weather_monthly)
Variable Type Description
year numeric Year
month numeric Month
month_name ordered factor Name of the month
precip numeric Average of monthly precipitation (mm)
temp_min numeric Average of monthly minimum temperature (°C)
temp_max numeric Average of monthly maximum temperature (°C)
temp_mean numeric Average of monthly mean temperature (°C)
SMDI numeric Soil Moisture Deficit Index (from -4, very dry, to +4, very wet)
SPEI_1 numeric 1-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_3 numeric 3-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_6 numeric 6-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_12 numeric 12-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
Table 5.3: Dictionary of variables for the quarterly national weather data (national_weather_quarterly)
Variable Type Description
year numeric Year
quarter numeric Quarter
precip numeric Average of quarterly precipitation (mm)
temp_min numeric Average of quarterly minimum temperature (°C)
temp_max numeric Average of quarterly maximum temperature (°C)
temp_mean numeric Average of quarterly mean temperature (°C)
SMDI numeric Soil Moisture Deficit Index (from -4, very dry, to +4, very wet)
SPEI_1 numeric 1-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_3 numeric 3-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_6 numeric 6-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)
SPEI_12 numeric 12-month Standardised Precipitation-Evapotranspiration Index (negative values for dry, positive for wet)