4  Covid-19 Data

This chapter provides codes that can be used to work with the data from Oxford University: Oxford Covid-19 Government Response Tracker (OxCGRT).

Here, we will focus on 5 large and 5 small European countries, in terms of inhabitants: - “Large countries”: United Kingdom, Spain, Italy, Germany, France, - “Small countries”: Sweden, Belgium, Netherlands, Ireland, Denmark.

4.1 Load data

First of all, let us load some packages.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(knitr)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
library(RColorBrewer)
library(growthmodels)
library(minpack.lm)
library(scales)

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(nlstools)

'nlstools' has been loaded.

IMPORTANT NOTICE: Most nonlinear regression models and data set examples
related to predictive microbiolgy have been moved to the package 'nlsMicrobio'
library(ggpubr)
library(gridExtra)

Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

Let us also define a theme for the graphical outputs, as in Chapter 1

library(grid)
theme_paper <- function(..., size_text = 8)
  theme(text = element_text(size = size_text),
        plot.background = element_rect(fill="transparent", color=NA),
        panel.background = element_rect(fill = "transparent", color=NA),
        panel.border = element_blank(),
        axis.text = element_text(), 
        legend.text = element_text(size = rel(1.1)),
        legend.title = element_text(size = rel(1.1)),
        legend.background = element_rect(fill="transparent", color=NULL),
        legend.position = "bottom", 
        legend.direction = "horizontal", legend.box = "vertical",
        legend.key = element_blank(),
        panel.spacing = unit(1, "lines"),
        panel.grid.major = element_line(colour = "grey90"), 
        panel.grid.minor = element_blank(),
        plot.title = element_text(hjust = 0, size = rel(1.3), face = "bold"),
        plot.title.position = "plot",
        plot.margin = unit(c(1, 1, 1, 1), "lines"),
        strip.background = element_rect(fill=NA, colour = NA),
        strip.text = element_text(size = rel(1.1)))

Let us modify the locale settings. This depends on the OS. For Windows users:

Sys.setlocale("LC_ALL", "English_United States")

For Unix users:

# Only for Unix users
Sys.setlocale("LC_ALL", "en_US.UTF-8")
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

4.1.1 Confirmed and Deaths data

As mentioned at the beginning of the notebook, we rely on Oxford Covid-19 Government Response Tracker (OxCGRT) data.

Let us define the vector of country names:

names_countries <- c("United Kingdom", "Spain", "Italy", "Germany", "France", 
                     "Sweden", "Belgium", "Netherlands", "Ireland", "Denmark")
names_countries_large <- c("United Kingdom", "Spain", "Italy", "Germany", "France")
names_countries_small <- c("Sweden", "Belgium", "Netherlands", "Ireland", "Denmark")

The raw data can be downloaded as follows:

df_oxford <- 
  read.csv("https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_nat_latest.csv")

Then we save those:

dir.create("data")
save(df_oxford, file = "data/df_oxford.rda")

Let us load the saved data (data saved on May 29, 2023):

The dimensions are the following:

dim(df_oxford)
[1] 202819     61

Let us focus first on the number of confirmed cases:

confirmed_df <- 
  df_oxford |> 
  select(
    country = CountryName, 
    country_code = CountryCode,
    date = Date,
    value = ConfirmedCases,
    stringency_index = StringencyIndex_Average) |> 
  filter(country %in% names_countries) |> 
  as_tibble() |> 
  mutate(
    date = ymd(date),
    days_since_2020_01_22 =
      lubridate::interval(
        lubridate::ymd("2020-01-22"), date) / lubridate::ddays(1)
  )

We can do the same for the number of deaths:

deaths_df <- 
  df_oxford |> 
  select(country = CountryName, 
         country_code = CountryCode,
         date = Date,
         value = ConfirmedDeaths, 
         stringency_index = StringencyIndex_Average) |> 
  filter(country %in% names_countries) |> 
  as_tibble() |> 
  mutate(
    date = ymd(date),
    days_since_2020_01_22 =
      lubridate::interval(
        lubridate::ymd("2020-01-22"), date) / lubridate::ddays(1)
  )

4.1.1.1 Confirmed extract

Here is an extract from the last 3 rows of the confirmed cases:

confirmed_df |> 
  group_by(country) |> 
  slice_tail(n = 3) |> 
  arrange(country, desc(date)) |> 
  kable()
Table 4.1: Last 3 rows of confirmed cases, by country
country country_code date value stringency_index days_since_2020_01_22
Belgium BEL 2022-12-31 4668248 11.11 1074
Belgium BEL 2022-12-30 4668248 11.11 1073
Belgium BEL 2022-12-29 4668248 11.11 1072
Denmark DNK 2022-12-31 3385836 11.11 1074
Denmark DNK 2022-12-30 3385836 11.11 1073
Denmark DNK 2022-12-29 3385836 11.11 1072
France FRA 2022-12-31 38266999 11.11 1074
France FRA 2022-12-30 38266999 11.11 1073
France FRA 2022-12-29 38243191 11.11 1072
Germany DEU 2022-12-31 37369866 11.11 1074
Germany DEU 2022-12-30 37369865 11.11 1073
Germany DEU 2022-12-29 37345969 11.11 1072
Ireland IRL 2022-12-31 1687668 5.56 1074
Ireland IRL 2022-12-30 1687668 5.56 1073
Ireland IRL 2022-12-29 1687668 5.56 1072
Italy ITA 2022-12-31 25143705 21.99 1074
Italy ITA 2022-12-30 25143705 21.99 1073
Italy ITA 2022-12-29 25021606 21.99 1072
Netherlands NLD 2022-12-31 8569095 11.11 1074
Netherlands NLD 2022-12-30 8569095 11.11 1073
Netherlands NLD 2022-12-29 8565716 11.11 1072
Spain ESP 2022-12-31 13684258 5.56 1074
Spain ESP 2022-12-30 13684258 5.56 1073
Spain ESP 2022-12-29 13670037 5.56 1072
Sweden SWE 2022-12-31 2674862 11.11 1074
Sweden SWE 2022-12-30 2674862 11.11 1073
Sweden SWE 2022-12-29 2674862 11.11 1072
United Kingdom GBR 2022-12-31 24135080 5.56 1074
United Kingdom GBR 2022-12-30 24135080 5.56 1073
United Kingdom GBR 2022-12-29 24135080 5.56 1072

The number of observation per country:

confirmed_df |> group_by(country) |> count()
# A tibble: 10 × 2
# Groups:   country [10]
   country            n
   <chr>          <int>
 1 Belgium         1096
 2 Denmark         1096
 3 France          1096
 4 Germany         1096
 5 Ireland         1096
 6 Italy           1096
 7 Netherlands     1096
 8 Spain           1096
 9 Sweden          1096
10 United Kingdom  1096

4.1.1.2 Deaths extract

Here is an extract from the last 3 rows of the number of deaths, for each country:

deaths_df |> 
  group_by(country) |> 
  slice_tail(n = 3) |> 
  arrange(country, desc(date)) |> 
  kable()
Table 4.2: Last 3 rows of deaths, by country
country country_code date value stringency_index days_since_2020_01_22
Belgium BEL 2022-12-31 33228 11.11 1074
Belgium BEL 2022-12-30 33228 11.11 1073
Belgium BEL 2022-12-29 33228 11.11 1072
Denmark DNK 2022-12-31 7758 11.11 1074
Denmark DNK 2022-12-30 7758 11.11 1073
Denmark DNK 2022-12-29 7758 11.11 1072
France FRA 2022-12-31 158385 11.11 1074
France FRA 2022-12-30 158385 11.11 1073
France FRA 2022-12-29 158270 11.11 1072
Germany DEU 2022-12-31 161465 11.11 1074
Germany DEU 2022-12-30 161465 11.11 1073
Germany DEU 2022-12-29 161321 11.11 1072
Ireland IRL 2022-12-31 8293 5.56 1074
Ireland IRL 2022-12-30 8293 5.56 1073
Ireland IRL 2022-12-29 8293 5.56 1072
Italy ITA 2022-12-31 184642 21.99 1074
Italy ITA 2022-12-30 184642 21.99 1073
Italy ITA 2022-12-29 183936 21.99 1072
Netherlands NLD 2022-12-31 22990 11.11 1074
Netherlands NLD 2022-12-30 22990 11.11 1073
Netherlands NLD 2022-12-29 22971 11.11 1072
Spain ESP 2022-12-31 117095 5.56 1074
Spain ESP 2022-12-30 117095 5.56 1073
Spain ESP 2022-12-29 116899 5.56 1072
Sweden SWE 2022-12-31 21827 11.11 1074
Sweden SWE 2022-12-30 21827 11.11 1073
Sweden SWE 2022-12-29 21827 11.11 1072
United Kingdom GBR 2022-12-31 216306 5.56 1074
United Kingdom GBR 2022-12-30 216155 5.56 1073
United Kingdom GBR 2022-12-29 216005 5.56 1072

The number of observation per country:

deaths_df |> group_by(country) |> count()
# A tibble: 10 × 2
# Groups:   country [10]
   country            n
   <chr>          <int>
 1 Belgium         1096
 2 Denmark         1096
 3 France          1096
 4 Germany         1096
 5 Ireland         1096
 6 Italy           1096
 7 Netherlands     1096
 8 Spain           1096
 9 Sweden          1096
10 United Kingdom  1096

Let us keep in a table the correspondence between the country name and its ISO 3166-1 alpha-3 code:

country_codes <-
  confirmed_df |>
  select(country, country_code) |>
  unique()
country_codes
# A tibble: 10 × 2
   country        country_code
   <chr>          <chr>       
 1 Belgium        BEL         
 2 Germany        DEU         
 3 Denmark        DNK         
 4 Spain          ESP         
 5 France         FRA         
 6 United Kingdom GBR         
 7 Ireland        IRL         
 8 Italy          ITA         
 9 Netherlands    NLD         
10 Sweden         SWE         

4.1.2 Population

Let us define rough values for the population of each country:

population <- 
  tribble(
    ~country, ~pop,
    "France", 70e6,
    "Germany", 83e6,
    "Italy", 61e6,
    "Spain", 47e6,
    "United Kingdom", 67e6,
    "Belgium", 12e6,
    "Denmark", 6e6,
    "Ireland", 5e6,
    "Netherlands", 18e6,
    "Sweden", 11e6
  )

4.1.3 Filtering data

In this notebook, let us use data up to September 30th, 2020. Let us keep only those data and extend the date up to November 3rd, 2020 (to assess the goodness of fit of the models with unseen data).

end_date_sample <- lubridate::ymd("2020-09-30")
end_date_data <- lubridate::ymd("2020-10-31")
confirmed_df <- confirmed_df |> filter(date <= end_date_data)
deaths_df <- deaths_df |> filter(date <= end_date_data)

4.1.4 Defining start and end dates for a “first wave”

We create a table that gives some dates for each country, adopting the following convention:

  • start_first_wave: start of the first wave, defined as the first date when the cumulative number of cases is greater than 1
  • start_high_stringency: date at which the stringency index reaches its maximum value during the first 100 days of the sample
  • start_reduce_restrict: moment at which the restrictions of the first wave starts to lower
  • start_date_sample_second_wave: 60 days after the relaxation of restrictions (60 days after after start_reduce_restrict)
  • length_high_stringency: number of days between start_high_stringency and start_reduce_restrict`.
# Start of the outbreak
start_first_wave <- 
  confirmed_df |> 
  group_by(country) |> 
  arrange(date) |> 
  filter(value > 0) |> 
  slice(1) |> 
  select(country, start_first_wave = date)

The start of period with highest severity index among the first 100 days:

start_high_stringency <- 
  confirmed_df |> 
  group_by(country) |> 
  slice(1:100) |>
  arrange(desc(stringency_index), date) |> 
  slice(1) |> 
  select(country, start_high_stringency = date)

The moment at which the restrictions of the first wave starts to lower:

start_reduce_restrict <- 
  confirmed_df |> 
  group_by(country) |> 
  arrange(date) |> 
  left_join(start_high_stringency, by = "country") |> 
  filter(date >= start_high_stringency) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |> 
  mutate(same_strin = stringency_index == tmp) |> 
  mutate(same_strin = ifelse(row_number()==1, TRUE, same_strin)) |> 
  filter(same_strin == FALSE) |> 
  slice(1) |>
  select(country, start_reduce_restrict = date)

The assumed start of the second wave:

start_date_sample_second_wave <-
  start_reduce_restrict |> 
  mutate(
    start_date_sample_second_wave = start_reduce_restrict + 
      lubridate::ddays(60)
  ) |>
  select(country, start_date_sample_second_wave)

Then, we can put all these dates into a single table:

stringency_dates <- 
  start_first_wave |> 
  left_join(start_high_stringency, by = "country") |> 
  left_join(start_reduce_restrict, by = "country") |> 
  left_join(start_date_sample_second_wave, by = "country")

4.2 Individual Data

Li et al. (2020) estimated the distribution of the incubation period by fitting a lognormal distribution on exposure histories, leading to an estimated mean of 5.2 days, a 0.95 confidence interval of [4.1,7.0] and the 95th percentile of 12.5 days.

This corresponds to a lognormal density with parameters \(\mu = 1.43\) and \(\sigma = 0.67\):

func <- function(x){
  # For finding gamma parameters of the serial interval distribution Li etal 2020
  a = x[1]
  s = x[2]
  (7.5-a*s)^2+(3.4-sqrt(a)*s)^2
}
x = c(4,2)
optim(x,func)
$par
[1] 4.866009 1.541308

$value
[1] 9.522571e-10

$counts
function gradient 
      67       NA 

$convergence
[1] 0

$message
NULL

The mean serial interval is defined as the time between the onset of symptoms in a primary case and the onset of symptoms in secondary cases. Li et al. (2020) fitted a gamma distribution to data from cluster investigations. They found a mean time of 7.5 days (SD = 3.4) with a 95% confidence interval of [5.3,19].

The corresponding parameters for the Gamma distribution can be obtained as follows:

mean_si <- 7.5
std_si <- 3.4

shape <- (mean_si / std_si)^2
scale <- mean_si / shape
str_c("Shape = ", shape, ", Scale = ", scale)
[1] "Shape = 4.8659169550173, Scale = 1.54133333333333"

The quantile of order .999 for such parameters is equal to:

h <- ceiling(qgamma(p = .99, shape = shape, scale = scale))
h
[1] 18

4.3 Descriptive statistics

Let us assign a colour to each country within each group (large or small countries)

colours_lugami <- rep(
  c("#1F78B4", "#33A02C", "#E31A1C", "#FF7F00", "#6A3D9A"), 2)
colours_lugami_names <- colours_lugami
names(colours_lugami_names) <- names_countries
colour_table <- 
  tibble(
    colour = colours_lugami_names,
    country = names(colours_lugami_names)) |> 
  left_join(country_codes, by = c("country"))
colour_table
# A tibble: 10 × 3
   colour  country        country_code
   <chr>   <chr>          <chr>       
 1 #1F78B4 United Kingdom GBR         
 2 #33A02C Spain          ESP         
 3 #E31A1C Italy          ITA         
 4 #FF7F00 Germany        DEU         
 5 #6A3D9A France         FRA         
 6 #1F78B4 Sweden         SWE         
 7 #33A02C Belgium        BEL         
 8 #E31A1C Netherlands    NLD         
 9 #FF7F00 Ireland        IRL         
10 #6A3D9A Denmark        DNK         

Keep also a track of that in a vector (useful for graphs with {ggplot2}):

colour_countries <- colour_table$colour
names(colour_countries) <- colour_table$country_code
colour_countries
      GBR       ESP       ITA       DEU       FRA       SWE       BEL       NLD 
"#1F78B4" "#33A02C" "#E31A1C" "#FF7F00" "#6A3D9A" "#1F78B4" "#33A02C" "#E31A1C" 
      IRL       DNK 
"#FF7F00" "#6A3D9A" 

Let us create two figures that shows the evolution of the cumulative number of cases through time and the cumulative number of deaths through time, respectively.

To that end, we need to reshape the data. First, we need a table in which each row gives the value to be plotted for a given date, a given country and a given type of variable (confirmed cases or recovered).

df_plot_evolution_numbers <- 
  confirmed_df |> 
  mutate(type = "Cases") |> 
  bind_rows(
    deaths_df |> 
      mutate(type = "Deaths")
  ) |> 
  mutate(type = factor(type, levels = c("Cases", "Deaths", "Recovered")))
df_plot_evolution_numbers
# A tibble: 6,100 × 7
   country country_code date       value stringency_index days_since_2020_01_22
   <chr>   <chr>        <date>     <int>            <dbl>                 <dbl>
 1 Belgium BEL          2020-01-01    NA                0                   -21
 2 Belgium BEL          2020-01-02    NA                0                   -20
 3 Belgium BEL          2020-01-03    NA                0                   -19
 4 Belgium BEL          2020-01-04    NA                0                   -18
 5 Belgium BEL          2020-01-05    NA                0                   -17
 6 Belgium BEL          2020-01-06    NA                0                   -16
 7 Belgium BEL          2020-01-07    NA                0                   -15
 8 Belgium BEL          2020-01-08    NA                0                   -14
 9 Belgium BEL          2020-01-09    NA                0                   -13
10 Belgium BEL          2020-01-10    NA                0                   -12
# ℹ 6,090 more rows
# ℹ 1 more variable: type <fct>

From this table, we can only keep cases and deaths cumulative values, filter the observation to keep only those after January 21, 2020. Let us also add for each line, whether the country it corresponds to is small or large (we will create two panels in the graphs). We can also add the country code.

df_plot_evolution_numbers_dates <- 
  df_plot_evolution_numbers |> 
  filter(type %in% c("Cases", "Deaths")) |> 
  filter(date >= ymd("2020-01-21")) |> 
  mutate(
    country_type = ifelse(country %in% names_countries_large, yes = "L", "S"),
    country_type = factor(
      country_type, 
      levels = c("L", "S"),
      labels = c("Larger countries", "Smaller countries")
    )
  ) |> 
  left_join(country_codes, by = c("country", "country_code"))
df_plot_evolution_numbers_dates
# A tibble: 5,700 × 8
   country country_code date       value stringency_index days_since_2020_01_22
   <chr>   <chr>        <date>     <int>            <dbl>                 <dbl>
 1 Belgium BEL          2020-01-21    NA             0                       -1
 2 Belgium BEL          2020-01-22     0             0                        0
 3 Belgium BEL          2020-01-23     0             0                        1
 4 Belgium BEL          2020-01-24     0             0                        2
 5 Belgium BEL          2020-01-25     0             0                        3
 6 Belgium BEL          2020-01-26     0             0                        4
 7 Belgium BEL          2020-01-27     0             0                        5
 8 Belgium BEL          2020-01-28     0             5.56                     6
 9 Belgium BEL          2020-01-29     0             5.56                     7
10 Belgium BEL          2020-01-30     0             5.56                     8
# ℹ 5,690 more rows
# ℹ 2 more variables: type <fct>, country_type <fct>

We want the labels of the legends for the countries to be ordered in a specific way. To that end, let us create two variables that specify this order:

order_countries_L <- c("GBR", "ESP", "ITA", "DEU", "FRA")
order_countries_S <- c("SWE", "BEL", "NLD", "IRL", "DNK")

4.3.1 Confirmed cases

Let us focus on the confirmed cases here. We can create two tables: one for the large countries, and another one for small ones:

df_plot_evolution_numbers_dates_L <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Larger countries") |> 
  filter(type == "Cases")

And for small countries:

df_plot_evolution_numbers_dates_S <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Smaller countries") |> 
  filter(type == "Cases")

As we would like the graphs to display the day at which the stringency index reaches its max value for the first time, we need to extract the cumulative number of cases at the corresponding dates.

df_lockdown_plot <- 
  stringency_dates |> 
  rename(date = start_first_wave) |> 
  left_join(
    df_plot_evolution_numbers_dates |> 
      filter(type == "Cases"),
    by = c("country", "date")
  )
df_lockdown_plot
# A tibble: 10 × 11
# Groups:   country [10]
   country        date       start_high_stringency start_reduce_restrict
   <chr>          <date>     <date>                <date>               
 1 Belgium        2020-02-04 2020-03-20            2020-05-05           
 2 Denmark        2020-02-27 2020-03-18            2020-04-15           
 3 France         2020-01-24 2020-03-17            2020-05-11           
 4 Germany        2020-01-27 2020-03-22            2020-05-03           
 5 Ireland        2020-02-29 2020-04-06            2020-05-18           
 6 Italy          2020-01-31 2020-03-20            2020-04-10           
 7 Netherlands    2020-02-27 2020-03-23            2020-05-11           
 8 Spain          2020-02-01 2020-03-30            2020-05-04           
 9 Sweden         2020-02-01 2020-04-01            2020-06-13           
10 United Kingdom 2020-01-31 2020-03-24            2020-05-11           
# ℹ 7 more variables: start_date_sample_second_wave <date>, country_code <chr>,
#   value <int>, stringency_index <dbl>, days_since_2020_01_22 <dbl>,
#   type <fct>, country_type <fct>

The two plots can be created:

plot_evolution_numbers_dates_L <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_L |> 
      mutate(
        country_code = factor(country_code, levels = order_countries_L)
      ),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(
        country %in% unique(df_plot_evolution_numbers_dates_L$country)
      ),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lockdown") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

and for small countries:

plot_evolution_numbers_dates_S <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_S |> 
           mutate(
             country_code = factor(country_code, levels = order_countries_S)
             ),
    mapping = aes(x = date, y = value, colour = country_code)
    ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lock-down") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

These two plots can be put together on a single one:

p <-
  arrangeGrob(
    # Row 1
    plot_evolution_numbers_dates_L + 
      labs(y = NULL, title = "(a) Confirmed case, large countries"),
    plot_evolution_numbers_dates_S + 
      labs(y = NULL, title = "(b) Confirmed cases, small countries."),
    nrow = 1
  ) |> 
  as_ggplot()
Warning: Removed 5 rows containing missing values (`geom_line()`).
Removed 5 rows containing missing values (`geom_line()`).
p

Figure 4.1: Confirmed cases between the beginning of the Covid-19 epidemics and November 2020.

4.3.2 Deaths

Let us focus on the number of deaths here. We can create two tables: one for the large countries, and another one for small ones:

df_plot_evolution_numbers_dates_L <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Larger countries") |> 
  filter(type == "Deaths")

And for small countries:

df_plot_evolution_numbers_dates_S <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Smaller countries") |> 
  filter(type == "Deaths")

As we would like the graphs to display the day at which the stringency index becomes greater or equal to 60 for the first time, we need to extract the cumulative number of cases at the corresponding dates.

df_lockdown_plot <- 
  stringency_dates |> 
  rename(date = start_first_wave) |> 
  left_join(
    df_plot_evolution_numbers_dates |> 
      filter(type == "Deaths"),
    by = c("country", "date")
  )
df_lockdown_plot
# A tibble: 10 × 11
# Groups:   country [10]
   country        date       start_high_stringency start_reduce_restrict
   <chr>          <date>     <date>                <date>               
 1 Belgium        2020-02-04 2020-03-20            2020-05-05           
 2 Denmark        2020-02-27 2020-03-18            2020-04-15           
 3 France         2020-01-24 2020-03-17            2020-05-11           
 4 Germany        2020-01-27 2020-03-22            2020-05-03           
 5 Ireland        2020-02-29 2020-04-06            2020-05-18           
 6 Italy          2020-01-31 2020-03-20            2020-04-10           
 7 Netherlands    2020-02-27 2020-03-23            2020-05-11           
 8 Spain          2020-02-01 2020-03-30            2020-05-04           
 9 Sweden         2020-02-01 2020-04-01            2020-06-13           
10 United Kingdom 2020-01-31 2020-03-24            2020-05-11           
# ℹ 7 more variables: start_date_sample_second_wave <date>, country_code <chr>,
#   value <int>, stringency_index <dbl>, days_since_2020_01_22 <dbl>,
#   type <fct>, country_type <fct>

The two plots can be created:

plot_evolution_numbers_dates_L <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_L |> 
      mutate(
        country_code = factor(country_code, levels = order_countries_L)
      ),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F) +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

and for small countries:

plot_evolution_numbers_dates_S <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_S |> 
      mutate(country_code = factor(country_code, levels = order_countries_S)),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    colour = "black", size = 4) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lock-down") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

These two plots can be put together on a single one:

p <-
  arrangeGrob(
    # Row 1
    plot_evolution_numbers_dates_L + 
      labs(y = NULL, title = "(a) Confirmed deaths, large countries"),
    plot_evolution_numbers_dates_S + 
      labs(y = NULL, title = "(b) Confirmed deaths, small countries."),
    nrow = 1
    ) |> 
  as_ggplot()
Warning: Removed 5 rows containing missing values (`geom_line()`).
Removed 5 rows containing missing values (`geom_line()`).
p

Figure 4.2: Confirmed deaths between the beginning of the Covid-19 epidemics and November 2020.

4.3.3 Stringency Index

Let us create a similar plot for the stringency index.

df_plot_stringency_index <- 
  confirmed_df |> 
  select(country, country_code, date, stringency_index) |> 
  mutate(
    country_type = ifelse(
      country %in% c("France", "Germany", "Italy",
                     "Spain", "United Kingdom"),
      yes = "L", "S")
  ) %>%
  mutate(
    country_type = factor(
      country_type, levels = c("L", "S"),
      labels = c("Larger countries", "Smaller countries")
    )
  )

Then we can create the plot for large countries:

plot_stringency_index_L <- 
  ggplot(
    data = df_plot_stringency_index |> 
      filter(country_type == "Larger countries") |> 
      mutate(country_code = fct_relevel(country_code, order_countries_L)),
    mapping =  aes(x = date, y = stringency_index, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_hline(yintercept = 70, linetype = "dotted") +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(breaks = seq(0, 100, by = 20)) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_stringency_index$date, n = 7)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

And for small countries:

plot_stringency_index_S <- 
  ggplot(
    data = df_plot_stringency_index |> 
      filter(country_type == "Smaller countries") |> 
      mutate(country_code = fct_relevel(country_code, order_countries_S)),
    mapping= aes(x = date, y = stringency_index, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_hline(yintercept = 70, linetype = "dotted") +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(breaks = seq(0, 100, by = 20)) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_stringency_index$date, n = 7)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

Lastly, we can plot these two graphs on a single figure:

p <-
  arrangeGrob(
    # Row 1
    plot_stringency_index_L + 
      labs(y = NULL, title = "(a) Severity index, large countries"),
    plot_stringency_index_S + 
      labs(y = NULL, title = "(b) Severity index, small countries"),
    nrow = 1
  ) |> 
  as_ggplot()
p

Figure 4.3: Severity index between the beginning of the Covid-19 epidemics and November 2020.

4.3.4 Speed of reaction to the epidemic outbreak

The observation of a first case was the sign that the epidemic had reached the country. What was the delay between this first case and a significant reaction identified when the index was greater than 20?

We can first extract the date on which the severity index reaches the value of 20 for the first time as follows:

start_stringency_20 <- 
  confirmed_df |>
  filter(stringency_index >= 20) |>
  group_by(country) |>
  slice(1) |>
  select(country, date_stringency_20 = date, cases = value) |> 
  ungroup()

Then, we can get the date of the first case for each country:

start_first_case <- 
  df_plot_evolution_numbers_dates |> 
  filter(type == "Cases") |> 
  group_by(country) |> 
  filter(value > 0) |> 
  arrange(date) |> 
  slice(1) |> 
  select(country, first_case = date) |> 
  ungroup()
start_stringency_20 |> 
  left_join(start_first_case) |> 
  mutate(interval = lubridate::interval(first_case, date_stringency_20),
         delay = interval / lubridate::ddays(1)) |> 
  mutate(country = fct_relevel(country, names_countries)) |> 
  arrange(country) |> 
  select(country, first_case, delay, cases) |>
  kableExtra::kable()
Joining with `by = join_by(country)`
Table 4.3: Speed of reaction to the epidemic outbreak
country first_case delay cases
United Kingdom 2020-01-31 46 4452
Spain 2020-02-01 37 1073
Italy 2020-01-31 21 20
Germany 2020-01-27 33 66
France 2020-01-24 36 100
Sweden 2020-02-01 40 771
Belgium 2020-02-04 38 559
Netherlands 2020-02-27 12 503
Ireland 2020-02-29 12 43
Denmark 2020-02-27 5 6

4.3.5 Confinement and deconfinement policies

Let us check how the different countries proceeded with deconfinement.

The date at which the stringency index reached its maximum value within the first 100 days since the end of January:

start_max_stringency <- 
  confirmed_df |> 
  group_by(country) |> 
  slice(1:100) |>
  arrange(desc(stringency_index), date) |> 
  slice(1) |> 
  select(country, start = date)
start_max_stringency
# A tibble: 10 × 2
# Groups:   country [10]
   country        start     
   <chr>          <date>    
 1 Belgium        2020-03-20
 2 Denmark        2020-03-18
 3 France         2020-03-17
 4 Germany        2020-03-22
 5 Ireland        2020-04-06
 6 Italy          2020-03-20
 7 Netherlands    2020-03-23
 8 Spain          2020-03-30
 9 Sweden         2020-04-01
10 United Kingdom 2020-03-24

We can easily obtain the date on which the index begins to fall from its maximum value, corresponding to a relaxation of policy measures (i.e., end of lockdown):

policies <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(start_max_stringency, by = c("country")) |> 
  group_by(country) |> 
  arrange(date) |> 
  filter(date >= start) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |>
  mutate(same_strin = stringency_index == tmp) |>
  mutate(same_strin = ifelse(row_number()==1, TRUE, same_strin)) |>
  filter(same_strin == FALSE) |>
  slice(1) |> 
  mutate(length = lubridate::interval(start, date) / ddays(1)) |> 
  select(country, start, end=date, length)
policies
# A tibble: 10 × 4
# Groups:   country [10]
   country        start      end        length
   <chr>          <date>     <date>      <dbl>
 1 Belgium        2020-03-20 2020-05-05     46
 2 Denmark        2020-03-18 2020-04-15     28
 3 France         2020-03-17 2020-05-11     55
 4 Germany        2020-03-22 2020-05-03     42
 5 Ireland        2020-04-06 2020-05-18     42
 6 Italy          2020-03-20 2020-04-10     21
 7 Netherlands    2020-03-23 2020-05-11     49
 8 Spain          2020-03-30 2020-05-04     35
 9 Sweden         2020-04-01 2020-06-13     73
10 United Kingdom 2020-03-24 2020-05-11     48

The average of the strigency index between the max value and the end of containment can be obtained as follows:

strength_containment <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(policies, by = "country") |> 
  filter(date >= start, date < end) |> 
  group_by(country) |> 
  summarise(strength = mean(stringency_index))
strength_containment
# A tibble: 10 × 2
   country        strength
   <chr>             <dbl>
 1 Belgium            81.5
 2 Denmark            72.2
 3 France             88.0
 4 Germany            76.8
 5 Ireland            90.7
 6 Italy              85.2
 7 Netherlands        78.7
 8 Spain              85.2
 9 Sweden             64.8
10 United Kingdom     79.6

We can count how many smoothing and how many restrengthening actions were made till the end of our sample:

policy_changes <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(policies, by = "country") |> 
  filter(date >=  end, date <= end_date_sample) |> 
  group_by(country) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |> 
  mutate(tmp = ifelse(row_number()==1, stringency_index, tmp)) |> 
  mutate(same_strin = stringency_index == tmp) |>
  mutate(
    smoothing = ifelse(!same_strin & stringency_index < tmp, TRUE, FALSE),
    restrenghtening = ifelse(
      !same_strin & stringency_index > tmp, TRUE, FALSE)
  ) |> 
  summarise(
    changes_smo = sum(smoothing),
    changes_res = sum(restrenghtening)
  )
policy_changes
# A tibble: 10 × 3
   country        changes_smo changes_res
   <chr>                <int>       <int>
 1 Belgium                  8           3
 2 Denmark                  5           1
 3 France                   6           3
 4 Germany                  9           4
 5 Ireland                  4           2
 6 Italy                    5           4
 7 Netherlands              4           2
 8 Spain                    9           4
 9 Sweden                   1           0
10 United Kingdom           8           4

Lastly, we can gather all this information in a single table:

policies |> 
  left_join(strength_containment, by = "country") |> 
  left_join(policy_changes, by = "country") |> 
  mutate(country = factor(country)) |> 
  mutate(country = fct_relevel(country, names_countries)) |> 
  arrange(country) |> 
  mutate(
    start = format(start, "%B %d"),
    end = format(end, "%B %d")
  ) |> 
  kableExtra::kable()

?(caption)

country
start
end
length
strength
changes_smo
changes_res
United Kingdom
March 24
May 11
48
79.63
8
4
Spain
March 30
May 04
35
85.19
9
4
Italy
March 20
April 10
21
85.19
5
4
Germany
March 22
May 03
42
76.85
9
4
France
March 17
May 11
55
87.96
6
3
Sweden
April 01
June 13
73
64.81
1
0
Belgium
March 20
May 05
46
81.48
8
3
Netherlands
March 23
May 11
49
78.70
4
2
Ireland
April 06
May 18
42
90.74
4
2
Denmark
March 18
April 15
28
72.22
5
1

Overview of the smoothing and restrenghthening actions during between the beginning of the Codiv-19 epidemic and November 2020.

4.4 Saving the results

Let us save the following R objects for later use.

save(
  confirmed_df,
  deaths_df,
  population,
  h,
  stringency_dates,
  names_countries,
  names_countries_large,
  names_countries_small,
  order_countries_L,
  order_countries_S,
  colour_countries,
  colour_table,
  country_codes,
  theme_paper,
  file = "data/data_after_load.rda"
  )