4 Covid-19 Data

This chapter provides codes that can be used to work with the data from Oxford University: Oxford Covid-19 Government Response Tracker (OxCGRT).

Here, we will focus on 5 large and 5 small European countries, in terms of inhabitants: - “Large countries”: United Kingdom, Spain, Italy, Germany, France, - “Small countries”: Sweden, Belgium, Netherlands, Ireland, Denmark.

4.1 Load data

First of all, let us load some packages.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lubridate)
library(knitr)
library(kableExtra)


Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows

library(RColorBrewer)
library(growthmodels)
library(minpack.lm)
library(scales)


Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

library(nlstools)


'nlstools' has been loaded.

IMPORTANT NOTICE: Most nonlinear regression models and data set examples
related to predictive microbiolgy have been moved to the package 'nlsMicrobio'

library(ggpubr)
library(gridExtra)


Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

Let us also define a theme for the graphical outputs, as in Chapter 1

library(grid)
theme_paper <- function(..., size_text = 8)
  theme(text = element_text(size = size_text),
        plot.background = element_rect(fill="transparent", color=NA),
        panel.background = element_rect(fill = "transparent", color=NA),
        panel.border = element_blank(),
        axis.text = element_text(), 
        legend.text = element_text(size = rel(1.1)),
        legend.title = element_text(size = rel(1.1)),
        legend.background = element_rect(fill="transparent", color=NULL),
        legend.position = "bottom", 
        legend.direction = "horizontal", legend.box = "vertical",
        legend.key = element_blank(),
        panel.spacing = unit(1, "lines"),
        panel.grid.major = element_line(colour = "grey90"), 
        panel.grid.minor = element_blank(),
        plot.title = element_text(hjust = 0, size = rel(1.3), face = "bold"),
        plot.title.position = "plot",
        plot.margin = unit(c(1, 1, 1, 1), "lines"),
        strip.background = element_rect(fill=NA, colour = NA),
        strip.text = element_text(size = rel(1.1)))

Let us modify the locale settings. This depends on the OS. For Windows users:

Sys.setlocale("LC_ALL", "English_United States")

For Unix users:

# Only for Unix users
Sys.setlocale("LC_ALL", "en_US.UTF-8")

[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"

4.1.1 Confirmed and Deaths data

As mentioned at the beginning of the notebook, we rely on Oxford Covid-19 Government Response Tracker (OxCGRT) data.

Let us define the vector of country names:

names_countries <- c("United Kingdom", "Spain", "Italy", "Germany", "France", 
                     "Sweden", "Belgium", "Netherlands", "Ireland", "Denmark")
names_countries_large <- c("United Kingdom", "Spain", "Italy", "Germany", "France")
names_countries_small <- c("Sweden", "Belgium", "Netherlands", "Ireland", "Denmark")

The raw data can be downloaded as follows:

df_oxford <- 
  read.csv("https://raw.githubusercontent.com/OxCGRT/covid-policy-tracker/master/data/OxCGRT_nat_latest.csv")

Then we save those:

dir.create("data")
save(df_oxford, file = "data/df_oxford.rda")

Let us load the saved data (data saved on May 29, 2023):

The dimensions are the following:

dim(df_oxford)

[1] 202819     61

Let us focus first on the number of confirmed cases:

confirmed_df <- 
  df_oxford |> 
  select(
    country = CountryName, 
    country_code = CountryCode,
    date = Date,
    value = ConfirmedCases,
    stringency_index = StringencyIndex_Average) |> 
  filter(country %in% names_countries) |> 
  as_tibble() |> 
  mutate(
    date = ymd(date),
    days_since_2020_01_22 =
      lubridate::interval(
        lubridate::ymd("2020-01-22"), date) / lubridate::ddays(1)
  )

We can do the same for the number of deaths:

deaths_df <- 
  df_oxford |> 
  select(country = CountryName, 
         country_code = CountryCode,
         date = Date,
         value = ConfirmedDeaths, 
         stringency_index = StringencyIndex_Average) |> 
  filter(country %in% names_countries) |> 
  as_tibble() |> 
  mutate(
    date = ymd(date),
    days_since_2020_01_22 =
      lubridate::interval(
        lubridate::ymd("2020-01-22"), date) / lubridate::ddays(1)
  )

4.1.1.1 Confirmed extract

Here is an extract from the last 3 rows of the confirmed cases:

confirmed_df |> 
  group_by(country) |> 
  slice_tail(n = 3) |> 
  arrange(country, desc(date)) |> 
  kable()

Table 4.1: Last 3 rows of confirmed cases, by country
country	country_code	date	value	stringency_index	days_since_2020_01_22
Belgium	BEL	2022-12-31	4668248	11.11	1074
Belgium	BEL	2022-12-30	4668248	11.11	1073
Belgium	BEL	2022-12-29	4668248	11.11	1072
Denmark	DNK	2022-12-31	3385836	11.11	1074
Denmark	DNK	2022-12-30	3385836	11.11	1073
Denmark	DNK	2022-12-29	3385836	11.11	1072
France	FRA	2022-12-31	38266999	11.11	1074
France	FRA	2022-12-30	38266999	11.11	1073
France	FRA	2022-12-29	38243191	11.11	1072
Germany	DEU	2022-12-31	37369866	11.11	1074
Germany	DEU	2022-12-30	37369865	11.11	1073
Germany	DEU	2022-12-29	37345969	11.11	1072
Ireland	IRL	2022-12-31	1687668	5.56	1074
Ireland	IRL	2022-12-30	1687668	5.56	1073
Ireland	IRL	2022-12-29	1687668	5.56	1072
Italy	ITA	2022-12-31	25143705	21.99	1074
Italy	ITA	2022-12-30	25143705	21.99	1073
Italy	ITA	2022-12-29	25021606	21.99	1072
Netherlands	NLD	2022-12-31	8569095	11.11	1074
Netherlands	NLD	2022-12-30	8569095	11.11	1073
Netherlands	NLD	2022-12-29	8565716	11.11	1072
Spain	ESP	2022-12-31	13684258	5.56	1074
Spain	ESP	2022-12-30	13684258	5.56	1073
Spain	ESP	2022-12-29	13670037	5.56	1072
Sweden	SWE	2022-12-31	2674862	11.11	1074
Sweden	SWE	2022-12-30	2674862	11.11	1073
Sweden	SWE	2022-12-29	2674862	11.11	1072
United Kingdom	GBR	2022-12-31	24135080	5.56	1074
United Kingdom	GBR	2022-12-30	24135080	5.56	1073
United Kingdom	GBR	2022-12-29	24135080	5.56	1072

The number of observation per country:

confirmed_df |> group_by(country) |> count()

# A tibble: 10 × 2
# Groups:   country [10]
   country            n
   <chr>          <int>
 1 Belgium         1096
 2 Denmark         1096
 3 France          1096
 4 Germany         1096
 5 Ireland         1096
 6 Italy           1096
 7 Netherlands     1096
 8 Spain           1096
 9 Sweden          1096
10 United Kingdom  1096

4.1.1.2 Deaths extract

Here is an extract from the last 3 rows of the number of deaths, for each country:

deaths_df |> 
  group_by(country) |> 
  slice_tail(n = 3) |> 
  arrange(country, desc(date)) |> 
  kable()

Table 4.2: Last 3 rows of deaths, by country
country	country_code	date	value	stringency_index	days_since_2020_01_22
Belgium	BEL	2022-12-31	33228	11.11	1074
Belgium	BEL	2022-12-30	33228	11.11	1073
Belgium	BEL	2022-12-29	33228	11.11	1072
Denmark	DNK	2022-12-31	7758	11.11	1074
Denmark	DNK	2022-12-30	7758	11.11	1073
Denmark	DNK	2022-12-29	7758	11.11	1072
France	FRA	2022-12-31	158385	11.11	1074
France	FRA	2022-12-30	158385	11.11	1073
France	FRA	2022-12-29	158270	11.11	1072
Germany	DEU	2022-12-31	161465	11.11	1074
Germany	DEU	2022-12-30	161465	11.11	1073
Germany	DEU	2022-12-29	161321	11.11	1072
Ireland	IRL	2022-12-31	8293	5.56	1074
Ireland	IRL	2022-12-30	8293	5.56	1073
Ireland	IRL	2022-12-29	8293	5.56	1072
Italy	ITA	2022-12-31	184642	21.99	1074
Italy	ITA	2022-12-30	184642	21.99	1073
Italy	ITA	2022-12-29	183936	21.99	1072
Netherlands	NLD	2022-12-31	22990	11.11	1074
Netherlands	NLD	2022-12-30	22990	11.11	1073
Netherlands	NLD	2022-12-29	22971	11.11	1072
Spain	ESP	2022-12-31	117095	5.56	1074
Spain	ESP	2022-12-30	117095	5.56	1073
Spain	ESP	2022-12-29	116899	5.56	1072
Sweden	SWE	2022-12-31	21827	11.11	1074
Sweden	SWE	2022-12-30	21827	11.11	1073
Sweden	SWE	2022-12-29	21827	11.11	1072
United Kingdom	GBR	2022-12-31	216306	5.56	1074
United Kingdom	GBR	2022-12-30	216155	5.56	1073
United Kingdom	GBR	2022-12-29	216005	5.56	1072

The number of observation per country:

deaths_df |> group_by(country) |> count()

# A tibble: 10 × 2
# Groups:   country [10]
   country            n
   <chr>          <int>
 1 Belgium         1096
 2 Denmark         1096
 3 France          1096
 4 Germany         1096
 5 Ireland         1096
 6 Italy           1096
 7 Netherlands     1096
 8 Spain           1096
 9 Sweden          1096
10 United Kingdom  1096

Let us keep in a table the correspondence between the country name and its ISO 3166-1 alpha-3 code:

country_codes <-
  confirmed_df |>
  select(country, country_code) |>
  unique()
country_codes

# A tibble: 10 × 2
   country        country_code
   <chr>          <chr>       
 1 Belgium        BEL         
 2 Germany        DEU         
 3 Denmark        DNK         
 4 Spain          ESP         
 5 France         FRA         
 6 United Kingdom GBR         
 7 Ireland        IRL         
 8 Italy          ITA         
 9 Netherlands    NLD         
10 Sweden         SWE

4.1.2 Population

Let us define rough values for the population of each country:

population <- 
  tribble(
    ~country, ~pop,
    "France", 70e6,
    "Germany", 83e6,
    "Italy", 61e6,
    "Spain", 47e6,
    "United Kingdom", 67e6,
    "Belgium", 12e6,
    "Denmark", 6e6,
    "Ireland", 5e6,
    "Netherlands", 18e6,
    "Sweden", 11e6
  )

4.1.3 Filtering data

In this notebook, let us use data up to September 30th, 2020. Let us keep only those data and extend the date up to November 3rd, 2020 (to assess the goodness of fit of the models with unseen data).

end_date_sample <- lubridate::ymd("2020-09-30")
end_date_data <- lubridate::ymd("2020-10-31")
confirmed_df <- confirmed_df |> filter(date <= end_date_data)
deaths_df <- deaths_df |> filter(date <= end_date_data)

4.1.4 Defining start and end dates for a “first wave”

We create a table that gives some dates for each country, adopting the following convention:

start_first_wave: start of the first wave, defined as the first date when the cumulative number of cases is greater than 1
start_high_stringency: date at which the stringency index reaches its maximum value during the first 100 days of the sample
start_reduce_restrict: moment at which the restrictions of the first wave starts to lower
start_date_sample_second_wave: 60 days after the relaxation of restrictions (60 days after after start_reduce_restrict)
length_high_stringency: number of days between start_high_stringency and start_reduce_restrict`.

# Start of the outbreak
start_first_wave <- 
  confirmed_df |> 
  group_by(country) |> 
  arrange(date) |> 
  filter(value > 0) |> 
  slice(1) |> 
  select(country, start_first_wave = date)

The start of period with highest severity index among the first 100 days:

start_high_stringency <- 
  confirmed_df |> 
  group_by(country) |> 
  slice(1:100) |>
  arrange(desc(stringency_index), date) |> 
  slice(1) |> 
  select(country, start_high_stringency = date)

The moment at which the restrictions of the first wave starts to lower:

start_reduce_restrict <- 
  confirmed_df |> 
  group_by(country) |> 
  arrange(date) |> 
  left_join(start_high_stringency, by = "country") |> 
  filter(date >= start_high_stringency) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |> 
  mutate(same_strin = stringency_index == tmp) |> 
  mutate(same_strin = ifelse(row_number()==1, TRUE, same_strin)) |> 
  filter(same_strin == FALSE) |> 
  slice(1) |>
  select(country, start_reduce_restrict = date)

The assumed start of the second wave:

start_date_sample_second_wave <-
  start_reduce_restrict |> 
  mutate(
    start_date_sample_second_wave = start_reduce_restrict + 
      lubridate::ddays(60)
  ) |>
  select(country, start_date_sample_second_wave)

Then, we can put all these dates into a single table:

stringency_dates <- 
  start_first_wave |> 
  left_join(start_high_stringency, by = "country") |> 
  left_join(start_reduce_restrict, by = "country") |> 
  left_join(start_date_sample_second_wave, by = "country")

4.2 Individual Data

Li et al. (2020) estimated the distribution of the incubation period by fitting a lognormal distribution on exposure histories, leading to an estimated mean of 5.2 days, a 0.95 confidence interval of [4.1,7.0] and the 95th percentile of 12.5 days.

This corresponds to a lognormal density with parameters \(\mu = 1.43\) and \(\sigma = 0.67\):

func <- function(x){
  # For finding gamma parameters of the serial interval distribution Li etal 2020
  a = x[1]
  s = x[2]
  (7.5-a*s)^2+(3.4-sqrt(a)*s)^2
}
x = c(4,2)
optim(x,func)

$par
[1] 4.866009 1.541308

$value
[1] 9.522571e-10

$counts
function gradient 
      67       NA 

$convergence
[1] 0

$message
NULL

The mean serial interval is defined as the time between the onset of symptoms in a primary case and the onset of symptoms in secondary cases. Li et al. (2020) fitted a gamma distribution to data from cluster investigations. They found a mean time of 7.5 days (SD = 3.4) with a 95% confidence interval of [5.3,19].

The corresponding parameters for the Gamma distribution can be obtained as follows:

mean_si <- 7.5
std_si <- 3.4

shape <- (mean_si / std_si)^2
scale <- mean_si / shape
str_c("Shape = ", shape, ", Scale = ", scale)

[1] "Shape = 4.8659169550173, Scale = 1.54133333333333"

The quantile of order .999 for such parameters is equal to:

h <- ceiling(qgamma(p = .99, shape = shape, scale = scale))
h

[1] 18

4.3 Descriptive statistics

Let us assign a colour to each country within each group (large or small countries)

colours_lugami <- rep(
  c("#1F78B4", "#33A02C", "#E31A1C", "#FF7F00", "#6A3D9A"), 2)
colours_lugami_names <- colours_lugami
names(colours_lugami_names) <- names_countries
colour_table <- 
  tibble(
    colour = colours_lugami_names,
    country = names(colours_lugami_names)) |> 
  left_join(country_codes, by = c("country"))
colour_table

# A tibble: 10 × 3
   colour  country        country_code
   <chr>   <chr>          <chr>       
 1 #1F78B4 United Kingdom GBR         
 2 #33A02C Spain          ESP         
 3 #E31A1C Italy          ITA         
 4 #FF7F00 Germany        DEU         
 5 #6A3D9A France         FRA         
 6 #1F78B4 Sweden         SWE         
 7 #33A02C Belgium        BEL         
 8 #E31A1C Netherlands    NLD         
 9 #FF7F00 Ireland        IRL         
10 #6A3D9A Denmark        DNK

Keep also a track of that in a vector (useful for graphs with {ggplot2}):

colour_countries <- colour_table$colour
names(colour_countries) <- colour_table$country_code
colour_countries

      GBR       ESP       ITA       DEU       FRA       SWE       BEL       NLD 
"#1F78B4" "#33A02C" "#E31A1C" "#FF7F00" "#6A3D9A" "#1F78B4" "#33A02C" "#E31A1C" 
      IRL       DNK 
"#FF7F00" "#6A3D9A"

Let us create two figures that shows the evolution of the cumulative number of cases through time and the cumulative number of deaths through time, respectively.

To that end, we need to reshape the data. First, we need a table in which each row gives the value to be plotted for a given date, a given country and a given type of variable (confirmed cases or recovered).

df_plot_evolution_numbers <- 
  confirmed_df |> 
  mutate(type = "Cases") |> 
  bind_rows(
    deaths_df |> 
      mutate(type = "Deaths")
  ) |> 
  mutate(type = factor(type, levels = c("Cases", "Deaths", "Recovered")))
df_plot_evolution_numbers

# A tibble: 6,100 × 7
   country country_code date       value stringency_index days_since_2020_01_22
   <chr>   <chr>        <date>     <int>            <dbl>                 <dbl>
 1 Belgium BEL          2020-01-01    NA                0                   -21
 2 Belgium BEL          2020-01-02    NA                0                   -20
 3 Belgium BEL          2020-01-03    NA                0                   -19
 4 Belgium BEL          2020-01-04    NA                0                   -18
 5 Belgium BEL          2020-01-05    NA                0                   -17
 6 Belgium BEL          2020-01-06    NA                0                   -16
 7 Belgium BEL          2020-01-07    NA                0                   -15
 8 Belgium BEL          2020-01-08    NA                0                   -14
 9 Belgium BEL          2020-01-09    NA                0                   -13
10 Belgium BEL          2020-01-10    NA                0                   -12
# ℹ 6,090 more rows
# ℹ 1 more variable: type <fct>

From this table, we can only keep cases and deaths cumulative values, filter the observation to keep only those after January 21, 2020. Let us also add for each line, whether the country it corresponds to is small or large (we will create two panels in the graphs). We can also add the country code.

df_plot_evolution_numbers_dates <- 
  df_plot_evolution_numbers |> 
  filter(type %in% c("Cases", "Deaths")) |> 
  filter(date >= ymd("2020-01-21")) |> 
  mutate(
    country_type = ifelse(country %in% names_countries_large, yes = "L", "S"),
    country_type = factor(
      country_type, 
      levels = c("L", "S"),
      labels = c("Larger countries", "Smaller countries")
    )
  ) |> 
  left_join(country_codes, by = c("country", "country_code"))
df_plot_evolution_numbers_dates

# A tibble: 5,700 × 8
   country country_code date       value stringency_index days_since_2020_01_22
   <chr>   <chr>        <date>     <int>            <dbl>                 <dbl>
 1 Belgium BEL          2020-01-21    NA             0                       -1
 2 Belgium BEL          2020-01-22     0             0                        0
 3 Belgium BEL          2020-01-23     0             0                        1
 4 Belgium BEL          2020-01-24     0             0                        2
 5 Belgium BEL          2020-01-25     0             0                        3
 6 Belgium BEL          2020-01-26     0             0                        4
 7 Belgium BEL          2020-01-27     0             0                        5
 8 Belgium BEL          2020-01-28     0             5.56                     6
 9 Belgium BEL          2020-01-29     0             5.56                     7
10 Belgium BEL          2020-01-30     0             5.56                     8
# ℹ 5,690 more rows
# ℹ 2 more variables: type <fct>, country_type <fct>

We want the labels of the legends for the countries to be ordered in a specific way. To that end, let us create two variables that specify this order:

order_countries_L <- c("GBR", "ESP", "ITA", "DEU", "FRA")
order_countries_S <- c("SWE", "BEL", "NLD", "IRL", "DNK")

4.3.1 Confirmed cases

Let us focus on the confirmed cases here. We can create two tables: one for the large countries, and another one for small ones:

df_plot_evolution_numbers_dates_L <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Larger countries") |> 
  filter(type == "Cases")

And for small countries:

df_plot_evolution_numbers_dates_S <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Smaller countries") |> 
  filter(type == "Cases")

As we would like the graphs to display the day at which the stringency index reaches its max value for the first time, we need to extract the cumulative number of cases at the corresponding dates.

df_lockdown_plot <- 
  stringency_dates |> 
  rename(date = start_first_wave) |> 
  left_join(
    df_plot_evolution_numbers_dates |> 
      filter(type == "Cases"),
    by = c("country", "date")
  )
df_lockdown_plot

# A tibble: 10 × 11
# Groups:   country [10]
   country        date       start_high_stringency start_reduce_restrict
   <chr>          <date>     <date>                <date>               
 1 Belgium        2020-02-04 2020-03-20            2020-05-05           
 2 Denmark        2020-02-27 2020-03-18            2020-04-15           
 3 France         2020-01-24 2020-03-17            2020-05-11           
 4 Germany        2020-01-27 2020-03-22            2020-05-03           
 5 Ireland        2020-02-29 2020-04-06            2020-05-18           
 6 Italy          2020-01-31 2020-03-20            2020-04-10           
 7 Netherlands    2020-02-27 2020-03-23            2020-05-11           
 8 Spain          2020-02-01 2020-03-30            2020-05-04           
 9 Sweden         2020-02-01 2020-04-01            2020-06-13           
10 United Kingdom 2020-01-31 2020-03-24            2020-05-11           
# ℹ 7 more variables: start_date_sample_second_wave <date>, country_code <chr>,
#   value <int>, stringency_index <dbl>, days_since_2020_01_22 <dbl>,
#   type <fct>, country_type <fct>

The two plots can be created:

plot_evolution_numbers_dates_L <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_L |> 
      mutate(
        country_code = factor(country_code, levels = order_countries_L)
      ),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(
        country %in% unique(df_plot_evolution_numbers_dates_L$country)
      ),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lockdown") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

and for small countries:

plot_evolution_numbers_dates_S <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_S |> 
           mutate(
             country_code = factor(country_code, levels = order_countries_S)
             ),
    mapping = aes(x = date, y = value, colour = country_code)
    ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lock-down") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

These two plots can be put together on a single one:

p <-
  arrangeGrob(
    # Row 1
    plot_evolution_numbers_dates_L + 
      labs(y = NULL, title = "(a) Confirmed case, large countries"),
    plot_evolution_numbers_dates_S + 
      labs(y = NULL, title = "(b) Confirmed cases, small countries."),
    nrow = 1
  ) |> 
  as_ggplot()

Warning: Removed 5 rows containing missing values (`geom_line()`).
Removed 5 rows containing missing values (`geom_line()`).

Figure 4.1: Confirmed cases between the beginning of the Covid-19 epidemics and November 2020.

4.3.2 Deaths

Let us focus on the number of deaths here. We can create two tables: one for the large countries, and another one for small ones:

df_plot_evolution_numbers_dates_L <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Larger countries") |> 
  filter(type == "Deaths")

And for small countries:

df_plot_evolution_numbers_dates_S <- 
  df_plot_evolution_numbers_dates |> 
  filter(country_type == "Smaller countries") |> 
  filter(type == "Deaths")

As we would like the graphs to display the day at which the stringency index becomes greater or equal to 60 for the first time, we need to extract the cumulative number of cases at the corresponding dates.

df_lockdown_plot <- 
  stringency_dates |> 
  rename(date = start_first_wave) |> 
  left_join(
    df_plot_evolution_numbers_dates |> 
      filter(type == "Deaths"),
    by = c("country", "date")
  )
df_lockdown_plot

# A tibble: 10 × 11
# Groups:   country [10]
   country        date       start_high_stringency start_reduce_restrict
   <chr>          <date>     <date>                <date>               
 1 Belgium        2020-02-04 2020-03-20            2020-05-05           
 2 Denmark        2020-02-27 2020-03-18            2020-04-15           
 3 France         2020-01-24 2020-03-17            2020-05-11           
 4 Germany        2020-01-27 2020-03-22            2020-05-03           
 5 Ireland        2020-02-29 2020-04-06            2020-05-18           
 6 Italy          2020-01-31 2020-03-20            2020-04-10           
 7 Netherlands    2020-02-27 2020-03-23            2020-05-11           
 8 Spain          2020-02-01 2020-03-30            2020-05-04           
 9 Sweden         2020-02-01 2020-04-01            2020-06-13           
10 United Kingdom 2020-01-31 2020-03-24            2020-05-11           
# ℹ 7 more variables: start_date_sample_second_wave <date>, country_code <chr>,
#   value <int>, stringency_index <dbl>, days_since_2020_01_22 <dbl>,
#   type <fct>, country_type <fct>

The two plots can be created:

plot_evolution_numbers_dates_L <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_L |> 
      mutate(
        country_code = factor(country_code, levels = order_countries_L)
      ),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    colour = "black", size = 4
  ) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_L$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F) +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

and for small countries:

plot_evolution_numbers_dates_S <- 
  ggplot(
    data = df_plot_evolution_numbers_dates_S |> 
      mutate(country_code = factor(country_code, levels = order_countries_S)),
    mapping = aes(x = date, y = value, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_point(
    data = df_lockdown_plot |>
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    colour = "black", size = 4) +
  geom_point(
    data = df_lockdown_plot |> 
      filter(country %in% unique(df_plot_evolution_numbers_dates_S$country)),
    mapping = aes(colour = country_code), size = 3, show.legend = F
  ) +
  scale_shape_discrete("Lock-down") +
  scale_fill_manual(NULL, values = colour_countries, guide = "none") +
  scale_colour_manual(NULL, values = colour_countries) +
  labs(x = NULL, y = NULL) +
  scale_y_continuous(labels = comma) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_evolution_numbers_dates$date, n = 5)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

These two plots can be put together on a single one:

p <-
  arrangeGrob(
    # Row 1
    plot_evolution_numbers_dates_L + 
      labs(y = NULL, title = "(a) Confirmed deaths, large countries"),
    plot_evolution_numbers_dates_S + 
      labs(y = NULL, title = "(b) Confirmed deaths, small countries."),
    nrow = 1
    ) |> 
  as_ggplot()

Warning: Removed 5 rows containing missing values (`geom_line()`).
Removed 5 rows containing missing values (`geom_line()`).

Figure 4.2: Confirmed deaths between the beginning of the Covid-19 epidemics and November 2020.

4.3.3 Stringency Index

Let us create a similar plot for the stringency index.

df_plot_stringency_index <- 
  confirmed_df |> 
  select(country, country_code, date, stringency_index) |> 
  mutate(
    country_type = ifelse(
      country %in% c("France", "Germany", "Italy",
                     "Spain", "United Kingdom"),
      yes = "L", "S")
  ) %>%
  mutate(
    country_type = factor(
      country_type, levels = c("L", "S"),
      labels = c("Larger countries", "Smaller countries")
    )
  )

Then we can create the plot for large countries:

plot_stringency_index_L <- 
  ggplot(
    data = df_plot_stringency_index |> 
      filter(country_type == "Larger countries") |> 
      mutate(country_code = fct_relevel(country_code, order_countries_L)),
    mapping =  aes(x = date, y = stringency_index, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_hline(yintercept = 70, linetype = "dotted") +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(breaks = seq(0, 100, by = 20)) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_stringency_index$date, n = 7)), 
    date_labels = "%b %d %Y"
  ) +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

And for small countries:

plot_stringency_index_S <- 
  ggplot(
    data = df_plot_stringency_index |> 
      filter(country_type == "Smaller countries") |> 
      mutate(country_code = fct_relevel(country_code, order_countries_S)),
    mapping= aes(x = date, y = stringency_index, colour = country_code)
  ) +
  geom_line(linewidth = 1.1) +
  geom_hline(yintercept = 70, linetype = "dotted") +
  labs(x = NULL, y = NULL) +
  scale_colour_manual(NULL, values = colour_countries) +
  scale_y_continuous(breaks = seq(0, 100, by = 20)) +
  scale_x_date(
    breaks = ymd(pretty_dates(df_plot_stringency_index$date, n = 7)), 
    date_labels = "%b %d %Y") +
  theme_paper() +
  guides(colour = guide_legend(nrow = 2, byrow = TRUE))

Lastly, we can plot these two graphs on a single figure:

p <-
  arrangeGrob(
    # Row 1
    plot_stringency_index_L + 
      labs(y = NULL, title = "(a) Severity index, large countries"),
    plot_stringency_index_S + 
      labs(y = NULL, title = "(b) Severity index, small countries"),
    nrow = 1
  ) |> 
  as_ggplot()
p

Figure 4.3: Severity index between the beginning of the Covid-19 epidemics and November 2020.

4.3.4 Speed of reaction to the epidemic outbreak

The observation of a first case was the sign that the epidemic had reached the country. What was the delay between this first case and a significant reaction identified when the index was greater than 20?

We can first extract the date on which the severity index reaches the value of 20 for the first time as follows:

start_stringency_20 <- 
  confirmed_df |>
  filter(stringency_index >= 20) |>
  group_by(country) |>
  slice(1) |>
  select(country, date_stringency_20 = date, cases = value) |> 
  ungroup()

Then, we can get the date of the first case for each country:

start_first_case <- 
  df_plot_evolution_numbers_dates |> 
  filter(type == "Cases") |> 
  group_by(country) |> 
  filter(value > 0) |> 
  arrange(date) |> 
  slice(1) |> 
  select(country, first_case = date) |> 
  ungroup()

start_stringency_20 |> 
  left_join(start_first_case) |> 
  mutate(interval = lubridate::interval(first_case, date_stringency_20),
         delay = interval / lubridate::ddays(1)) |> 
  mutate(country = fct_relevel(country, names_countries)) |> 
  arrange(country) |> 
  select(country, first_case, delay, cases) |>
  kableExtra::kable()

Joining with `by = join_by(country)`

Table 4.3: Speed of reaction to the epidemic outbreak
country	first_case	delay	cases
United Kingdom	2020-01-31	46	4452
Spain	2020-02-01	37	1073
Italy	2020-01-31	21	20
Germany	2020-01-27	33	66
France	2020-01-24	36	100
Sweden	2020-02-01	40	771
Belgium	2020-02-04	38	559
Netherlands	2020-02-27	12	503
Ireland	2020-02-29	12	43
Denmark	2020-02-27	5	6

4.3.5 Confinement and deconfinement policies

Let us check how the different countries proceeded with deconfinement.

The date at which the stringency index reached its maximum value within the first 100 days since the end of January:

start_max_stringency <- 
  confirmed_df |> 
  group_by(country) |> 
  slice(1:100) |>
  arrange(desc(stringency_index), date) |> 
  slice(1) |> 
  select(country, start = date)
start_max_stringency

# A tibble: 10 × 2
# Groups:   country [10]
   country        start     
   <chr>          <date>    
 1 Belgium        2020-03-20
 2 Denmark        2020-03-18
 3 France         2020-03-17
 4 Germany        2020-03-22
 5 Ireland        2020-04-06
 6 Italy          2020-03-20
 7 Netherlands    2020-03-23
 8 Spain          2020-03-30
 9 Sweden         2020-04-01
10 United Kingdom 2020-03-24

We can easily obtain the date on which the index begins to fall from its maximum value, corresponding to a relaxation of policy measures (i.e., end of lockdown):

policies <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(start_max_stringency, by = c("country")) |> 
  group_by(country) |> 
  arrange(date) |> 
  filter(date >= start) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |>
  mutate(same_strin = stringency_index == tmp) |>
  mutate(same_strin = ifelse(row_number()==1, TRUE, same_strin)) |>
  filter(same_strin == FALSE) |>
  slice(1) |> 
  mutate(length = lubridate::interval(start, date) / ddays(1)) |> 
  select(country, start, end=date, length)
policies

# A tibble: 10 × 4
# Groups:   country [10]
   country        start      end        length
   <chr>          <date>     <date>      <dbl>
 1 Belgium        2020-03-20 2020-05-05     46
 2 Denmark        2020-03-18 2020-04-15     28
 3 France         2020-03-17 2020-05-11     55
 4 Germany        2020-03-22 2020-05-03     42
 5 Ireland        2020-04-06 2020-05-18     42
 6 Italy          2020-03-20 2020-04-10     21
 7 Netherlands    2020-03-23 2020-05-11     49
 8 Spain          2020-03-30 2020-05-04     35
 9 Sweden         2020-04-01 2020-06-13     73
10 United Kingdom 2020-03-24 2020-05-11     48

The average of the strigency index between the max value and the end of containment can be obtained as follows:

strength_containment <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(policies, by = "country") |> 
  filter(date >= start, date < end) |> 
  group_by(country) |> 
  summarise(strength = mean(stringency_index))
strength_containment

# A tibble: 10 × 2
   country        strength
   <chr>             <dbl>
 1 Belgium            81.5
 2 Denmark            72.2
 3 France             88.0
 4 Germany            76.8
 5 Ireland            90.7
 6 Italy              85.2
 7 Netherlands        78.7
 8 Spain              85.2
 9 Sweden             64.8
10 United Kingdom     79.6

We can count how many smoothing and how many restrengthening actions were made till the end of our sample:

policy_changes <- 
  confirmed_df |> 
  select(country, date, stringency_index) |> 
  left_join(policies, by = "country") |> 
  filter(date >=  end, date <= end_date_sample) |> 
  group_by(country) |> 
  mutate(tmp = dplyr::lag(stringency_index)) |> 
  mutate(tmp = ifelse(row_number()==1, stringency_index, tmp)) |> 
  mutate(same_strin = stringency_index == tmp) |>
  mutate(
    smoothing = ifelse(!same_strin & stringency_index < tmp, TRUE, FALSE),
    restrenghtening = ifelse(
      !same_strin & stringency_index > tmp, TRUE, FALSE)
  ) |> 
  summarise(
    changes_smo = sum(smoothing),
    changes_res = sum(restrenghtening)
  )
policy_changes

# A tibble: 10 × 3
   country        changes_smo changes_res
   <chr>                <int>       <int>
 1 Belgium                  8           3
 2 Denmark                  5           1
 3 France                   6           3
 4 Germany                  9           4
 5 Ireland                  4           2
 6 Italy                    5           4
 7 Netherlands              4           2
 8 Spain                    9           4
 9 Sweden                   1           0
10 United Kingdom           8           4

Lastly, we can gather all this information in a single table:

policies |> 
  left_join(strength_containment, by = "country") |> 
  left_join(policy_changes, by = "country") |> 
  mutate(country = factor(country)) |> 
  mutate(country = fct_relevel(country, names_countries)) |> 
  arrange(country) |> 
  mutate(
    start = format(start, "%B %d"),
    end = format(end, "%B %d")
  ) |> 
  kableExtra::kable()

?(caption)

country	start	end	length	strength	changes_smo	changes_res
United Kingdom	March 24	May 11	48	79.63	8	4
Spain	March 30	May 04	35	85.19	9	4
Italy	March 20	April 10	21	85.19	5	4
Germany	March 22	May 03	42	76.85	9	4
France	March 17	May 11	55	87.96	6	3
Sweden	April 01	June 13	73	64.81	1	0
Belgium	March 20	May 05	46	81.48	8	3
Netherlands	March 23	May 11	49	78.70	4	2
Ireland	April 06	May 18	42	90.74	4	2
Denmark	March 18	April 15	28	72.22	5	1

Overview of the smoothing and restrenghthening actions during between the beginning of the Codiv-19 epidemic and November 2020.

4.4 Saving the results

Let us save the following R objects for later use.

save(
  confirmed_df,
  deaths_df,
  population,
  h,
  stringency_dates,
  names_countries,
  names_countries_large,
  names_countries_small,
  order_countries_L,
  order_countries_S,
  colour_countries,
  colour_table,
  country_codes,
  theme_paper,
  file = "data/data_after_load.rda"
  )