Let us focus on individual characteristics. Those are recorded in the table df_personne.
nrow(df_personne)
[1] 23047
As can be seen, this table contains 23047 individuals. We can also observe that we are able to find a match for all these people in the household characteristics (df_menage):
df_personne |># Add household characteristicsanti_join(df_menage, by ="ben_n4") |>nrow()
[1] 0
But we are not able to find a match for all individuals in some parts of the ESPS database. That is normal, as some questions were asked to the respondents only if they have specific attributes. Let us check how many people from df_personne cannot be found in the different parts of the ESPS survey. First, let us consider the df_opinion_1 dataset. The number of individuals who did not respond to questions relative to this part (see variable dictionary)
df_personne |>anti_join(df_opinion_1, by =c("ben_n4", "nind")) |>nrow()
[1] 14634
Then, we can consider the df_qst dataset (see online dictionary). This table gives information relative to working conditions. Only people aged 16 to 65 years old can be included in this dataset. The number of people from df_personne that are not found in df_qst is:
df_personne |>anti_join(df_qst, by =c("ben_n4", "nind")) |>nrow()
[1] 16848
The economic and social situation of indidivuals is filled in df_qes. The number of individuals for whom no match is found is:
df_personne |>anti_join(df_qes, by =c("ben_n4", "nind")) |>nrow()
[1] 9936
Lastly, we can look at the number of people for whom we do not have a match with social security data (SNIIRAM):
df_personne |>anti_join(data_sniiram, by =c("ben_n4", "nind")) |>nrow()
[1] 11026
Then, the tables can be merged into a single one. To that end, we should first create an ID for each individual.
# Opinion first wavedf <- df |>left_join(df_opinion_1, by =c("ben_n4", "nind", "id_individual")) |>mutate(is_in_df_opinion_1 = id_individual %in% df_opinion_1$id_individual)# Working conditionsdf <- df |>left_join(df_qst, by =c("ben_n4", "nind", "id_individual")) |>mutate(is_in_df_qst = id_individual %in% df_qst$id_individual)# Economic and social situationdf <- df |>left_join(df_qes, by =c("ben_n4", "nind", "id_individual")) |>mutate(is_in_df_qes = id_individual %in% df_qes$id_individual)# Social Insurancedf <- df |>left_join(data_sniiram, by =c("ben_n4", "nind", "id_individual")) |>mutate(is_in_data_sniiram = id_individual %in% data_sniiram$id_individual)# Mutual Insurancedf <- df |>left_join( df_mutuelle_sans_redondance,by =c("ben_n4", "nind", "id_individual") ) |>mutate(is_in_df_mutuelle = id_individual %in% df_mutuelle_sans_redondance$id_individual )
Let us check (again) how many individuals were found in the different datasets:
var_g <-c("is_in_df_opinion_1", "is_in_df_qst", "is_in_df_qes", "is_in_data_sniiram", "is_in_df_mutuelle")df |>select(!!!syms(var_g)) |>summarise(across(.cols =everything(), .fns =~sum(.))) |> kableExtra::kable(caption =paste0("Number of individuals present in the different parts ","of the survey" ) ) |> kableExtra::kable_classic(full_width = F, html_font ="Cambria")
Number of individuals present in the different parts of the survey
is_in_df_opinion_1
is_in_df_qst
is_in_df_qes
is_in_data_sniiram
is_in_df_mutuelle
8413
6199
13111
12021
19941
# Corresponding proportionsdf |>select(!!!syms(var_g)) |>summarise(across(.cols =everything(), .fns =~round(100*sum(.) /n(), digits =2)) ) |> kableExtra::kable(caption =paste0("Proportion of individuals present in the different parts ","of the survey (in percent)" ) )|> kableExtra::kable_classic(full_width = F, html_font ="Cambria")
Proportion of individuals present in the different parts of the survey (in percent)
is_in_df_opinion_1
is_in_df_qst
is_in_df_qes
is_in_data_sniiram
is_in_df_mutuelle
36.5
26.9
56.89
52.16
86.52
Here are the dimensions of this table:
dim(df)
[1] 23047 821
If we want to work with regional aspects at some point, we can create a table which provides the correspondence between the survey data and INSEE data.
Let us keep track on the number of observation each time we filter out observations.
nb_obs_df <-tibble(step ="raw", n =nrow(df))
We first remove individuals younger than 15 years old:
df <- df |>filter(PERSONNE_age >=15)
nb_obs_df <- nb_obs_df |>bind_rows(tibble(step ="age_15", n =nrow(df)) )nb_obs_df
# A tibble: 2 × 2
step n
<chr> <int>
1 raw 23047
2 age_15 18561
1.2 Correction of the MHI-5 score
One of the key variables of the dataset which is of interest for us is the MHI-5 score (see, e.g.Hoeymans et al. (2004), Rumpf et al. (2001), Thorsen et al. (2013)). For somewhat reason, the score was incorrectly calculated. Traditionally, the MHI-5 score is calculated from the answers to five questions, each question containing 6 ordered answer items (1 = all of the time, 2 =most of the time, 3 = a good bit of time, 4 = some of the time, 5 = a little of the time, 6 = none of the time). The MHI-5 score formula writes: \[\text{MHI-5} = 100 \times \frac{\sum_{q=1}^{5} s_q - 5}{25},\] where \(s_q\) is the score of the \(q^{\text{th}}\) question.
However, in the French version of the survey, the answer items are only 5 (because the translation of some items does not allow the same richness of nuance as English). To obtain a normalized score between 0 and 100, the formula shoud write: \[\text{MHI-5} = 100 \times \frac{\sum_{q=1}^{5} s_q - 5}{\boldsymbol{20}}.\]
We therefore need to correct the MHI-5 score from the data, simply by multiplying it by 1.25:
Section 1.4 provides some codes to compute the MHI-5 score manually from the answers given to the questions that allow it to be calculated.
1.3 Data formatting
Qualitative variables have numerical levels in the original dataset. Consulting the data dictionary makes it possible to assign more comprehensible labels.
All the following variables are qualitative. We will adopt the following rules for variables from sub-parts of the survey:
if the individual was included in the sub-part of the survey but did not respond, we will set the answer to “Not reported”
if the individual was not surveyed (was not part of the sub-part of the survey), we will set the answer to “Not surveyed”
df <- df |>mutate(across(c("PERSONNE_pb_depress","PERSONNE_etasante","PERSONNE_statut" ), as.character ) ) |>mutate(PERSONNE_pb_depress =replace_na(PERSONNE_pb_depress, "Not reported"),PERSONNE_pb_depress =factor( PERSONNE_pb_depress, levels =c(2, 1, "Not reported"),labels =c("No depression", "Depression", "Not reported") ),PERSONNE_etasante =replace_na(PERSONNE_etasante, "Not reported"),PERSONNE_etasante =factor( PERSONNE_etasante,levels =c(1:5, "Not reported"),labels =c("Very Good","Good","Fairly Good","Bad","Very Bad","Not reported" ) ) ) |>mutate(PERSONNE_etat_sante =case_when( PERSONNE_etasante %in%c("Very Good", "Good", "Fairly Good" ) ~"Very Good, Good or Fairly Good", PERSONNE_etasante %in%c("Bad", "Very Bad") ~"Very Bad or Bad", PERSONNE_etasante =="Not reported"~"Not reported",TRUE~"other" ) ) |>mutate(across(c( PERSONNE_pb_asthm, PERSONNE_pb_bronchit, PERSONNE_pb_infarctus, PERSONNE_pb_coronair, PERSONNE_pb_hypertens, PERSONNE_pb_avc, PERSONNE_pb_arthros, PERSONNE_pb_lombalgi, PERSONNE_pb_cervical, PERSONNE_pb_diabet, PERSONNE_pb_allergi, PERSONNE_pb_cirrhos, PERSONNE_pb_urinair, PERSONNE_pb_non ),~factor(replace_na(as.character(.x), "Not reported"),levels =c(2, 1, "Not reported"),labels =c("No", "Yes", "Not reported") ) ) ) |>mutate(PERSONNE_sexe =factor( PERSONNE_sexe, levels =c(2, 1), labels =c("Female", "Male") ),PERSONNE_couple =factor( PERSONNE_couple, levels =c(1,2), labels =c("Yes", "No") ),PERSONNE_etatleg =factor( PERSONNE_etatleg, levels =c(1,2,3,4,5,7,8),labels =c("Married, Civil Union", "Divorced, separated", "Widowed","Lives in a marriage or concubinage", "Single","Does not know", "Refuses to answer" ) ),PERSONNE_statut =ifelse( PERSONNE_statut %in%c(1,2,3,4), yes ="Public employee", no = PERSONNE_statut ),PERSONNE_statut =ifelse( PERSONNE_statut %in%c(6, 7, 8, 9, 10), yes ="Other", no = PERSONNE_statut ),PERSONNE_statut =replace_na(PERSONNE_statut, "Not reported"),PERSONNE_statut =factor( PERSONNE_statut, levels =c("Public employee", 5, "Other", "Not reported"),labels =c("Public employee","Private employee","Other","Not reported") ),PERSONNE_ss =factor( PERSONNE_ss, levels =c(1, 2),labels =c("Yes (own)","Yes (third party)") ),# "Yes, because he/she works, or receives unemployment benefits, # is a student, retired, widower of a pensioner, disabled, beneficiary of # the basic CMU...",# "Yes, as the beneficiary of a person living in your household")),PERSONNE_regime =ifelse( PERSONNE_regime %in%c(2,3,4,5,6,7), yes ="Public service", no = PERSONNE_regime ),PERSONNE_regime =ifelse( PERSONNE_regime %in%c(12, 13, 14), yes ="Other (Student, abroad, other)", no = PERSONNE_regime ),PERSONNE_regime =factor( PERSONNE_regime, levels =c(1, "Public service", 8, 9, 10, 11, "Other (Student, abroad, other)", 15 ),labels =c("The general scheme (Cnamts)","Public service","The local Alsace-Moselle scheme","The basic Universal Health Coverage","The agricultural scheme","The self-employed scheme","Other (Student, abroad, other)","Does not know" ) ),PERSONNE_rap_pcs8 =factor( PERSONNE_rap_pcs8,levels =1:9,labels =c("Farmer","Craftsman, trader","Executive and intellectual profession","Intermediate occupation","Administrative employee","Commercial employee","Skilled worker","Unskilled worker","Inactive having never worked" ) ),PERSONNE_ald =factor( PERSONNE_ald, levels =1:3,labels =c("Yes", "No", "Does not know") ) ) |>mutate(MENAGE_rap_typmen =factor( MENAGE_rap_typmen,levels =c(1, 2, 5, 6, 7),labels =c("Single person","Single-parent family","Couple without children","Couple with children","Other" ) ),MENAGE_region =as.character(MENAGE_region),MENAGE_region =factor( MENAGE_region,levels =c(1:22),labels =c("Région parisienne","Champagne-Ardenne","Picardie","Haute-Normandie","Centre","Basse-Normandie","Bourgogne","Nord-Pas-de-Calais","Lorraine","Alsace","Franche-Comté","Pays de la Loire","Bretagne","Poitou-Charentes","Aquitaine","Midi-Pyrénées","Limousin","Rhône-Alpes","Auvergne","Languedoc-Roussillon","Provence-Alpes-Côte d'Azur","Corse" ) ),MENAGE_revenu =factor( MENAGE_revenu, levels =c(0:12, 99),labels =c("Does not know","< 700","[700, 1000)","[1000, 1200)","[1200, 1500)","[1500, 1800)","[1800, 2200)","[2200, 2500)","[2500, 3000)","[3000, 3500)","[3500, 4500)",">= 4500","Refuse to declare","Abandonment before 2nd contact " ) ), ) |>mutate(MUTUELLE_assu =ifelse( MUTUELLE_assu %in%c(2,3), yes ="Beneficiary", no = MUTUELLE_assu ),MUTUELLE_assu =ifelse( is_in_df_mutuelle &is.na(MUTUELLE_assu), yes ="Not reported", no = MUTUELLE_assu ),MUTUELLE_assu =ifelse(!is_in_df_mutuelle, yes ="Not surveyed", no = MUTUELLE_assu ),MUTUELLE_assu =factor( MUTUELLE_assu,levels =c(0, 1, "Beneficiary", "Not reported", "Not surveyed"),labels =c("Does not know","Self insured","Beneficiary","Not reported","Not surveyed" ) ),#MUTUELLE_typcc =ifelse( is_in_df_mutuelle &is.na(MUTUELLE_typcc), yes ="Not reported", no = MUTUELLE_typcc ),MUTUELLE_typcc =ifelse(!is_in_df_mutuelle, yes ="Not surveyed", no = MUTUELLE_typcc),MUTUELLE_typcc =factor( MUTUELLE_typcc, levels =c(0, 1, 2, 3, 4, "Not reported", "Not surveyed"),labels =c("Indefinite","Mutual insurance","Pension fund","Private insurance","Insurance broker","Not reported","Not surveyed" ) ), ) |># First wave answers to research questionsmutate(OPINION1_renonc_cons =ifelse( is_in_df_opinion_1 &is.na(OPINION1_renonc_cons),yes ="Not reported", no = OPINION1_renonc_cons ),OPINION1_renonc_cons =ifelse(!is_in_df_opinion_1,yes ="Not surveyed", no = OPINION1_renonc_cons ),OPINION1_renonc_cons =factor( OPINION1_renonc_cons,levels =c(1,2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ),#OPINION1_renonc_dent =ifelse( is_in_df_opinion_1 &is.na(OPINION1_renonc_dent),yes ="Not reported", no = OPINION1_renonc_dent ),OPINION1_renonc_dent =ifelse(!is_in_df_opinion_1,yes ="Not surveyed", no = OPINION1_renonc_dent ),OPINION1_renonc_dent =factor( OPINION1_renonc_dent, levels =c(1,2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ),#OPINION1_renonc_fin =ifelse( is_in_df_opinion_1 &is.na(OPINION1_renonc_fin),yes ="Not reported", no = OPINION1_renonc_fin ),OPINION1_renonc_fin =ifelse(!is_in_df_opinion_1,yes ="Not surveyed", no = OPINION1_renonc_fin ),OPINION1_renonc_fin =factor( OPINION1_renonc_fin, levels =c(1,2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ),#OPINION1_renonc_loin =ifelse( is_in_df_opinion_1 &is.na(OPINION1_renonc_loin),yes ="Not reported",no = OPINION1_renonc_loin ),OPINION1_renonc_loin =ifelse(!is_in_df_opinion_1,yes ="Not surveyed", no = OPINION1_renonc_loin ),OPINION1_renonc_loin =factor( OPINION1_renonc_loin, levels =c(1,2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ),#OPINION1_renonc_long =ifelse( is_in_df_opinion_1 &is.na(OPINION1_renonc_long),yes ="Not reported", no = OPINION1_renonc_long ),OPINION1_renonc_long =ifelse(!is_in_df_opinion_1,yes ="Not surveyed", no = OPINION1_renonc_long ),OPINION1_renonc_long =factor( OPINION1_renonc_long,levels =c(1,2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ), ) |># Working conditionsmutate(QST_ct_depech =ifelse( is_in_df_qst &is.na(QST_ct_depech),yes ="Not reported", no = QST_ct_depech ),QST_ct_depech =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_depech ),QST_ct_depech =factor( QST_ct_depech, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_liberte =ifelse( is_in_df_qst &is.na(QST_ct_liberte),yes ="Not reported", no = QST_ct_liberte ),QST_ct_liberte =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_liberte ),QST_ct_liberte =factor( QST_ct_liberte, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_apprend =ifelse( is_in_df_qst &is.na(QST_ct_apprend),yes ="Not reported", no = QST_ct_apprend ),QST_ct_apprend =ifelse(!is_in_df_qst,yes ="Not surveyed",no = QST_ct_apprend ),QST_ct_apprend =factor( QST_ct_apprend, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_aidecol =ifelse( is_in_df_qst &is.na(QST_ct_aidecol),yes ="Not reported", no = QST_ct_aidecol ),QST_ct_aidecol =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_aidecol ),QST_ct_aidecol =factor( QST_ct_aidecol, levels =c(1:5, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not concered", "Not reported", "Not surveyed") ),#QST_ct_travnuit =ifelse( is_in_df_qst &is.na(QST_ct_travnuit),yes ="Not reported", no = QST_ct_travnuit ),QST_ct_travnuit =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_travnuit ),QST_ct_travnuit =factor( QST_ct_travnuit, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_repet =ifelse( is_in_df_qst &is.na(QST_ct_repet),yes ="Not reported", no = QST_ct_repet ),QST_ct_repet =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_repet ),QST_ct_repet =factor( QST_ct_repet, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_lourd =ifelse( is_in_df_qst &is.na(QST_ct_lourd),yes ="Not reported", no = QST_ct_lourd ),QST_ct_lourd =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_lourd ),QST_ct_lourd =factor( QST_ct_lourd, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_posture =ifelse( is_in_df_qst &is.na(QST_ct_posture),yes ="Not reported", no = QST_ct_posture ),QST_ct_posture =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_posture ),QST_ct_posture =factor( QST_ct_posture, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ),#QST_ct_produit =ifelse( is_in_df_qst &is.na(QST_ct_produit),yes ="Not reported", no = QST_ct_produit ),QST_ct_produit =ifelse(!is_in_df_qst,yes ="Not surveyed", no = QST_ct_produit ),QST_ct_produit =factor( QST_ct_produit, levels =c(1:4, "Not reported", "Not surveyed"),labels =c("Always", "Often", "Sometimes", "Never", "Not reported", "Not surveyed" ) ) ) |># Economic and social situationmutate(QES_association =ifelse( is_in_df_qes &is.na(QES_association),yes ="Not reported", no = QES_association ),QES_association =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_association ),QES_association =factor( QES_association, levels =c(1:2, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ),#QES_tpsami =ifelse( is_in_df_qes &is.na(QES_tpsami),yes ="Not reported", no = QES_tpsami ),QES_tpsami =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_tpsami ),QES_tpsami =factor( QES_tpsami, levels =c(1:5, "Not reported", "Not surveyed"),labels =c("Every day or almost every day","At least once a week","At least once a month","Less than once a month","Never","Not reported", "Not surveyed" ) ),#QES_tpsasso =ifelse( is_in_df_qes &is.na(QES_tpsasso),yes ="Not reported", no = QES_tpsasso ),QES_tpsasso =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_tpsasso ),QES_tpsasso =factor( QES_tpsasso, levels =c(1:5, "Not reported", "Not surveyed"),labels =c("Every day or almost every day","At least once a week","At least once a month","Less than once a month","Never","Not reported", "Not surveyed" ) ),#QES_tpscolleg =ifelse( is_in_df_qes &is.na(QES_tpscolleg),yes ="Not reported", no = QES_tpscolleg ),QES_tpscolleg =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_tpscolleg ),QES_tpscolleg =factor( QES_tpscolleg, levels =c(1:5, "Not reported", "Not surveyed"),labels =c("Every day or almost every day","At least once a week","At least once a month","Less than once a month","Never","Not reported", "Not surveyed" ) ),#QES_tpsfamil =ifelse( is_in_df_qes &is.na(QES_tpsfamil),yes ="Not reported", no = QES_tpsfamil ),QES_tpsfamil =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_tpsfamil ),QES_tpsfamil =factor( QES_tpsfamil,levels =c(1:5, "Not reported", "Not surveyed"),labels =c("Every day or almost every day","At least once a week","At least once a month","Less than once a month","Never","Not reported", "Not surveyed" ) ),#QES_mere_etude =ifelse( is_in_df_qes &is.na(QES_mere_etude),yes ="Not reported", no = QES_mere_etude ),QES_mere_etude =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_mere_etude ),QES_mere_etude =factor( QES_mere_etude,levels =c(1:7, "Not reported", "Not surveyed"),labels =c("Never Been to School","Nursery School, Primary school, Certificate of studies","1st cycle (~Middle School)","2nd cycle (~High School","Higher Education","Other","Does not know","Not reported", "Not surveyed" ) ),QES_pere_etude =ifelse( is_in_df_qes &is.na(QES_pere_etude),yes ="Not reported", no = QES_pere_etude ),QES_pere_etude =ifelse(!is_in_df_qes,yes ="Not surveyed", no = QES_pere_etude ),QES_pere_etude =factor( QES_pere_etude,levels =c(1:7, "Not reported", "Not surveyed"),labels =c("Never Been to School","Nursery School, Primary school, Certificate of studies","1st cycle (~Middle School)","2nd cycle (~High School","Higher Education","Other","Does not know","Not reported", "Not surveyed" ) ) ) |># SNIIRAMmutate(SOINS_ald_am =ifelse( is_in_data_sniiram &is.na(SOINS_ald_am), yes ="Not reported", no = SOINS_ald_am) ,SOINS_ald_am =ifelse(!is_in_data_sniiram, yes ="Not surveyed", no = SOINS_ald_am ),SOINS_ald_am =factor( SOINS_ald_am, levels =c(1,0, "Not reported", "Not surveyed"),labels =c("Yes", "No", "Not reported", "Not surveyed") ), )
The same can be done for ordered qualitative variables.
df <- df |>mutate(MENAGE_tu =ifelse( MENAGE_tu %in%c(0, 1, 2), yes ="Small Municipality", no = MENAGE_tu ),MENAGE_tu =ifelse( MENAGE_tu %in%c(3, 4), yes ="Medium Municipality", no = MENAGE_tu ),MENAGE_tu =ifelse( MENAGE_tu %in%c(5, 6, 7), yes ="Large Municipality", no = MENAGE_tu ),MENAGE_tu =ifelse( MENAGE_tu %in%c(8), yes ="Paris metropolitan area", no = MENAGE_tu ),MENAGE_tu =ifelse( is_in_df_mutuelle &is.na(MENAGE_tu), yes ="Not reported", no = MENAGE_tu ),MENAGE_tu =ifelse(!is_in_df_mutuelle, yes ="Not surveyed", no = MENAGE_tu ),MENAGE_tu =factor( MENAGE_tu,levels =c("Small Municipality","Medium Municipality","Large Municipality","Paris metropolitan area","Not reported","Not surveyed"), ordered =TRUE ),#MENAGE_rap_zau =ifelse( MENAGE_rap_zau %in%c(1,2),yes ="Major urban cluster (and its crown)",no = MENAGE_rap_zau ),MENAGE_rap_zau =ifelse( MENAGE_rap_zau %in%c(3,4),yes ="Medium/small urban cluster (and its crown)",no = MENAGE_rap_zau ),MENAGE_rap_zau =ifelse( is_in_df_mutuelle &is.na(MENAGE_rap_zau), yes ="Not reported", no = MENAGE_rap_zau ),MENAGE_rap_zau =ifelse(!is_in_df_mutuelle, yes ="Not surveyed", no = MENAGE_rap_zau),MENAGE_rap_zau =factor( MENAGE_rap_zau,levels =c("Major urban cluster (and its crown)","Medium/small urban cluster (and its crown)",5,"Not reported","Not surveyed" ),labels =c("Major urban cluster (and its crown)","Medium/small urban cluster (and its crown)","Spaces outside the area of influence of cities","Not reported","Not surveyed" ), ordered =TRUE ) )
Let us make the an assumption regarding the variable PERSONNE_couple. This variable has missing values.
for all individuals younger than 15 (this question was not asked to people \(\leq\) 15). We assume here that people 15 for which the value of the variable PERSONNE_couple is missing are actually not living as a couple.
for individuals older than 15 and living alone, we consider, if they have missing value for PERSONNE_couple that tey are not living as a couple.
for other individuals: we assign then the value Not reported.
The number of individuals in the first case (15 years old and missing value for PERSONNE_couple:
# A tibble: 1 × 2
# Groups: PERSONNE_couple [1]
PERSONNE_couple n
<fct> <int>
1 <NA> 53
Let us make the changes:
df <- df |># Recode NA as "Not reported"mutate(PERSONNE_couple =ifelse(is.na(PERSONNE_couple),yes ="Not reported",no =as.character(PERSONNE_couple) ) ) |># If the "couple" variable is "NA" and either lives alone or is 15yo: not in a couplemutate(PERSONNE_couple =ifelse(test = PERSONNE_couple =="Not reported"& (MENAGE_nbpers ==1| PERSONNE_age ==15),yes ="No",no = PERSONNE_couple) )
1.4 Computing manually the MHI Scores
As we noticed the MHI-Score is miscalculated in the ESPS data, we decided to compute it from scratch.
Some individuals did not answer all the questions required to construct the MHI-5 and MHI-3 scores (for more details on MHI-3, please refer to Yamazaki, Fukuhara, and Green (2005)). As described in the literature, to construct the MHI-5 score, missing values are imputed provided that individuals answered at least 3 of the 5 questions. The imputed value is the average of the responses given.
df <- df |>mutate(PERSONNE_score_t_corrected = PERSONNE_score_t *1.25) |># Some variables (PERSONNE_calme and PERSONNE_heureux) need to be adjustedmutate(mhi_score_nerveux = PERSONNE_nerveu,mhi_score_calme =6- PERSONNE_calme,mhi_score_triste = PERSONNE_triste,mhi_score_heureux =6- PERSONNE_heureux,mhi_score_decourage = PERSONNE_decourag ) |># Counting the number of missing values for each respondantmutate(nb_na_mhi = (!is.na(mhi_score_nerveux)) + (!is.na(mhi_score_calme)) + (!is.na(mhi_score_triste)) + (!is.na(mhi_score_heureux)) + (!is.na(mhi_score_decourage)) ) |>mutate(nb_na_mhi_3 = (!is.na(mhi_score_triste)) + (!is.na(mhi_score_heureux)) + (!is.na(mhi_score_decourage)) ) %>%# Computing the average score for the items to which individuals respondedmutate(mean_scores =rowMeans(select(., mhi_score_nerveux:mhi_score_decourage), na.rm = T ) ) |># Assigning the average of the items to the missing elementsmutate(mhi_score_nerveux =ifelse(is.na(mhi_score_nerveux), yes = mean_scores, no = mhi_score_nerveux ),mhi_score_calme =ifelse(is.na(mhi_score_calme), yes = mean_scores, no = mhi_score_calme ),mhi_score_triste =ifelse(is.na(mhi_score_triste), yes = mean_scores, no = mhi_score_triste ),mhi_score_heureux =ifelse(is.na(mhi_score_heureux), yes = mean_scores, no = mhi_score_heureux ),mhi_score_decourage =ifelse(is.na(mhi_score_decourage), yes = mean_scores, no = mhi_score_decourage ) ) |>mutate(mhi_5_raw = mhi_score_nerveux + mhi_score_calme + mhi_score_triste + mhi_score_heureux + mhi_score_decourage ) |>mutate(score_t_corrected = (mhi_5_raw -min(mhi_5_raw, na.rm=T)) * (100) / (max(mhi_5_raw, na.rm =TRUE) -min(mhi_5_raw, na.rm =TRUE)) ) |># If the individual has not answered more than 3 items, the MHI-5 score is# not calculable# In such a case, the MHI-5 score is set to 0mutate(score_t_corrected =ifelse(nb_na_mhi <3, yes =NA, no = score_t_corrected) ) |># MHI-3 scoremutate(mhi_3_raw = mhi_score_triste + mhi_score_heureux + mhi_score_decourage,score_mhi_3 =100*(mhi_3_raw -min(mhi_3_raw, na.rm=TRUE)) / (max(mhi_3_raw, na.rm =TRUE) -min(mhi_3_raw, na.rm =TRUE)) ) |>mutate(score_mhi_3 =ifelse(nb_na_mhi <3, yes =NA, no = score_mhi_3)) |>select(-mhi_5_raw, -mhi_3_raw)
Let us create two variables which provide the first quantile of the distribution of each score.
We can create a variable named PERSONNE_pb_depress_2 which will recode people who did not respond to the question regarding depression as No depression (this is not used later in the paper).
Lastly, let us add an ID to each individual (the row number in the table).
df <- df |>mutate(id =row_number())
The results can be saved:
save(df, file ="../data/df_merged.rda")
Hoeymans, Nancy, Anna A. Garssen, Gert P. Westert, and Peter FM Verhaak. 2004. “Measuring Mental Health of the Dutch Population: A Comparison of the GHQ-12 and the MHI-5.”Health and Quality of Life Outcomes 2 (1): 23. https://doi.org/10.1186/1477-7525-2-23.
Rumpf, Hans-Jürgen, Christian Meyer, Ulfert Hapke, and Ulrich John. 2001. “Screening for Mental Health: Validity of the MHI-5 Using DSM-IVAxisI Psychiatric Disorders as Gold Standard.”Psychiatry Research 105 (3): 243–53. https://doi.org/10.1016/s0165-1781(01)00329-8.
Thorsen, Sannie Vester, Reiner Rugulies, Pernille U. Hjarsbech, and Jakob Bue Bjorner. 2013. “The Predictive Value of Mental Health for Long-Term Sickness Absence: The MajorDepressionInventory (MDI) and the MentalHealthInventory (MHI-5) Compared.”BMC Medical Research Methodology 13 (1): 115. https://doi.org/10.1186/1471-2288-13-115.
Yamazaki, Shin, Shunichi Fukuhara, and Joseph Green. 2005. “Usefulness of Five-Item and Three-Item MentalHealthInventories to Screen for Depressive Symptoms in the General Population of Japan.”Health and Quality of Life Outcomes 3 (1): 48. https://doi.org/10.1186/1477-7525-3-48.