[Appendix C — Exploratory Data Analysis]{#sec-sm-exploratory-data-analysis .quarto-section-identifier}

doi:10.17605/OSF.IO/YGKTS

C.1 Overview

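This appendix explores the survey data: the distributions of the main variables, their correlations, the geographic range of the respondents, and how the samples compare with the Brazilian population in size and age, along with the weight and chronotype distributions.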

The focus is on the analysis sample; sections that use the full sample are labeled as such. The analysis sample is a subset of the full sample, consisting of Brazilian individuals aged 18 or older, residing in the UTC-3 timezone, who completed the survey between October 15th and 21st, 2017.
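
As a rough illustration of the inclusion criteria above, the filter could be sketched in base R as follows. The data frame and its column names (`country`, `age`, `utc_offset`, `timestamp`) are hypothetical and are not taken from the actual pipeline, which is defined in `_targets.R`.

```r
# Toy data with hypothetical column names (assumptions, not the real schema).
toy_data <- data.frame(
  id = 1:5,
  country = c("Brazil", "Brazil", "Portugal", "Brazil", "Brazil"),
  age = c(25, 17, 40, 33, 52),
  utc_offset = c(-3, -3, 0, -4, -3),
  timestamp = as.POSIXct(
    c(
      "2017-10-16 10:00:00", "2017-10-17 12:00:00", "2017-10-18 09:00:00",
      "2017-10-19 20:00:00", "2017-10-25 08:00:00"
    ),
    tz = "America/Sao_Paulo"
  )
)

# Collection window: October 15th to 21st, 2017 (inclusive).
window_start <- as.Date("2017-10-15")
window_end <- as.Date("2017-10-21")
response_date <- as.Date(toy_data$timestamp, tz = "America/Sao_Paulo")

# Keep only rows that satisfy every inclusion criterion.
analysis_sample <- toy_data[
  toy_data$country == "Brazil" &
    toy_data$age >= 18 &
    toy_data$utc_offset == -3 &
    response_date >= window_start &
    response_date <= window_end,
]

analysis_sample$id
```

In this toy example each excluded row fails exactly one criterion (age, country, timezone, or date), which makes it easy to check every condition in isolation.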

Models were developed using cell weights to address sample imbalances. For details on the sample balancing procedure, refer to Supplementary Material D.
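
Cell weights are commonly computed as the ratio between a cell's share of the population and its share of the sample. The sketch below illustrates this general idea with made-up counts. The cells, the numbers, and the variable names are assumptions for illustration only and do not reproduce the procedure described in Supplementary Material D.

```r
# Made-up counts per cell (sex by age group); illustration only.
cells <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  age_group = c("18-39", "40+", "18-39", "40+"),
  n_sample = c(500, 200, 200, 100),
  n_population = c(30, 25, 28, 17) # Hypothetical values, in millions
)

# Share of each cell in the sample and in the population.
cells$p_sample <- cells$n_sample / sum(cells$n_sample)
cells$p_population <- cells$n_population / sum(cells$n_population)

# Cell weight: how much each respondent in the cell should count.
cells$cell_weight <- cells$p_population / cells$p_sample

cells

# Sanity check: the weighted sample shares match the population shares.
weighted_share <- cells$n_sample * cells$cell_weight
weighted_share <- weighted_share / sum(weighted_share)
all.equal(weighted_share, cells$p_population)
```

Overrepresented cells receive weights below 1 and underrepresented cells receive weights above 1, so the weighted sample reproduces the population shares.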

C.2 Setting the Environment

Code

library(cli)
library(dplyr)
library(fBasics)
library(geobr)
library(ggplot2)
library(here)
library(hms)
library(janitor)
library(lubridate)
library(lubritime) # github.com/danielvartan/lubritime
library(magrittr)
library(moments)
library(nortest)
library(orbis) # github.com/danielvartan/orbis
library(patchwork)
library(prettycheck) # github.com/danielvartan/prettycheck
library(purrr)
library(quartor) # github.com/danielvartan/quartor
library(rlang)
library(rutils) # github.com/danielvartan/rutils
library(sidrar)
library(stats)
library(stringr)
library(targets)
library(tidyr)
library(tseries)

C.3 Loading the Data

targets::tar_make(script = here::here("_targets.R"))

C.3.1 Full Sample

anonymized_data <-
  targets::tar_read("anonymized_data", store = here::here("_targets"))

C.3.2 Analysis Sample

weighted_data <-
  targets::tar_read("weighted_data", store = here::here("_targets"))

C.4 Distribution of Main Variables

Code

weighted_data |>
  rutils:::stats_summary(
    col = "msf_sc",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.1: Statistics for the msf_sc variable.

Source: Created by the author.
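
The `iqr_mult = 1.5` argument in the call above suggests the conventional interquartile range (Tukey fence) rule for outliers. How `rutils:::stats_summary()` applies it internally is not shown here, so the following is only a sketch of the general rule. The `flag_outliers()` helper is hypothetical and the values are made up.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR] (Tukey's fences).
# `flag_outliers()` is a hypothetical helper, not part of `rutils`.
flag_outliers <- function(x, iqr_mult = 1.5) {
  quartiles <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- quartiles[[2]] - quartiles[[1]]
  lower <- quartiles[[1]] - iqr_mult * iqr
  upper <- quartiles[[2]] + iqr_mult * iqr

  !is.na(x) & (x < lower | x > upper)
}

# Toy values in hours after midnight (made-up data).
x <- c(3.5, 4.0, 4.2, 4.5, 4.8, 5.0, 5.5, 12.0)

flag_outliers(x)
x[flag_outliers(x)]
```

With `remove_outliers = FALSE`, as in the calls in this section, such values would presumably be kept in the summary rather than dropped.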

Code

weighted_data |>
  plotr:::plot_dist(
    col = "msf_sc",
    x_label = "MSF~sc~ (Chronotype proxy) (Local time)"
  )

Figure C.1: Histogram of the `msf_sc` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.
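
For readers unfamiliar with Q-Q plots, the comparison they draw can be sketched in base R as below. This is not the code behind `plotr:::plot_dist()`. It uses simulated data and only shows how sample quantiles are paired with the theoretical quantiles of the normal distribution.

```r
set.seed(1)

# Made-up, right-skewed sample standing in for a survey variable.
x <- stats::rexp(200, rate = 1)

# Sample quantiles are the sorted observations.
sample_quantiles <- sort(x)

# Theoretical quantiles of the standard normal at matching probabilities.
probabilities <- stats::ppoints(length(x))
theoretical_quantiles <- stats::qnorm(probabilities)

# Points close to a straight line suggest approximate normality. The
# correlation between the two sets of quantiles summarizes that closeness.
qq_correlation <- stats::cor(theoretical_quantiles, sample_quantiles)
qq_correlation
```

Base R's `qqnorm()` and `qqline()` draw the same pairing directly, which is a quick way to cross-check a figure like the ones in this section.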

Code

weighted_data |> plotr:::plot_box_plot(col = "msf_sc")

Figure C.2: Box plot of the `msf_sc` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "age",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.2: Statistics for the age variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "age",
    x_label = "Age (years)"
  )

Figure C.3: Histogram of the `age` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "age")

Figure C.4: Box plot of the `age` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "latitude",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.3: Statistics for the latitude variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "latitude",
    x_label = "Latitude (Decimal degrees)"
  )

Figure C.5: Histogram of the `latitude` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "latitude")

Figure C.6: Box plot of the `latitude` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "longitude",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.4: Statistics for the longitude variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "longitude",
    x_label = "Longitude (Decimal degrees)"
  )

Figure C.7: Histogram of the `longitude` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "longitude")

Figure C.8: Box plot of the `longitude` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "ghi_month",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.5: Statistics for the ghi_month variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "ghi_month",
    x_label = "Monthly average global horizontal irradiance (Wh/m²)"
  )

Figure C.9: Histogram of the `ghi_month` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "ghi_month")

Figure C.10: Box plot of the `ghi_month` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "ghi_annual",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.6: Statistics for the ghi_annual variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "ghi_annual",
    x_label = "Annual average global horizontal irradiance (Wh/m²)"
  )

Figure C.11: Histogram of the `ghi_annual` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "ghi_annual")

Figure C.12: Box plot of the `ghi_annual` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.7: Statistics for the march_equinox_sunrise variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_sunrise",
    x_label = "Sunrise on the March equinox (Seconds)"
  )

Figure C.13: Histogram of the `march_equinox_sunrise` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "march_equinox_sunrise")

Figure C.14: Box plot of the `march_equinox_sunrise` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.8: Statistics for the march_equinox_sunset variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_sunset",
    x_label = "Sunset on the March equinox (Seconds)"
  )

Figure C.15: Histogram of the `march_equinox_sunset` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "march_equinox_sunset")

Figure C.16: Box plot of the `march_equinox_sunset` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.9: Statistics for the march_equinox_daylight variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_daylight",
    x_label = "Daylight on the March equinox (Seconds)"
  )

Figure C.17: Histogram of the `march_equinox_daylight` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "march_equinox_daylight")

Figure C.18: Box plot of the `march_equinox_daylight` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.10: Statistics for the june_solstice_sunrise variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_sunrise",
    x_label = "Sunrise on the June solstice (Seconds)"
  )

Figure C.19: Histogram of the `june_solstice_sunrise` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "june_solstice_sunrise")

Figure C.20: Box plot of the `june_solstice_sunrise` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.11: Statistics for the june_solstice_sunset variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_sunset",
    x_label = "Sunset on the June solstice (Seconds)"
  )

Figure C.21: Histogram of the `june_solstice_sunset` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "june_solstice_sunset")

Figure C.22: Box plot of the `june_solstice_sunset` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.12: Statistics for the june_solstice_daylight variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_daylight",
    x_label = "Daylight on the June solstice (Seconds)"
  )

Figure C.23: Histogram of the `june_solstice_daylight` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "june_solstice_daylight")

Figure C.24: Box plot of the `june_solstice_daylight` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.13: Statistics for the september_equinox_sunrise variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_sunrise",
    x_label = "Sunrise on the September equinox (Seconds)"
  )

Figure C.25: Histogram of the `september_equinox_sunrise` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "september_equinox_sunrise")

Figure C.26: Box plot of the `september_equinox_sunrise` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.14: Statistics for the september_equinox_sunset variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_sunset",
    x_label = "Sunset on the September equinox (Seconds)"
  )

Figure C.27: Histogram of the `september_equinox_sunset` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "september_equinox_sunset")

Figure C.28: Box plot of the `september_equinox_sunset` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.15: Statistics for the september_equinox_daylight variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_daylight",
    x_label = "Daylight on the September equinox (Seconds)"
  )

Figure C.29: Histogram of the `september_equinox_daylight` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "september_equinox_daylight")

Figure C.30: Box plot of the `september_equinox_daylight` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.16: Statistics for the december_solstice_sunrise variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_sunrise",
    x_label = "Sunrise on the December solstice (Seconds)"
  )

Figure C.31: Histogram of the `december_solstice_sunrise` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "december_solstice_sunrise")

Figure C.32: Box plot of the `december_solstice_sunrise` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.17: Statistics for the december_solstice_sunset variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_sunset",
    x_label = "Sunset on the December solstice (Seconds)"
  )

Figure C.33: Histogram of the `december_solstice_sunset` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "december_solstice_sunset")

Figure C.34: Box plot of the `december_solstice_sunset` variable.

Code

weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )

Table C.18: Statistics for the december_solstice_daylight variable.

Source: Created by the author.

Code

weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_daylight",
    x_label = "Daylight on the December solstice (Seconds)"
  )

Figure C.35: Histogram of the `december_solstice_daylight` variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Code

weighted_data |> plotr:::plot_box_plot(col = "december_solstice_daylight")

Figure C.36: Box plot of the `december_solstice_daylight` variable.

C.5 Correlation Matrix of Main Variables

C.5.1 Full Sample

Code

anonymized_data |>
  plotr:::plot_ggally(
    cols = c("sex", "age", "latitude", "longitude", "msf_sc"),
    mapping = ggplot2::aes(colour = sex)
  ) |>
  rutils::shush()

Figure C.37: Correlation matrix of the main variables (**Full sample**).

C.5.2 Analysis Sample

Code

weighted_data |>
  plotr:::plot_ggally(
    cols = c("sex", "age", "latitude", "longitude", "msf_sc"),
    mapping = ggplot2::aes(colour = sex)
  ) |>
  rutils::shush()

Figure C.38: Correlation matrix of the main variables (**Analysis sample**).
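
The coefficients displayed in figures of this kind are pairwise correlations between numeric variables. A minimal base R sketch of that computation, using simulated data whose column names merely mirror the variables above, could look like this. The simulated relationship between `age` and `msf_sc` is an arbitrary assumption and says nothing about the survey results.

```r
set.seed(42)

# Made-up data; column names mirror the variables in the figures.
n <- 100
toy <- data.frame(
  age = stats::runif(n, min = 18, max = 60),
  latitude = stats::runif(n, min = -33, max = 5),
  longitude = stats::runif(n, min = -57, max = -35)
)

# Arbitrary simulated dependence of chronotype (in hours) on age.
toy$msf_sc <- 5 - 0.05 * toy$age + stats::rnorm(n, sd = 0.5)

# Pairwise Pearson correlations, ignoring missing values pair by pair.
cor_matrix <- stats::cor(toy, use = "pairwise.complete.obs")
round(cor_matrix, 2)
```

The result is a symmetric matrix with ones on the diagonal. Only the numeric columns enter the computation, so a categorical variable such as `sex` would have to be handled separately, for example by computing one matrix per group.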

C.6 Latitudinal and Longitudinal Ranges

C.6.0.1 Brazil

Code

box <-
  geobr::read_country(2017, showProgress = FALSE) |>
  rutils::shush() |>
  dplyr::pull(geom) |>
  sf::st_bbox() |>
  as.list()

brazil_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

brazil_lat_lon

Table C.19: Brazil’s extreme points.

Source: Brazilian Institute of Geography and Statistics (IBGE), via the shapefiles provided by the geobr R package.

C.6.0.2 Full Sample

Code

box <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::summarise(
    xmin = min(longitude, na.rm = TRUE),
    xmax = max(longitude, na.rm = TRUE),
    xrange = xmax - xmin,
    ymin = min(latitude, na.rm = TRUE),
    ymax = max(latitude, na.rm = TRUE),
    yrange = ymax - ymin
  ) |>
  as.list()

full_sample_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

full_sample_lat_lon

Table C.20: Latitude and longitude statistics of respondents (Full sample).

Source: Created by the author.

C.6.0.3 Analysis Sample

Code

box <-
  weighted_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::summarise(
    xmin = min(longitude, na.rm = TRUE),
    xmax = max(longitude, na.rm = TRUE),
    xrange = xmax - xmin,
    ymin = min(latitude, na.rm = TRUE),
    ymax = max(latitude, na.rm = TRUE),
    yrange = ymax - ymin
  ) |>
  as.list()

analysis_sample_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

analysis_sample_lat_lon

Table C.21: Latitude and longitude statistics of respondents (Analysis sample).

Source: Created by the author.

C.7 Population Distributions

C.7.1 Brazil

The population distribution estimates are derived from the Brazilian Institute of Geography and Statistics (IBGE) for the year 2017, aligning with the sample. These estimates are accessible via IBGE’s Automatic Retrieval System (SIDRA) platform.

Source: Instituto Brasileiro de Geografia e Estatística. (n.d.). Tabela 6579: População residente estimada [Table 6579: Estimated resident population] [Dataset]. SIDRA. https://sidra.ibge.gov.br/Tabela/6579
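
The `api` strings passed to `sidrar::get_sidra()` in this section follow SIDRA's path convention. The helper below is hypothetical and only assembles such a string; it does not query the service. The interpretation of the segments (`t` for table, `n3` for states, `n6` for municipalities, `v` for variable, `p` for period) is an assumption based on a reading of the SIDRA API and should be verified against its documentation.

```r
# Hypothetical helper: assembles a SIDRA API path without calling the API.
# Assumed meaning of the segments:
#   /t/  table code (e.g., 6579)
#   /n3/ or /n6/  territorial level (assumed: n3 = state, n6 = municipality)
#   /v/  variable code(s)
#   /p/  period (e.g., a year)
build_sidra_path <- function(table, level, variable = "all", period) {
  paste0(
    "/t/", table,
    "/", level, "/all",
    "/v/", variable,
    "/p/", period
  )
}

build_sidra_path(table = 6579, level = "n3", period = 2017)
build_sidra_path(table = 6579, level = "n6", period = 2017)
```

Building the path from named arguments makes the difference between the state-level and municipality-level requests explicit: only the territorial level changes.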

Code

ibge_6579_data_state <-
  sidrar::get_sidra(api = "/t/6579/n3/all/v/all/p/2017") |>
  rutils::shush() |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(unidade_da_federacao_codigo, valor) |>
  dplyr::rename(
    state_code = unidade_da_federacao_codigo,
    n = valor
  ) |>
  dplyr::mutate(state_code = as.integer(state_code)) |>
  dplyr::relocate(state_code, n)

Code

plot_6579_ibge_1 <-
  ibge_6579_data_state |>
  plotr:::plot_brazil_state(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code

ibge_6579_data_municipality <-
  sidrar::get_sidra(api = "/t/6579/n6/all/v/all/p/2017") |>
  rutils::shush() |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(municipio_codigo, valor) |>
  dplyr::rename(
    municipality_code = municipio_codigo,
    n = valor
  ) |>
  dplyr::mutate(municipality_code = as.integer(municipality_code)) |>
  dplyr::relocate(municipality_code, n)

Code

max_limit <-
  ibge_6579_data_municipality |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_6579_ibge_2 <-
  ibge_6579_data_municipality |>
  plotr:::plot_brazil_municipality(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit) - 1)),
    reverse = FALSE
  )
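
The `breaks` expression above produces powers of ten for the log-scaled legend. The sketch below reproduces that arithmetic with made-up populations. The behavior of `rutils:::inverse_log_max()` is not shown in this appendix, so rounding the maximum up to the next power of ten is an assumption made here for illustration.

```r
# Made-up municipality populations.
n <- c(850, 12000, 340000, 12100000)

# Assumption: the upper limit is the maximum rounded up to the next power
# of ten (this stands in for `rutils:::inverse_log_max(10)`).
max_limit <- 10^ceiling(log10(max(n)))

# Powers of ten from 10 up to one order of magnitude below the limit.
breaks <- 10^(seq(1, log10(max_limit) - 1))
breaks
```

Under this assumption, a maximum of about 12 million yields breaks from 10 to 10 million, one per order of magnitude.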

Code

plot_6579_ibge_3 <-
  ibge_6579_data_municipality |>
  plotr:::plot_brazil_municipality(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(100000, 500000, 1000000, 5000000, 10000000, 12000000),
    point = TRUE
  )

Code

plot_ibge_panel <-
  patchwork::wrap_plots(
    plot_6579_ibge_1 |> plotr:::rm_ggspatial_scale(),
    plot_6579_ibge_2 |> plotr:::rm_ggspatial_scale(),
    plot_6579_ibge_3 |> plotr:::rm_ggspatial_scale(),
    ncol = 2,
    nrow = 2,
    widths = c(1, 1),
    heights = c(1, 1)
  ) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

plot_ibge_panel

C.7.2 Full Sample

Code

plot_full_1 <-
  anonymized_data |>
  plotr:::plot_world_countries(
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code

plot_full_2 <-
  anonymized_data |>
  plotr:::plot_brazil_state(
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code

max_limit <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_full_3 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit) - 1))
  )

Code

max_limit <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  max()

plot_full_4 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(10, 500, 1000, 5000, 10000, 12000),
    point = TRUE,
    reverse = TRUE
  )

Code

plot_full_5 <-
  anonymized_data |>
  plotr:::plot_brazil_point(
    year = 2017,
    scale_type = "discrete"
  )

Code

patchwork::wrap_plots(
   plot_full_2 |> plotr:::rm_ggspatial_scale(),
   plot_full_3 |> plotr:::rm_ggspatial_scale(),
   plot_full_4 |> plotr:::rm_ggspatial_scale(),
   plot_full_5 |> plotr:::rm_ggspatial_scale(),
   ncol = 2,
   nrow = 2,
   widths = c(1, 1),
   heights = c(1, 1)
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.7.3 Analysis Sample

Code

plot_analysis_1 <-
  weighted_data |>
  plotr:::plot_brazil_state(
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code

max_limit <-
  weighted_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_analysis_2 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit)))
  )

Code

plot_analysis_3 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(10, 500, 1000, 5000, 7500),
    point = TRUE,
    reverse = TRUE
  )

Code

plot_analysis_4 <-
  weighted_data |>
  plotr:::plot_brazil_point(
    year = 2017,
    scale_type = "discrete"
  )

Code

patchwork::wrap_plots(
   plot_analysis_1 |> plotr:::rm_ggspatial_scale(),
   plot_analysis_2 |> plotr:::rm_ggspatial_scale(),
   plot_analysis_3 |> plotr:::rm_ggspatial_scale(),
   plot_analysis_4 |> plotr:::rm_ggspatial_scale(),
   ncol = 2,
   nrow = 2
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.7.4 Brazil versus Full Sample

Code

patchwork::wrap_plots(
   plot_6579_ibge_1 |> plotr:::rm_ggspatial_scale(),
   plot_full_2 |> plotr:::rm_ggspatial_scale(),
   ncol = 2,
   nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

Code

patchwork::wrap_plots(
   plot_6579_ibge_2 |> plotr:::rm_ggspatial_scale(),
   plot_full_3 |> plotr:::rm_ggspatial_scale(),
   ncol = 2,
   nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

Code

patchwork::wrap_plots(
   plot_6579_ibge_3 |> plotr:::rm_ggspatial_scale(),
   plot_full_4 |> plotr:::rm_ggspatial_scale(),
   ncol = 2,
   nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.8 Age Distributions

C.8.1 Brazil

Source: Instituto Brasileiro de Geografia e Estatística. (n.d.). Tabela 6407: População residente, por sexo e grupos de idade [Table 6407: Resident population, by sex and age groups] [Dataset]. SIDRA. https://sidra.ibge.gov.br/tabela/6407

Code

prettycheck:::assert_internet()

ibge_6407_data <-
  sidrar::get_sidra(
    api = paste0(
      "/t/6407/n3/all/v/606/p/2017/c2/allxt/c58/1140,1141,1144,1145,1152,",
      "2793,3299,3300,3301,3350,6798,40291,118282"
    )
  ) |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(
    valor, unidade_da_federacao_codigo, unidade_da_federacao, ano, sexo,
    grupo_de_idade
  ) |>
  dplyr::rename(
    n = valor,
    state_code = unidade_da_federacao_codigo,
    state = unidade_da_federacao,
    year = ano,
    sex = sexo,
    age_group = grupo_de_idade
  ) |>
  dplyr::arrange(state, sex, age_group) |>
  dplyr::mutate(
    year = as.integer(year),
    country = "Brazil",
    region = orbis::get_brazil_region(state),
    state_code = as.integer(state_code),
    sex = dplyr::case_match(
      sex,
      "Homens" ~ "Male",
      "Mulheres" ~ "Female"
    ),
    sex = factor(sex, ordered = FALSE),
    age_group = dplyr::case_match(
      age_group,
      "0 a 4 anos" ~ "0-4",
      "5 a 9 anos" ~ "5-9",
      "10 a 13 anos" ~ "10-13",
      "14 a 15 anos" ~ "14-15",
      "16 a 17 anos" ~ "16-17",
      "18 a 19 anos" ~ "18-19",
      "20 a 24 anos" ~ "20-24",
      "25 a 29 anos" ~ "25-29",
      "30 a 39 anos" ~ "30-39",
      "40 a 49 anos" ~ "40-49",
      "50 a 59 anos" ~ "50-59",
      "60 a 64 anos" ~ "60-64",
      "65 anos ou mais" ~ "65+"
    ),
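    # Set the levels explicitly; the default alphabetical order would place
    # "5-9" after "40-49" in the ordered factor.
    age_group = factor(
      age_group,
      levels = c(
        "0-4", "5-9", "10-13", "14-15", "16-17", "18-19", "20-24", "25-29",
        "30-39", "40-49", "50-59", "60-64", "65+"
      ),
      ordered = TRUE
    ),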
    age_group_midpoint = dplyr::case_when(
      age_group == "0-4" ~ 2,
      age_group == "5-9" ~ 7,
      age_group == "10-13" ~ 11.5,
      age_group == "14-15" ~ 14.5,
      age_group == "16-17" ~ 16.5,
      age_group == "18-19" ~ 18.5,
      age_group == "20-24" ~ 22,
      age_group == "25-29" ~ 27,
      age_group == "30-39" ~ 34.5,
      age_group == "40-49" ~ 44.5,
      age_group == "50-59" ~ 54.5,
      age_group == "60-64" ~ 62,
      # Open-ended group: assumed midpoint of 65 + (62 - 54.5) = 72.5.
      age_group == "65+" ~ 65 + 62 - 54.5
    ),
    n = as.integer(n * 1000)
  ) |>
  dplyr::relocate(
    year, country, region, state_code, state, sex, age_group,
    age_group_midpoint, n
  )

ibge_6407_data

The statistics presented in this section are estimates based on the midpoints of age groups and should be interpreted with caution. In the summaries that follow, \(n\) is divided by 1,000 and is therefore expressed in thousands of individuals.
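
To make the midpoint approach concrete, the sketch below computes a mean and a standard deviation from grouped counts in base R. The counts are made up, and repeating each midpoint with `rep()` plays the role that `tidyr::uncount()` plays in the code of this section.

```r
# Made-up counts (in thousands) per age group, with group midpoints.
grouped <- data.frame(
  age_group = c("18-19", "20-24", "25-29", "30-39"),
  midpoint = c(18.5, 22, 27, 34.5),
  n = c(6, 16, 17, 33)
)

# Weighted mean: each midpoint counts `n` times.
weighted_mean <- sum(grouped$midpoint * grouped$n) / sum(grouped$n)

# Equivalent expanded form: repeat each midpoint `n` times.
expanded <- rep(grouped$midpoint, times = grouped$n)

# Sample standard deviation of the expanded values.
expanded_sd <- stats::sd(expanded)

c(mean = weighted_mean, sd = expanded_sd)

# The open-ended "65+" group has no natural midpoint. The code in this
# section assigns it 65 + (62 - 54.5).
65 + 62 - 54.5
```

Because every individual in a group is assigned the same age, the variability within groups is ignored, which is one reason these estimates call for caution.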

Code

ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(age, n) |>
  tidyr::uncount(n) |>
  rutils:::stats_summary("age")

Code

ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "sex", "n")

Code

ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "region", "n")

Code

ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "state", "n")

Code

plot_ibge_6407_age_1 <-
  ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(sex, age, n) |>
  tidyr::uncount(n) |>
  plotr:::plot_age_pyramid(
    breaks = c(0, 10, 20, 30, 40, 50, 60, 65, 90)
  )

Code

plot_ibge_6407_age_2 <-
  ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(state_code, age, n) |>
  tidyr::uncount(n) |>
  plotr:::plot_brazil_state(
    col_fill = "age",
    year = 2017,
    transform = "identity",
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

C.8.2 Full Sample

Code

anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::stats_summary("age")

Code

anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "sex")

Code

anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "region")

Code

anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "state")

Code

plot_full_age_1 <-
  anonymized_data |>
  plotr:::plot_age_pyramid()

Code

plot_full_age_2 <-
  anonymized_data |>
  plotr:::plot_brazil_state(
    col_fill = "age",
    year = 2017,
    transform = "identity",
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

Code

plot_full_age_3 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE
  )

C.8.3 Analysis Sample

Code

weighted_data |> rutils:::stats_summary("age")

Code

weighted_data |> rutils:::summarize_by("age", "sex")

Code

weighted_data |> rutils:::summarize_by("age", "region")

Code

weighted_data |> rutils:::summarize_by("age", "state")

Code

plot_analysis_age_1 <-
  weighted_data |>
  plotr:::plot_age_pyramid()

Code

plot_analysis_age_2 <-
  weighted_data |>
  plotr:::plot_brazil_state(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

Code

plot_analysis_age_3 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE
  )

C.9 Weight Distributions

C.9.1 Full Sample

Code

weighted_data |>
  dplyr::filter(!rutils::test_outlier(weight)) |>
  plotr:::plot_latitude_series(
    col = "weight",
    y_label = "Weight (kg)"
  )

Figure C.39: Boxplots of mean weight values (kg) aggregated by 1° latitude intervals, illustrating the relationship between latitude and weight. The × symbol marks the mean. The orange line represents a linear regression.
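The aggregation described in the caption can be approximated with base R. This is only a minimal sketch on simulated data: the `toy` data frame, its column names, and the use of `floor()` to form the 1° intervals are assumptions for illustration and are not taken from `plotr:::plot_latitude_series()`.

```r
# Simulated data (hypothetical); not the survey data.
set.seed(2017)
toy <- data.frame(latitude = runif(500, min = -33, max = 5))
toy$weight <- 70 + 0.05 * toy$latitude + rnorm(500, sd = 10)

# Assign each observation to a 1-degree latitude interval.
toy$latitude_bin <- floor(toy$latitude)

# Mean weight per interval (the quantity marked by the x symbol).
bin_means <- stats::aggregate(weight ~ latitude_bin, data = toy, FUN = mean)

# Linear regression of weight on latitude (the trend line).
fit <- stats::lm(weight ~ latitude, data = toy)
slope <- unname(stats::coef(fit)["latitude"])
```

The slope is expressed in kilograms per degree of latitude, which makes it easy to judge whether a visible trend is large enough to matter in practice.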

Code

plot_age_sex_weigth_series <-
  anonymized_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series(
    col_y = "weight",
    y_label = "Weight (kg)"
  )

Figure C.40: Relation between age and weight (kg), divided by sex and aggregated by the mean. The gray line represents both sexes. Vertical lines represent the standard error of the mean (SEM).
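For reference, the SEM shown by the vertical lines is the sample standard deviation divided by the square root of the number of observations. The sketch below computes it per age and sex on simulated data. The `sem()` helper and the `toy` data frame are hypothetical and only illustrate the calculation; they are not the internals of `plotr:::plot_series()`.

```r
# Standard error of the mean, ignoring missing values.
sem <- function(x) {
  stats::sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

# Simulated data (hypothetical); not the survey data.
set.seed(1)
toy <- data.frame(
  age = sample(18:50, 300, replace = TRUE),
  sex = sample(c("Female", "Male"), 300, replace = TRUE),
  weight = rnorm(300, mean = 70, sd = 12)
)

# Mean and SEM of weight for each combination of age and sex.
means <- stats::aggregate(weight ~ age + sex, data = toy, FUN = mean)
sems <- stats::aggregate(weight ~ age + sex, data = toy, FUN = sem)
```

Groups with a single observation have an undefined standard deviation, so their SEM is `NA`.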

C.10 Chronotype Distributions

C.10.1 Full Sample

Code

anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::stats_summary("msf_sc", as_list = TRUE) |>
  dplyr::as_tibble() |>
  dplyr::mutate(
    dplyr::across(
      .cols = dplyr::where(hms::is_hms),
      .fns = lubritime::round_time
    ),
    dplyr::across(
      .cols = dplyr::everything(),
      .fns = as.character
    )
  ) |>
  tidyr::pivot_longer(cols = dplyr::everything())

Code

anonymized_data |> plotr:::get_msf_sc_cutoffs()

Code

plot_age_sex_series <-
  anonymized_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series()

Figure C.41: Observed relation between age and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSF_sc), the MCTQ proxy for chronotype. The gray line represents both sexes. Vertical lines represent the standard error of the mean (SEM).

Figure C.42: Distribution of European chronotypes by age, as shown in Roenneberg et al. (2007), for comparison.

Code

plot_age_sex_series <-
  anonymized_data |>
  dplyr::filter(!rutils::test_outlier(weight)) |>
  plotr:::plot_series(
    col_x = "weight",
    x_label = "Weight (kg)",
    date_breaks = "30 min"
  )

Figure C.43: Observed relation between weight and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSF_sc), the MCTQ proxy for chronotype. The gray line represents both sexes. Vertical lines represent the standard error of the mean (SEM).

C.10.2 Analysis Sample

Code

weighted_data |>
  rutils:::stats_summary("msf_sc", as_list = TRUE) |>
  dplyr::as_tibble() |>
  dplyr::mutate(
    dplyr::across(
      .cols = dplyr::where(hms::is_hms),
      .fns = lubritime::round_time
    ),
    dplyr::across(
      .cols = dplyr::everything(),
      .fns = as.character
    )
  ) |>
  tidyr::pivot_longer(cols = dplyr::everything())

Code

weighted_data |> plotr:::get_msf_sc_cutoffs()

Code

weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  rutils:::summarize_by("msf_sc", "msf_sc_category")

Code

weighted_data |> plotr:::plot_chronotype()

Figure C.44: Observed distribution of the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSF_sc), a proxy for chronotype.
Chronotypes are categorized into quantile ranges, from extremely early (\([0, 0.111)\)) to extremely late (\([0.888, 1)\)).
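A quantile-based categorization of this kind can be sketched with `stats::quantile()` and `cut()`. Only the outer cutoffs (0.111 and 0.888) come from the caption. The inner cutoffs below, placed at ninths with a wider intermediate band, are an assumption made for illustration and may differ from those used by `plotr:::categorize_msf_sc()`. The data are simulated.

```r
# Simulated MSF_sc values in decimal hours (hypothetical data).
set.seed(42)
msf_sc_hours <- rnorm(1000, mean = 4.5, sd = 1.5)

# Assumed cumulative probabilities; only 1/9 and 8/9 match the caption.
probs <- c(0, 1, 2, 3, 6, 7, 8, 9) / 9
cutoffs <- stats::quantile(msf_sc_hours, probs = probs)

# Left-closed intervals; include.lowest keeps the maximum in the last one.
category <- cut(
  msf_sc_hours,
  breaks = cutoffs,
  labels = c(
    "Extremely early", "Moderately early", "Slightly early",
    "Intermediate", "Slightly late", "Moderately late",
    "Extremely late"
  ),
  right = FALSE,
  include.lowest = TRUE,
  ordered_result = TRUE
)

# Share of observations in each category.
shares <- prop.table(table(category))
```

Under these assumed cutoffs, about one third of the observations fall in the intermediate category and about one ninth in each of the others.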

Figure C.45: Distribution of European chronotypes, as shown in Roenneberg et al. (2019) (for comparison).

Code

weighted_data |> rutils:::summarize_by("msf_sc", "sex")

Code

weighted_data |>
  dplyr::mutate(
    age_group = dplyr::case_when(
      dplyr::between(age, 0, 4) ~ "0-4",
      dplyr::between(age, 5, 9) ~ "5-9",
      dplyr::between(age, 10, 13) ~ "10-13",
      dplyr::between(age, 14, 15) ~ "14-15",
      dplyr::between(age, 16, 17) ~ "16-17",
      dplyr::between(age, 18, 19) ~ "18-19",
      dplyr::between(age, 20, 24) ~ "20-24",
      dplyr::between(age, 25, 29) ~ "25-29",
      dplyr::between(age, 30, 39) ~ "30-39",
      dplyr::between(age, 40, 49) ~ "40-49",
      dplyr::between(age, 50, 59) ~ "50-59",
      dplyr::between(age, 60, 64) ~ "60-64",
      age >= 65 ~ "65+"
    )
  ) |>
  rutils:::summarize_by("msf_sc", "age_group")
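Because `dplyr::between()` uses inclusive bounds, a non-integer age such as 19.5 matches none of the conditions above and is assigned `NA`. If `age` is stored as a decimal number (an assumption, since its type is not shown here), the same groups can be built with `cut()` and left-closed intervals. The vectors below are hypothetical examples.

```r
# Example ages, including non-integer values (hypothetical data).
age <- c(18, 19.5, 24.9, 25, 39.99, 64.2, 65, 80)

# Lower bound of each group, plus an open upper end.
breaks <- c(0, 5, 10, 14, 16, 18, 20, 25, 30, 40, 50, 60, 65, Inf)

labels <- c(
  "0-4", "5-9", "10-13", "14-15", "16-17", "18-19", "20-24",
  "25-29", "30-39", "40-49", "50-59", "60-64", "65+"
)

# right = FALSE makes each interval [lower, upper).
age_group <- cut(age, breaks = breaks, labels = labels, right = FALSE)
```

With this approach, 19.5 falls in "18-19" and 64.2 falls in "60-64" instead of being dropped.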

Code

plot_age_sex_series <-
  weighted_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series()

Figure C.46: Observed relation between age and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSF_sc), the MCTQ proxy for chronotype. The gray line represents both sexes. Vertical lines represent the standard error of the mean (SEM).

Code

plot_age_sex_series <-
  weighted_data |>
  dplyr::filter(!rutils::test_outlier(weight), weight > 45) |>
  plotr:::plot_series(
    col_x = "weight",
    x_label = "Weight (kg)",
    date_breaks = "30 min"
  )

Figure C.47: Observed relation between weight and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSF_sc), the MCTQ proxy for chronotype. The gray line represents both sexes. Vertical lines represent the standard error of the mean (SEM).

Code

weighted_data |> rutils:::summarize_by("msf_sc", "region")

Code

weighted_data |> rutils:::summarize_by("msf_sc", "state")

Code

limits <- # First (Q1) and third (Q3) quartiles, i.e., the bounds of the IQR
  weighted_data |>
  dplyr::pull(msf_sc) |>
  lubritime::link_to_timeline() |>
  as.numeric() |>
  stats::quantile(c(0.25, 0.75), na.rm = TRUE)

weighted_data |>
  dplyr::mutate(
    msf_sc =
      msf_sc |>
      lubritime::link_to_timeline() |>
      as.numeric()
  ) |>
  plotr:::plot_brazil_state(
    col_fill = "msf_sc",
    year = 2017,
    breaks =
      seq(limits[1], limits[2], length.out = 6) |>
      groomr::remove_caps(),
    labels = plotr:::format_as_hm,
    limits = limits, # Bounds the color scale to [Q1, Q3]
    quiet = TRUE
  )

Figure C.48: Observed geographical distribution of MSF_sc values by Brazilian state, illustrating how chronotype varies with latitude in Brazil.
MSF_sc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Higher MSF_sc values indicate a tendency towards eveningness. The color scale is bounded by the first and third quartiles. Differences in mean MSF_sc values across states are small and fall within a narrow range relative to the scale of the Munich ChronoType Questionnaire (MCTQ), limiting the significance of these variations.
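Clock times that straddle midnight cannot be averaged or ranked directly, which is why the values are placed on a timeline before the quartiles are computed. The sketch below illustrates the idea in base R, assuming that times earlier than a 12:00 threshold belong to the following day. The `link_seconds()` helper is hypothetical and is not the implementation of `lubritime::link_to_timeline()`.

```r
# Shift times before the threshold (in seconds) to the following day.
link_seconds <- function(seconds, threshold = 12 * 3600) {
  ifelse(seconds < threshold, seconds + 24 * 3600, seconds)
}

# Three sleep midpoints around midnight (hypothetical values).
clock <- c("23:30:00", "00:30:00", "04:15:00")

# Convert "HH:MM:SS" strings to seconds since midnight.
secs <- vapply(
  strsplit(clock, ":"),
  function(parts) sum(as.numeric(parts) * c(3600, 60, 1)),
  numeric(1)
)

linked <- link_seconds(secs)

# A naive mean gives 09:25, while the linked mean gives 01:25.
naive_mean <- mean(secs)
linked_mean <- mean(linked) %% (24 * 3600)

# Quartiles computed on the linked scale.
limits <- stats::quantile(linked, c(0.25, 0.75))
```

The same reasoning applies to the quartiles: computed on raw clock times, a value such as 23:30 would be ranked as the latest observation rather than the earliest.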

Code

limits <- # First (Q1) and third (Q3) quartiles, i.e., the bounds of the IQR
  weighted_data |>
  dplyr::pull(msf_sc) |>
  lubritime::link_to_timeline() |>
  as.numeric() |>
  stats::quantile(c(0.25, 0.75), na.rm = TRUE)

weighted_data |>
  dplyr::mutate(
    msf_sc =
      msf_sc |>
      lubritime::link_to_timeline() |>
      as.numeric()
  ) |>
  plotr:::plot_brazil_municipality(
    col_fill = "msf_sc",
    year = 2017,
    breaks =
      seq(limits[1], limits[2], length.out = 6) |>
      groomr::remove_caps(),
    labels = plotr:::format_as_hm,
    limits = limits,
    quiet = TRUE,
    reverse = TRUE
  )

Code

plot <-
  weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  plotr:::plot_brazil_point(
    col_group = "msf_sc_category",
    year = 2017,
    scale_type = "discrete",
    print = FALSE
  ) +
  ggplot2::labs(color = NULL)

plot |> print() |> rutils::shush()

Code

plot <-
  weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  plotr:::plot_brazil_point(
    col_group = "msf_sc_category",
    year = 2017,
    size = 0.1,
    alpha = 1,
    print = FALSE,
    scale_type = "discrete"
  ) +
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    panel.grid.major = ggplot2::element_blank(),
    panel.grid.minor = ggplot2::element_blank(),
    legend.position = "none"
  )

plot |>
  plotr:::rm_ggspatial_scale() +
  ggplot2::facet_wrap(~msf_sc_category, ncol = 4, nrow = 2)

Figure C.49: Observed geographical distribution of MSF_sc values across the chronotype spectrum, from extremely early to extremely late, illustrating how chronotype varies with latitude in Brazil.
MSF_sc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Chronotypes are categorized into quantile ranges, from extremely early (\([0, 0.111)\)) to extremely late (\([0.888, 1)\)). No discernible pattern emerges from the distribution of chronotypes across latitudes.

Code

weighted_data |> plotr:::plot_latitude_series()

Figure C.50: Boxplots of observed mean MSF_sc values aggregated by \(1°\) latitude intervals, illustrating the relationship between latitude and chronotype.
MSF_sc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Higher MSF_sc values indicate a tendency towards eveningness. The × symbol marks the mean. The orange line represents a linear regression. The differences in mean and median values across latitudes are minimal relative to the Munich ChronoType Questionnaire (MCTQ) scale.

Code

weighted_data |>
  plotr:::plot_series(
    col_x = "latitude",
    x_label = "Latitude",
    date_breaks = "15 min",
    reverse = TRUE,
    change_sign = TRUE
  )