Appendix C — Exploratory Data Analysis

C.1 Overview

This appendix presents an exploratory analysis of the survey data, covering the distributions of the main variables, their correlations, and the geographic and demographic composition of the sample.

It focuses on the analysis sample; results based on the full sample are labeled explicitly. The analysis sample is a subset of the full sample consisting of Brazilian individuals aged 18 or older, residing in the UTC-3 time zone, who completed the survey between October 15 and 21, 2017.
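The selection criteria above can be sketched as a single row filter. This is a minimal illustration on toy data, not the project's pipeline: the data frame and the columns `timezone` and `completed_on` are hypothetical.

```r
# Toy data: all values and the columns `timezone` and `completed_on` are hypothetical.
survey <- data.frame(
  country = c("Brazil", "Brazil", "Portugal", "Brazil", "Brazil"),
  age = c(25, 17, 30, 40, 33),
  timezone = c("UTC-3", "UTC-3", "UTC+0", "UTC-4", "UTC-3"),
  completed_on = as.Date(c(
    "2017-10-16", "2017-10-17", "2017-10-18", "2017-10-19", "2017-10-22"
  ))
)

# Keep Brazilian adults in UTC-3 who answered between 2017-10-15 and 2017-10-21.
keep <- survey$country == "Brazil" &
  survey$age >= 18 &
  survey$timezone == "UTC-3" &
  survey$completed_on >= as.Date("2017-10-15") &
  survey$completed_on <= as.Date("2017-10-21")

analysis_sample <- survey[keep, ]
analysis_sample
```

Only the first toy respondent meets every criterion; each of the others fails exactly one condition.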

Models were developed using cell weights to address sample imbalances. For details on the sample balancing procedure, refer to Supplementary Material D.
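As a rough illustration of cell weighting, the sketch below applies simple post-stratification: each cell's weight is its population share divided by its sample share. The cells, the counts, and the `pop_n` column are hypothetical; the procedure actually used in this study is the one documented in Supplementary Material D, which may differ.

```r
# Toy sample: six respondents in sex-by-age-group cells (hypothetical).
sample_df <- data.frame(
  sex = c("Female", "Female", "Female", "Female", "Male", "Male"),
  age_group = c("18-34", "18-34", "18-34", "35+", "18-34", "35+")
)

# Hypothetical population counts for the same cells.
population <- data.frame(
  sex = c("Female", "Female", "Male", "Male"),
  age_group = c("18-34", "35+", "18-34", "35+"),
  pop_n = c(300, 200, 300, 200)
)

# Count respondents per cell.
sample_counts <- as.data.frame(
  table(sex = sample_df$sex, age_group = sample_df$age_group),
  responseName = "sample_n",
  stringsAsFactors = FALSE
)

# Post-stratification weight = population share / sample share.
# (A cell with no respondents would produce Inf; real procedures must handle it.)
cells <- merge(sample_counts, population, by = c("sex", "age_group"))
cells$weight <- (cells$pop_n / sum(cells$pop_n)) /
  (cells$sample_n / sum(cells$sample_n))

# Attach the cell weight to each respondent.
weighted_sample <- merge(
  sample_df, cells[, c("sex", "age_group", "weight")],
  by = c("sex", "age_group")
)
cells
```

With these weights, the weighted sample reproduces the population's cell shares, and the weights sum to the sample size.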

C.2 Setting the Environment

Code
library(cli)
library(dplyr)
library(fBasics)
library(geobr)
library(ggplot2)
library(here)
library(hms)
library(janitor)
library(lubridate)
library(lubritime) # github.com/danielvartan/lubritime
library(magrittr)
library(moments)
library(nortest)
library(orbis) # github.com/danielvartan/orbis
library(patchwork)
library(plotr)
library(prettycheck) # github.com/danielvartan/prettycheck
library(purrr)
library(quartor) # github.com/danielvartan/quartor
library(rlang)
library(rutils) # github.com/danielvartan/rutils
library(sf)
library(sidrar)
library(stats)
library(stringr)
library(targets)
library(tidyr)
library(tseries)

C.3 Loading the Data

targets::tar_make(script = here::here("_targets.R"))

C.3.1 Full Sample

anonymized_data <-
  targets::tar_read("anonymized_data", store = here::here("_targets"))

C.3.2 Analysis Sample

weighted_data <-
  targets::tar_read("weighted_data", store = here::here("_targets"))

C.4 Distribution of Main Variables

Code
weighted_data |>
  rutils:::stats_summary(
    col = "msf_sc",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.1: Statistics for the msf_sc variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "msf_sc",
    x_label = "MSF~sc~ (Chronotype proxy) (Local time)"
  )
Figure C.1: Histogram of the msf_sc variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.
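The Q-Q panel pairs the sorted observations with normal quantiles at standard plotting positions. The sketch below reproduces that pairing in base R on simulated data (not survey values); whether `plotr:::plot_dist()` uses the same plotting positions and the default kernel bandwidth is an assumption.

```r
set.seed(2017)
# Simulated stand-in for a continuous variable (e.g., a time in decimal hours).
x <- rnorm(200, mean = 4.5, sd = 1.2)

# Q-Q pairs: normal quantiles at plotting positions vs. sorted observations.
theoretical <- stats::qnorm(stats::ppoints(length(x)))
observed <- sort(x)

# stats::qqnorm() computes the same pairs (returned in the data's original order).
qq <- stats::qqnorm(x, plot.it = FALSE)

# Kernel density estimate with R's defaults (Gaussian kernel, bw = "nrd0").
kde <- stats::density(x)

# A correlation close to 1 indicates points lying near a straight line.
stats::cor(theoretical, observed)
```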

Code
weighted_data |> plotr:::plot_box_plot(col = "msf_sc")
Figure C.2: Box plot of the msf_sc variable.

Source: Created by the author.
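The `iqr_mult = 1.5` argument passed to `rutils:::stats_summary()` suggests Tukey's fences for flagging outliers, the same 1.5 × IQR rule commonly used for box plot whiskers. A minimal sketch assuming type 7 sample quartiles (R's default); the helper `tukey_fences()` is hypothetical and not part of rutils.

```r
# Hypothetical helper: Tukey's fences using type 7 quartiles.
tukey_fences <- function(x, iqr_mult = 1.5, na_rm = TRUE) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = na_rm, names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - iqr_mult * iqr, upper = q[2] + iqr_mult * iqr)
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
fences <- tukey_fences(x)
fences  # lower -3, upper 13

# Values outside the fences are flagged as outliers.
outliers <- x[x < fences[["lower"]] | x > fences[["upper"]]]
outliers  # 100
```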

Code
weighted_data |>
  rutils:::stats_summary(
    col = "age",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.2: Statistics for the age variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "age",
    x_label = "Age (years)"
  )
Figure C.3: Histogram of the age variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "age")
Figure C.4: Box plot of the age variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "latitude",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.3: Statistics for the latitude variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "latitude",
    x_label = "Latitude (Decimal degrees)"
  )
Figure C.5: Histogram of the latitude variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "latitude")
Figure C.6: Box plot of the latitude variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "longitude",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.4: Statistics for the longitude variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "longitude",
    x_label = "Longitude (Decimal degrees)"
  )
Figure C.7: Histogram of the longitude variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "longitude")
Figure C.8: Box plot of the longitude variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "ghi_month",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.5: Statistics for the ghi_month variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "ghi_month",
    x_label = "Monthly average global horizontal irradiance (Wh/m²)"
  )
Figure C.9: Histogram of the ghi_month variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "ghi_month")
Figure C.10: Box plot of the ghi_month variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "ghi_annual",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.6: Statistics for the ghi_annual variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "ghi_annual",
    x_label = "Annual average global horizontal irradiance (Wh/m²)"
  )
Figure C.11: Histogram of the ghi_annual variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "ghi_annual")
Figure C.12: Box plot of the ghi_annual variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.7: Statistics for the march_equinox_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_sunrise",
    x_label = "Sunrise on the March equinox (Seconds)"
  )
Figure C.13: Histogram of the march_equinox_sunrise variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "march_equinox_sunrise")
Figure C.14: Box plot of the march_equinox_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.8: Statistics for the march_equinox_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_sunset",
    x_label = "Sunset on the March equinox (Seconds)"
  )
Figure C.15: Histogram of the march_equinox_sunset variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "march_equinox_sunset")
Figure C.16: Box plot of the march_equinox_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "march_equinox_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.9: Statistics for the march_equinox_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "march_equinox_daylight",
    x_label = "Daylight on the March equinox (Seconds)"
  )
Figure C.17: Histogram of the march_equinox_daylight variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "march_equinox_daylight")
Figure C.18: Box plot of the march_equinox_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.10: Statistics for the june_solstice_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_sunrise",
    x_label = "Sunrise on the June solstice (Seconds)"
  )
Figure C.19: Histogram of the june_solstice_sunrise variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "june_solstice_sunrise")
Figure C.20: Box plot of the june_solstice_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.11: Statistics for the june_solstice_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_sunset",
    x_label = "Sunset on the June solstice (Seconds)"
  )
Figure C.21: Histogram of the june_solstice_sunset variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "june_solstice_sunset")
Figure C.22: Box plot of the june_solstice_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "june_solstice_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.12: Statistics for the june_solstice_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "june_solstice_daylight",
    x_label = "Daylight on the June solstice (Seconds)"
  )
Figure C.23: Histogram of the june_solstice_daylight variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "june_solstice_daylight")
Figure C.24: Box plot of the june_solstice_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.13: Statistics for the september_equinox_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_sunrise",
    x_label = "Sunrise on the September equinox (Seconds)"
  )
Figure C.25: Histogram of the september_equinox_sunrise variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "september_equinox_sunrise")
Figure C.26: Box plot of the september_equinox_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.14: Statistics for the september_equinox_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_sunset",
    x_label = "Sunset on the September equinox (Seconds)"
  )
Figure C.27: Histogram of the september_equinox_sunset variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "september_equinox_sunset")
Figure C.28: Box plot of the september_equinox_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "september_equinox_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.15: Statistics for the september_equinox_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "september_equinox_daylight",
    x_label = "Daylight on the September equinox (Seconds)"
  )
Figure C.29: Histogram of the september_equinox_daylight variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "september_equinox_daylight")
Figure C.30: Box plot of the september_equinox_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_sunrise",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.16: Statistics for the december_solstice_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_sunrise",
    x_label = "Sunrise on the December solstice (Seconds)"
  )
Figure C.31: Histogram of the december_solstice_sunrise variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "december_solstice_sunrise")
Figure C.32: Box plot of the december_solstice_sunrise variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_sunset",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.17: Statistics for the december_solstice_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_sunset",
    x_label = "Sunset on the December solstice (Seconds)"
  )
Figure C.33: Histogram of the december_solstice_sunset variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "december_solstice_sunset")
Figure C.34: Box plot of the december_solstice_sunset variable.

Source: Created by the author.

Code
weighted_data |>
  rutils:::stats_summary(
    col = "december_solstice_daylight",
    na_rm = TRUE,
    remove_outliers = FALSE,
    iqr_mult = 1.5,
    hms_format = TRUE,
    threshold = hms::parse_hms("12:00:00"),
    as_list = FALSE
  )
Table C.18: Statistics for the december_solstice_daylight variable.

Source: Created by the author.

Code
weighted_data |>
  plotr:::plot_dist(
    col = "december_solstice_daylight",
    x_label = "Daylight on the December solstice (Seconds)"
  )
Figure C.35: Histogram of the december_solstice_daylight variable with a kernel density estimate, along with a quantile-quantile (Q-Q) plot between the variable and the theoretical quantiles of the normal distribution.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_box_plot(col = "december_solstice_daylight")
Figure C.36: Box plot of the december_solstice_daylight variable.

Source: Created by the author.

C.5 Correlation Matrix of Main Variables

C.5.1 Full Sample

Code
anonymized_data |>
  plotr:::plot_ggally(
    cols = c("sex", "age", "latitude", "longitude", "msf_sc"),
    mapping = ggplot2::aes(colour = sex)
  ) |>
  rutils::shush()
Figure C.37: Correlation matrix of the main variables (Full sample).

Source: Created by the author.
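The numeric panels of such a pairs plot typically report Pearson correlations. The sketch below computes the same kind of matrix with `stats::cor()` on simulated data, overall and within each sex; which coefficient `plotr:::plot_ggally()` actually displays is an assumption here.

```r
set.seed(1)
n <- 50
# Simulated stand-ins for the main variables (not survey data).
df <- data.frame(
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  age = stats::runif(n, 18, 60),
  latitude = stats::runif(n, -33, 5),
  longitude = stats::runif(n, -73, -35)
)
df$msf_sc <- 4 + 0.02 * df$age + stats::rnorm(n, sd = 0.5)

numeric_cols <- c("age", "latitude", "longitude", "msf_sc")

# Pearson correlations, using all available pairs when values are missing.
r <- stats::cor(df[numeric_cols], use = "pairwise.complete.obs")
round(r, 2)

# Within-sex matrices, analogous to coloring the panels by sex.
r_by_sex <- lapply(
  split(df[numeric_cols], df$sex),
  stats::cor,
  use = "pairwise.complete.obs"
)
```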

C.5.2 Analysis Sample

Code
weighted_data |>
  plotr:::plot_ggally(
    cols = c("sex", "age", "latitude", "longitude", "msf_sc"),
    mapping = ggplot2::aes(colour = sex)
  ) |>
  rutils::shush()
Figure C.38: Correlation matrix of the main variables (Analysis sample).

Source: Created by the author.

C.6 Latitudinal and Longitudinal Ranges

C.6.0.1 Brazil


Code
box <-
  geobr::read_country(2017, showProgress = FALSE) |>
  rutils::shush() |>
  dplyr::pull(geom) |>
  sf::st_bbox() |>
  as.list()

brazil_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

brazil_lat_lon
Table C.19: Brazil’s extreme points.

Source: Brazilian Institute of Geography and Statistics (IBGE), via the shapefiles provided by the geobr R package.

C.6.0.2 Full Sample

Code
box <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::summarise(
    xmin = min(longitude, na.rm = TRUE),
    xmax = max(longitude, na.rm = TRUE),
    xrange = xmax - xmin,
    ymin = min(latitude, na.rm = TRUE),
    ymax = max(latitude, na.rm = TRUE),
    yrange = ymax - ymin
  ) |>
  as.list()

full_sample_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

full_sample_lat_lon
Table C.20: Latitude and longitude statistics of respondents (Full sample).

Source: Created by the author.

C.6.0.3 Analysis Sample

Code
box <-
  weighted_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::summarise(
    xmin = min(longitude, na.rm = TRUE),
    xmax = max(longitude, na.rm = TRUE),
    xrange = xmax - xmin,
    ymin = min(latitude, na.rm = TRUE),
    ymax = max(latitude, na.rm = TRUE),
    yrange = ymax - ymin
  ) |>
  as.list()

analysis_sample_lat_lon <- dplyr::tibble(
    name = c("min", "max", "range"),
    latitude = c(
      box$ymin,
      box$ymax,
      box$ymax - box$ymin
    ),
    longitude = c(
      box$xmin,
      box$xmax,
      box$xmax - box$xmin
    )
)

analysis_sample_lat_lon
Table C.21: Latitude and longitude statistics of respondents (Analysis sample).

Source: Created by the author.

C.7 Population Distributions

C.7.1 Brazil

The population distribution estimates come from the Brazilian Institute of Geography and Statistics (IBGE) for 2017, the same year the survey was conducted. These estimates are available through IBGE’s Automatic Retrieval System (SIDRA).

Source: Instituto Brasileiro de Geografia e Estatística. (n.d.). Tabela 6579: População residente estimada [Table 6579: Estimated resident population] [Dataset]. SIDRA. https://sidra.ibge.gov.br/Tabela/6579

Code
ibge_6579_data_state <-
  sidrar::get_sidra(api = "/t/6579/n3/all/v/all/p/2017") |>
  rutils::shush() |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(unidade_da_federacao_codigo, valor) |>
  dplyr::rename(
    state_code = unidade_da_federacao_codigo,
    n = valor
  ) |>
  dplyr::mutate(state_code = as.integer(state_code)) |>
  dplyr::relocate(state_code, n)
Code
plot_6579_ibge_1 <-
  ibge_6579_data_state |>
  plotr:::plot_brazil_state(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code
ibge_6579_data_municipality <-
  sidrar::get_sidra(api = "/t/6579/n6/all/v/all/p/2017") |>
  rutils::shush() |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(municipio_codigo, valor) |>
  dplyr::rename(
    municipality_code = municipio_codigo,
    n = valor
  ) |>
  dplyr::mutate(municipality_code = as.integer(municipality_code)) |>
  dplyr::relocate(municipality_code, n)
Code
max_limit <-
  ibge_6579_data_municipality |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_6579_ibge_2 <-
  ibge_6579_data_municipality |>
  plotr:::plot_brazil_municipality(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit) - 1)),
    reverse = FALSE
  )
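`rutils:::inverse_log_max()` is not documented in this appendix. Assuming it returns the smallest power of 10 that is at least the maximum value, the break sequence used above works out as in this sketch; `inverse_log10_max_sketch()` and the population values are hypothetical.

```r
# Hypothetical stand-in for rutils:::inverse_log_max(x, 10), ASSUMED to return
# the smallest power of 10 that is >= max(x). Not the package's actual code.
inverse_log10_max_sketch <- function(x) {
  10^ceiling(log10(max(x, na.rm = TRUE)))
}

# Hypothetical municipality populations.
n <- c(812, 15300, 264000, 12100000)

max_limit <- inverse_log10_max_sketch(n)
max_limit  # 1e+08

# Same break expression as in the code above: powers of 10 from 10^1
# up to one power below max_limit.
breaks <- 10^(seq(1, log10(max_limit) - 1))
breaks
```

Under this assumption, dropping the top power (`- 1`) keeps the largest break, here 10^7, below the maximum observed value.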

Code
plot_6579_ibge_3 <-
  ibge_6579_data_municipality |>
  plotr:::plot_brazil_municipality(
    col_fill = "n",
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(100000, 500000, 1000000, 5000000, 10000000, 12000000),
    point = TRUE
  )

Code
plot_ibge_panel <-
  patchwork::wrap_plots(
    plot_6579_ibge_1 |> plotr:::rm_ggspatial_scale(),
    plot_6579_ibge_2 |> plotr:::rm_ggspatial_scale(),
    plot_6579_ibge_3 |> plotr:::rm_ggspatial_scale(),
    ncol = 2,
    nrow = 2,
    widths = c(1, 1),
    heights = c(1, 1)
  ) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

plot_ibge_panel

C.7.2 Full Sample

Code
plot_full_1 <-
  anonymized_data |>
  plotr:::plot_world_countries(
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code
plot_full_2 <-
  anonymized_data |>
  plotr:::plot_brazil_state(
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code
max_limit <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_full_3 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit) - 1))
  )

Code
max_limit <-
  anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  max()

plot_full_4 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(10, 500, 1000, 5000, 10000, 12000),
    point = TRUE,
    reverse = TRUE
  )

Code
plot_full_5 <-
  anonymized_data |>
  plotr:::plot_brazil_point(
    year = 2017,
    scale_type = "discrete"
  )

Code
patchwork::wrap_plots(
  plot_full_2 |> plotr:::rm_ggspatial_scale(),
  plot_full_3 |> plotr:::rm_ggspatial_scale(),
  plot_full_4 |> plotr:::rm_ggspatial_scale(),
  plot_full_5 |> plotr:::rm_ggspatial_scale(),
  ncol = 2,
  nrow = 2,
  widths = c(1, 1),
  heights = c(1, 1)
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.7.3 Analysis Sample

Code
plot_analysis_1 <-
  weighted_data |>
  plotr:::plot_brazil_state(
    year = 2017,
    transform = "log10",
    direction = -1,
    scale_type = "binned"
  )

Code
max_limit <-
  weighted_data |>
  dplyr::filter(country == "Brazil") |>
  dplyr::count(municipality_code) |>
  dplyr::pull(n) |>
  rutils:::inverse_log_max(10)

plot_analysis_2 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    breaks = 10^(seq(1, log10(max_limit)))
  )

Code
plot_analysis_3 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    year = 2017,
    transform = "log10",
    direction = -1,
    alpha = 0.75,
    breaks = c(10, 500, 1000, 5000, 7500),
    point = TRUE,
    reverse = TRUE
  )

Code
plot_analysis_4 <-
  weighted_data |>
  plotr:::plot_brazil_point(
    year = 2017,
    scale_type = "discrete"
  )

Code
patchwork::wrap_plots(
  plot_analysis_1 |> plotr:::rm_ggspatial_scale(),
  plot_analysis_2 |> plotr:::rm_ggspatial_scale(),
  plot_analysis_3 |> plotr:::rm_ggspatial_scale(),
  plot_analysis_4 |> plotr:::rm_ggspatial_scale(),
  ncol = 2,
  nrow = 2
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.7.4 Brazil versus Full Sample

Code
patchwork::wrap_plots(
  plot_6579_ibge_1 |> plotr:::rm_ggspatial_scale(),
  plot_full_2 |> plotr:::rm_ggspatial_scale(),
  ncol = 2,
  nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

Code
patchwork::wrap_plots(
  plot_6579_ibge_2 |> plotr:::rm_ggspatial_scale(),
  plot_full_3 |> plotr:::rm_ggspatial_scale(),
  ncol = 2,
  nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

Code
patchwork::wrap_plots(
  plot_6579_ibge_3 |> plotr:::rm_ggspatial_scale(),
  plot_full_4 |> plotr:::rm_ggspatial_scale(),
  ncol = 2,
  nrow = 1
) +
  patchwork::plot_annotation(tag_levels = "A") &
  ggplot2::theme_void() &
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    legend.key.size = ggplot2::unit(0.5, "cm"),
    text = ggplot2::element_text(size = 9)
  )

C.8 Age Distributions

C.8.1 Brazil

Source: Instituto Brasileiro de Geografia e Estatística. (n.d.). Tabela 6407: População residente, por sexo e grupos de idade [Table 6407: Resident population, by sex and age groups] [Dataset]. SIDRA. https://sidra.ibge.gov.br/tabela/6407

Code
prettycheck:::assert_internet()

ibge_6407_data <-
  sidrar::get_sidra(
    api = paste0(
      "/t/6407/n3/all/v/606/p/2017/c2/allxt/c58/1140,1141,1144,1145,1152,",
      "2793,3299,3300,3301,3350,6798,40291,118282"
    )
  ) |>
  dplyr::as_tibble() |>
  janitor::clean_names() |>
  dplyr::select(
    valor, unidade_da_federacao_codigo, unidade_da_federacao, ano, sexo,
    grupo_de_idade
  ) |>
  dplyr::rename(
    n = valor,
    state_code = unidade_da_federacao_codigo,
    state = unidade_da_federacao,
    year = ano,
    sex = sexo,
    age_group = grupo_de_idade
  ) |>
  dplyr::arrange(state, sex, age_group) |>
  dplyr::mutate(
    year = as.integer(year),
    country = "Brazil",
    region = orbis::get_brazil_region(state),
    state_code = as.integer(state_code),
    sex = dplyr::case_match(
      sex,
      "Homens" ~ "Male",
      "Mulheres" ~ "Female"
    ),
    sex = factor(sex, ordered = FALSE),
    age_group = dplyr::case_match(
      age_group,
      "0 a 4 anos" ~ "0-4",
      "5 a 9 anos" ~ "5-9",
      "10 a 13 anos" ~ "10-13",
      "14 a 15 anos" ~ "14-15",
      "16 a 17 anos" ~ "16-17",
      "18 a 19 anos" ~ "18-19",
      "20 a 24 anos" ~ "20-24",
      "25 a 29 anos" ~ "25-29",
      "30 a 39 anos" ~ "30-39",
      "40 a 49 anos" ~ "40-49",
      "50 a 59 anos" ~ "50-59",
      "60 a 64 anos" ~ "60-64",
      "65 anos ou mais" ~ "65+"
    ),
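    # Explicit levels keep the age order (alphabetical sorting puts "5-9" after "40-49")
    age_group = factor(
      age_group,
      levels = c(
        "0-4", "5-9", "10-13", "14-15", "16-17", "18-19", "20-24",
        "25-29", "30-39", "40-49", "50-59", "60-64", "65+"
      ),
      ordered = TRUE
    ),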
    age_group_midpoint = dplyr::case_when(
      age_group == "0-4" ~ 2,
      age_group == "5-9" ~ 7,
      age_group == "10-13" ~ 11.5,
      age_group == "14-15" ~ 14.5,
      age_group == "16-17" ~ 16.5,
      age_group == "18-19" ~ 18.5,
      age_group == "20-24" ~ 22,
      age_group == "25-29" ~ 27,
      age_group == "30-39" ~ 34.5,
      age_group == "40-49" ~ 44.5,
      age_group == "50-59" ~ 54.5,
      age_group == "60-64" ~ 62,
      age_group == "65+" ~ 65 + (62 - 54.5) # Open-ended group: assumed midpoint of 72.5
    ),
    n = as.integer(n * 1000)
  ) |>
  dplyr::relocate(
    year, country, region, state_code, state, sex, age_group,
    age_group_midpoint, n
  )

ibge_6407_data
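Because the age-group labels are strings, `factor(..., ordered = TRUE)` without an explicit `levels` argument sorts them as text, placing "10-13" before "5-9". The short sketch below shows the difference on a subset of the labels used above.

```r
age_labels <- c("0-4", "5-9", "10-13", "65+")

# Without explicit levels, the labels are sorted as text.
levels(factor(age_labels, ordered = TRUE))
# -> "0-4" "10-13" "5-9" "65+"

# Supplying the levels keeps the intended chronological order.
age_group <- factor(age_labels, levels = age_labels, ordered = TRUE)
levels(age_group)
age_group[2] < age_group[3]  # TRUE: "5-9" comes before "10-13"
```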

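The statistics presented in this section are estimates based on the midpoints of the age groups and should be interpreted with caution: every individual in a group is assigned the same age, and the midpoint of the open-ended 65+ group is an assumption. In the summaries below, \(n\) is divided by 1,000, so counts are expressed in thousands of individuals.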
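To see how midpoint-based estimates work, the sketch below computes a mean age from grouped counts directly with `stats::weighted.mean()` and checks that it matches the result of expanding one row per unit, which is what `tidyr::uncount()` does. The counts are hypothetical.

```r
# Hypothetical grouped counts (in thousands) for four age groups.
grouped <- data.frame(
  age_group_midpoint = c(18.5, 22, 27, 34.5),
  n = c(2, 4, 4, 10)
)

# Weighted mean of the midpoints, weighting each group by its count.
mean_age <- stats::weighted.mean(grouped$age_group_midpoint, grouped$n)
mean_age  # 28.9

# Expanding one row per unit, as tidyr::uncount() does, gives the same mean.
expanded <- rep(grouped$age_group_midpoint, times = grouped$n)
mean(expanded)  # 28.9
```

Weighting avoids materializing one row per unit when counts are large; in both approaches every individual in a group receives the same age, so dispersion statistics remain approximations.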

Code
ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(age, n) |>
  tidyr::uncount(n) |>
  rutils:::stats_summary("age")
Code
ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "sex", "n")
Code
ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "region", "n")
Code
ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  rutils:::summarize_by("age", "state", "n")
Code
plot_ibge_6407_age_1 <-
  ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(sex, age, n) |>
  tidyr::uncount(n) |>
  plotr:::plot_age_pyramid(
    breaks = c(0, 10, 20, 30, 40, 50, 60, 65, 90)
  )

Code
plot_ibge_6407_age_2 <-
  ibge_6407_data |>
  dplyr::rename(age = age_group_midpoint) |>
  dplyr::mutate(n = n / 1000) |>
  dplyr::select(state_code, age, n) |>
  tidyr::uncount(n) |>
  plotr:::plot_brazil_state(
    col_fill = "age",
    year = 2017,
    transform = "identity",
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

C.8.2 Full Sample

Code
anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::stats_summary("age")
Code
anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "sex")
Code
anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "region")
Code
anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::summarize_by("age", "state")
Code
plot_full_age_1 <-
  anonymized_data |>
  plotr:::plot_age_pyramid()

Code
plot_full_age_2 <-
  anonymized_data |>
  plotr:::plot_brazil_state(
    col_fill = "age",
    year = 2017,
    transform = "identity",
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

Code
plot_full_age_3 <-
  anonymized_data |>
  plotr:::plot_brazil_municipality(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE
  )

C.8.3 Analysis Sample

Code
weighted_data |> rutils:::stats_summary("age")
Code
weighted_data |> rutils:::summarize_by("age", "sex")
Code
weighted_data |> rutils:::summarize_by("age", "region")
Code
weighted_data |> rutils:::summarize_by("age", "state")
Code
plot_analysis_age_1 <-
  weighted_data |>
  plotr:::plot_age_pyramid()

Code
plot_analysis_age_2 <-
  weighted_data |>
  plotr:::plot_brazil_state(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE,
    scale_type = "binned"
  )

Code
plot_analysis_age_3 <-
  weighted_data |>
  plotr:::plot_brazil_municipality(
    col_fill = "age", # Means
    year = 2017,
    direction = -1,
    quiet = TRUE
  )

C.9 Weight Distributions

C.9.1 Full Sample

Code
weighted_data |>
  dplyr::filter(!rutils::test_outlier(weight)) |>
  plotr:::plot_latitude_series(
    col = "weight",
    y_label = "Weight (kg)"
  )
Figure C.39: Boxplots of mean weight values (kg) aggregated by 1° latitude intervals, illustrating the relationship between latitude and weight. The × symbol points to the mean. The orange line represents a linear regression.

Source: Created by the author.
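The 1° latitude intervals in Figure C.39 can be sketched in base R. This is an assumption about the binning (each latitude is floored to the next lower integer degree); `plotr:::plot_latitude_series()` may bin differently, and the latitude and weight values are made up, not survey data.

```r
# Made-up latitudes (decimal degrees; negative = south) and body weights (kg)
latitude <- c(-3.2, -3.7, -15.8, -15.1, -23.5, -23.9, -30.0)
weight <- c(62, 70, 75, 68, 80, 72, 77)

# ASSUMPTION: each observation is assigned to a 1-degree interval by flooring,
# e.g., -15.8 and -15.1 both fall in [-16, -15)
latitude_bin <- floor(latitude)

# Number of observations and mean weight per 1-degree interval
table(latitude_bin)
bin_means <- tapply(weight, latitude_bin, mean)

bin_means
```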

Code
plot_age_sex_weight_series <-
  anonymized_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series(
    col_y = "weight",
    y_label = "Weight (kg)"
  )
Figure C.40: Observed relation between age and weight (kg), divided by sex and aggregated by the mean. The gray line represents both sexes combined. Vertical lines represent the standard error of the mean (SEM).

Source: Created by the author based on a data visualization from Roenneberg et al. (2007, Figure 4).
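The SEM bars follow the usual definition, \(\mathrm{SEM} = s / \sqrt{n}\), where \(s\) is the sample standard deviation and \(n\) the number of observations in the group. A minimal base-R sketch with made-up values (not survey data) computes the per-sex means and SEMs and the pooled mean that corresponds to the gray line. How `plotr:::plot_series()` computes these internally (for example, whether it applies weights) is not shown here and may differ.

```r
# Made-up observations for illustration only
obs <- data.frame(
  sex = c("Female", "Female", "Female", "Male", "Male", "Male"),
  weight = c(60, 64, 68, 70, 76, 82)
)

# Standard error of the mean: sample SD divided by the square root of n
sem <- function(x) stats::sd(x) / sqrt(length(x))

# Per-sex means and SEMs (the colored lines and their vertical bars)
group_mean <- tapply(obs$weight, obs$sex, mean)
group_sem <- tapply(obs$weight, obs$sex, sem)

# Pooled mean for both sexes combined (the gray line)
pooled_mean <- mean(obs$weight)
```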

C.10 Chronotype Distributions

C.10.1 Full Sample

Code
anonymized_data |>
  dplyr::filter(country == "Brazil") |>
  rutils:::stats_summary("msf_sc", as_list = TRUE) |>
  dplyr::as_tibble() |>
  dplyr::mutate(
    dplyr::across(
      .cols = dplyr::where(hms::is_hms),
      .fns = lubritime::round_time
    ),
    dplyr::across(
      .cols = dplyr::everything(),
      .fns = as.character
    )
  ) |>
  tidyr::pivot_longer(cols = dplyr::everything())
Code
anonymized_data |> plotr:::get_msf_sc_cutoffs()
Code
plot_age_sex_series <-
  anonymized_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series()
Figure C.41: Observed relation between age and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSFsc), the MCTQ proxy for chronotype. The gray line represents both sexes combined. Vertical lines represent the standard error of the mean (SEM).

Source: Created by the author based on a data visualization from Roenneberg et al. (2007, Figure 4).

Figure C.42: Distribution of European chronotypes by age, as shown in Roenneberg et al. (2007), for comparison.

Source: Reproduced from Roenneberg et al. (2007, Figure 4).

Code
plot_age_sex_series <-
  anonymized_data |>
  dplyr::filter(!rutils::test_outlier(weight)) |>
  plotr:::plot_series(
    col_x = "weight",
    x_label = "Weight (kg)",
    date_breaks = "30 min"
  )
Figure C.43: Observed relation between weight and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSFsc), the MCTQ proxy for chronotype. The gray line represents both sexes combined. Vertical lines represent the standard error of the mean (SEM).

Source: Created by the author based on a data visualization from Roenneberg et al. (2007, Figure 4).

C.10.2 Analysis Sample

Code
weighted_data |>
  rutils:::stats_summary("msf_sc", as_list = TRUE) |>
  dplyr::as_tibble() |>
  dplyr::mutate(
    dplyr::across(
      .cols = dplyr::where(hms::is_hms),
      .fns = lubritime::round_time
    ),
    dplyr::across(
      .cols = dplyr::everything(),
      .fns = as.character
    )
  ) |>
  tidyr::pivot_longer(cols = dplyr::everything())
Code
weighted_data |> plotr:::get_msf_sc_cutoffs()
Code
weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  rutils:::summarize_by("msf_sc", "msf_sc_category")
Code
weighted_data |> plotr:::plot_chronotype()
Figure C.44: Observed distribution of the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSFsc), a proxy for chronotype.
Chronotypes are grouped into quantile-based categories, ranging from extremely early (\(0 |- 0.111\)) to extremely late (\(0.888 |- 1\)).

Source: Created by the author based on a data visualization from Roenneberg et al. (2019, Figure 1).
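The quantile-based categories can be sketched with `stats::quantile()` and `cut()`. The caption gives only the outer bounds (about 1/9 and 8/9); this sketch assumes, as a labeled hypothesis, that the remaining cut points sit at 2/9, 3/9, 6/9, and 7/9, so that "Intermediate" covers the middle third. `plotr:::categorize_msf_sc()` may use different cut points, and the MSFsc values are evenly spaced made-up numbers in decimal hours.

```r
# Made-up MSFsc values in decimal hours: 900 evenly spaced values
msf_sc_hours <- seq_len(900) / 100

# ASSUMPTION: cut points at the 1/9, 2/9, 3/9, 6/9, 7/9, and 8/9 quantiles;
# plotr:::categorize_msf_sc() may differ
probs <- c(0, 1, 2, 3, 6, 7, 8, 9) / 9
cutoffs <- stats::quantile(msf_sc_hours, probs = probs)

msf_sc_category <- cut(
  msf_sc_hours,
  breaks = cutoffs,
  labels = c(
    "Extremely early", "Moderately early", "Slightly early",
    "Intermediate", "Slightly late", "Moderately late",
    "Extremely late"
  ),
  right = FALSE,          # Left-closed intervals, as in 0 |- 0.111
  include.lowest = TRUE,  # With right = FALSE, keeps the maximum in the last bin
  ordered_result = TRUE
)

# 100 values in each outer category and 300 in "Intermediate"
table(msf_sc_category)
```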

Figure C.45: Distribution of European chronotypes, as shown in Roenneberg et al. (2019), for comparison.

Source: Reproduced from Roenneberg et al. (2019, Figure 1, right panel).

Code
weighted_data |> rutils:::summarize_by("msf_sc", "sex")
Code
weighted_data |>
  dplyr::mutate(
    age_group = dplyr::case_when(
      dplyr::between(age, 0, 4) ~ "0-4",
      dplyr::between(age, 5, 9) ~ "5-9",
      dplyr::between(age, 10, 13) ~ "10-13",
      dplyr::between(age, 14, 15) ~ "14-15",
      dplyr::between(age, 16, 17) ~ "16-17",
      dplyr::between(age, 18, 19) ~ "18-19",
      dplyr::between(age, 20, 24) ~ "20-24",
      dplyr::between(age, 25, 29) ~ "25-29",
      dplyr::between(age, 30, 39) ~ "30-39",
      dplyr::between(age, 40, 49) ~ "40-49",
      dplyr::between(age, 50, 59) ~ "50-59",
      dplyr::between(age, 60, 64) ~ "60-64",
      age >= 65 ~ "65+"
    )
  ) |>
  rutils:::summarize_by("msf_sc", "age_group")
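The same age groups could also be built with base R's `cut()` and left-closed intervals. This is an alternative sketch with made-up ages, not the author's code, and it assumes `age` is measured in years. One possible advantage is that non-integer ages (e.g., 13.5) still receive a group, whereas the `dplyr::between()` conditions in the code above leave them as `NA`; another is that the result is a factor whose levels follow age order rather than alphabetical order.

```r
# Made-up ages (years) for illustration
ages <- c(3, 13, 14, 19, 24, 35, 64, 65, 80)

# Left-closed intervals: [0, 5), [5, 10), ..., [60, 65), [65, Inf)
age_breaks <- c(0, 5, 10, 14, 16, 18, 20, 25, 30, 40, 50, 60, 65, Inf)
age_labels <- c(
  "0-4", "5-9", "10-13", "14-15", "16-17", "18-19", "20-24",
  "25-29", "30-39", "40-49", "50-59", "60-64", "65+"
)

age_group <- cut(ages, breaks = age_breaks, labels = age_labels, right = FALSE)

# A non-integer age such as 13.5 falls in "10-13" instead of becoming NA
cut(13.5, breaks = age_breaks, labels = age_labels, right = FALSE)
```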
Code
plot_age_sex_series <-
  weighted_data |>
  dplyr::filter(age <= 50) |>
  plotr:::plot_series()
Figure C.46: Observed relation between age and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep corrected midpoint between sleep onset and sleep end on work-free days (MSFsc), MCTQ proxy for measuring the chronotype. The gray line represents both sex. Vertical lines represent the standard error of the mean (SEM).

Source: Created by the author based on a data visualization from Roenneberg et al. (2007, Figure 4).

Code
plot_age_sex_series <-
  weighted_data |>
  dplyr::filter(!rutils::test_outlier(weight), weight > 45) |>
  plotr:::plot_series(
    col_x = "weight",
    x_label = "Weight (kg)",
    date_breaks = "30 min"
  )
Figure C.47: Observed relation between weight and chronotype, divided by sex and aggregated by the mean. Chronotype is represented by the local time of the sleep-corrected midpoint between sleep onset and sleep end on work-free days (MSFsc), the MCTQ proxy for chronotype. The gray line represents both sexes combined. Vertical lines represent the standard error of the mean (SEM).

Source: Created by the author based on a data visualization from Roenneberg et al. (2007, Figure 4).

Code
weighted_data |> rutils:::summarize_by("msf_sc", "region")
Code
weighted_data |> rutils:::summarize_by("msf_sc", "state")
Code
limits <- # First (Q1) and third (Q3) quartiles of MSFsc, used as scale limits
  weighted_data |>
  dplyr::pull(msf_sc) |>
  lubritime::link_to_timeline() |>
  as.numeric() |>
  stats::quantile(c(0.25, 0.75), na.rm = TRUE)

weighted_data |>
  dplyr::mutate(
    msf_sc =
      msf_sc |>
      lubritime::link_to_timeline() |>
      as.numeric()
  ) |>
  plotr:::plot_brazil_state(
    col_fill = "msf_sc",
    year = 2017,
    breaks = seq(limits[1], limits[2], length.out = 6) |> groomr::rm_caps(),
    labels = plotr:::format_as_hm,
    limits = limits, # Bound the color scale by Q1 and Q3
    quiet = TRUE
  )
Figure C.48: Observed geographical distribution of mean MSFsc values by Brazilian state, illustrating the relationship between chronotype and latitude in Brazil.
MSFsc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Higher MSFsc values indicate a tendency towards eveningness. The color scale is bounded by the first and third quartiles of the individual MSFsc values. Differences in mean MSFsc across states are small and fall within a narrow range relative to the scale of the Munich ChronoType Questionnaire (MCTQ), which limits their practical relevance.

Source: Created by the author.
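Because MSFsc often falls after midnight, quartiles must be computed on a continuous timeline rather than on raw clock times; otherwise 23:30 and 02:00 would be treated as 21.5 hours apart instead of 2.5. A minimal base-R sketch of the limits and breaks used in the state-map code follows. It assumes that `lubritime::link_to_timeline()` moves times earlier than 12:00 to the next day (the same 12:00 threshold passed to `stats_summary()` in Section C.4) and that `groomr::rm_caps()` drops the first and last breaks; both are assumptions, and the MSFsc values are made up.

```r
# Made-up MSFsc values in seconds since local midnight
msf_sc_sec <- c(2, 3, 4, 5, 23.5) * 3600

# ASSUMPTION: mimics lubritime::link_to_timeline(); times before 12:00
# are moved to the next day so the values form a continuous timeline
threshold <- 12 * 3600
msf_sc_linked <- ifelse(msf_sc_sec < threshold, msf_sc_sec + 86400, msf_sc_sec)

# Color-scale limits: first (Q1) and third (Q3) quartiles
limits <- stats::quantile(msf_sc_linked, c(0.25, 0.75))

# Six equally spaced values from Q1 to Q3; dropping both endpoints
# (ASSUMPTION: what groomr::rm_caps() does) leaves four interior breaks
breaks <- seq(limits[1], limits[2], length.out = 6)
breaks <- breaks[-c(1, length(breaks))]

# Format seconds on the linked timeline back to local clock time (HH:MM)
format_hm <- function(seconds) {
  seconds <- round(seconds) %% 86400
  sprintf("%02d:%02d", seconds %/% 3600, (seconds %% 3600) %/% 60)
}

format_hm(breaks) # "02:24" "02:48" "03:12" "03:36"
```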

Code
limits <- # First (Q1) and third (Q3) quartiles of MSFsc, used as scale limits
  weighted_data |>
  dplyr::pull(msf_sc) |>
  lubritime::link_to_timeline() |>
  as.numeric() |>
  stats::quantile(c(0.25, 0.75), na.rm = TRUE)

weighted_data |>
  dplyr::mutate(
    msf_sc =
      msf_sc |>
      lubritime::link_to_timeline() |>
      as.numeric()
  ) |>
  plotr:::plot_brazil_municipality(
    col_fill = "msf_sc",
    year = 2017,
    breaks = seq(limits[1], limits[2], length.out = 6) |> groomr::rm_caps(),
    labels = plotr:::format_as_hm,
    limits = limits,
    quiet = TRUE,
    reverse = TRUE
  )

Code
weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  plotr:::plot_brazil_point(
    col_group = "msf_sc_category",
    year = 2017,
    scale_type = "discrete"
  )

Code
plot <-
  weighted_data |>
  dplyr::mutate(
    msf_sc_category = plotr:::categorize_msf_sc(msf_sc),
    msf_sc_category = factor(
      msf_sc_category,
      levels = c(
        "Extremely early", "Moderately early", "Slightly early",
        "Intermediate", "Slightly late", "Moderately late",
        "Extremely late"
      ),
      ordered = TRUE
    )
  ) |>
  plotr:::plot_brazil_point(
    col_group = "msf_sc_category",
    year = 2017,
    size = 0.1,
    alpha = 1,
    print = FALSE,
    scale_type = "discrete"
  ) +
  ggplot2::theme(
    axis.title = ggplot2::element_blank(),
    axis.text = ggplot2::element_blank(),
    axis.ticks = ggplot2::element_blank(),
    panel.grid.major = ggplot2::element_blank(),
    panel.grid.minor = ggplot2::element_blank(),
    legend.position = "none"
  )

plot |>
  plotr:::rm_ggspatial_scale() +
  ggplot2::facet_wrap(~msf_sc_category, ncol = 4, nrow = 2)
Figure C.49: Observed geographical distribution of MSFsc values by chronotype category, from extremely early to extremely late, illustrating how chronotypes are distributed across latitudes in Brazil.
MSFsc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Chronotypes are grouped into quantile-based categories, ranging from extremely early (\(0 |- 0.111\)) to extremely late (\(0.888 |- 1\)). No discernible pattern emerges from the distribution of chronotypes across latitudes.

Source: Created by the author.

Code
weighted_data |> plotr:::plot_latitude_series()
Figure C.50: Boxplots of observed mean MSFsc values aggregated by \(1°\) latitude intervals, illustrating the relationship between latitude and chronotype.
MSFsc is a proxy for chronotype, representing the midpoint of sleep on work-free days, adjusted for sleep debt. Higher MSFsc values indicate a tendency towards eveningness. The × symbol points to the mean. The orange line represents a linear regression. The differences in mean/median values across latitudes are minimal relative to the Munich ChronoType Questionnaire (MCTQ) scale.

Source: Created by the author.
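To judge whether a latitude trend is small relative to the MCTQ scale, the slope of the regression line can be expressed in minutes of MSFsc per degree of latitude. The sketch uses made-up, perfectly linear values (not survey results) and assumes MSFsc has already been linked to a continuous timeline and converted to decimal hours.

```r
# Made-up values: latitude (decimal degrees) and MSFsc (decimal hours)
latitude <- c(-30, -25, -20, -15, -10, -5)
msf_sc_hours <- c(4.30, 4.35, 4.40, 4.45, 4.50, 4.55)

# Ordinary least-squares line, analogous to the regression line in the plot
fit <- stats::lm(msf_sc_hours ~ latitude)

# Slope converted from hours per degree to minutes per degree
slope_min_per_degree <- stats::coef(fit)[["latitude"]] * 60

slope_min_per_degree
```

With these made-up numbers the slope is 0.6 min per degree, or 15 min across 25° of latitude; comparing such a figure with the spread of MSFsc in the sample is one way to make "minimal relative to the MCTQ scale" concrete.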

Code
weighted_data |>
  plotr:::plot_series(
    col_x = "latitude",
    x_label = "Latitude",
    date_breaks = "15 min",
    reverse = TRUE,
    change_sign = TRUE
  )