Appendix B — Methods
B.1 Overview
This document focuses on providing a detailed explanation of the methods and steps involved in building the models and testing the thesis hypothesis.
For a comprehensive review of the thesis question and hypothesis, please refer to Supplementary Material A.
B.2 Approach and Procedure Method
This study adopted the hypothetical-deductive method, also known as the method of conjecture and refutation (Popper, 1979, p. 164), to approach problem-solving. As a procedural method, it utilized an enhanced version of Null Hypothesis Significance Testing (NHST), grounded in the original Neyman-Pearson framework for data testing (Neyman & Pearson, 1928a, 1928b; Perezgonzalez, 2015).
B.3 Measurement Instrument
Chronotypes were assessed using a sleep log based on the core version of the standard Munich ChronoType Questionnaire (MCTQ) (Roenneberg et al., 2003), a well-validated and widely applied self-report tool for measuring sleep-wake cycles and chronotypes (Roenneberg et al., 2019). The MCTQ captures chronotype as a biological circadian phenotype, determined by the sleep-corrected midpoint of sleep (MS) (Figure B.1) on work-free days (MSF), accounting for any potential sleep compensation due to sleep deficits (sc = sleep correction) on workdays (Final abbreviation: MSFsc) (Roenneberg, 2012).
Participants completed an online questionnaire, which included the sleep log as well as sociodemographic (e.g., age, sex), geographic (e.g., full residential address), anthropometric (e.g., weight, height), and data on work or study routines. A sample version of the questionnaire, stored independently by the Internet Archive organization, can be viewed at https://web.archive.org/web/20171018043514/each.usp.br/gipso/mctq.
B.4 Sample
The dataset used for analysis is made up of \(65,824\) Brazilian individuals aged 18 or older, residing in the UTC-3 timezone, who completed the survey between October 15th and 21st, 2017.
The unfiltered valid sample comprises \(115,166\) participants from all Brazilian states, while the raw sample is composed of \(120,265\) individuals. The majority of the sample data was obtained in 2017 from October 15th to 21st by a broadcast of the online questionnaire on a popular Brazil’s Sunday TV show with national reach (Rede Globo, 2017). This amount of data collected in such a short time gave the sample a population cross-sectional characteristic.
A survey conducted in 2019 by the Brazilian Institute of Geography and Statistics (IBGE) (2021) found that \(82.17\%\) of Brazilian households had access to an internet connection. Therefore, this sample is likely to have a good representation of Brazil’s population.
Daylight Saving Time (DST) began in Brazil at midnight on October 15th, 2017. Residents from the Midwest, Southeast, and South regions were instructed to set the clock forward by 1 hour. I believe that this event did not contaminate the data since it started on the same day of the data collection. It’s important to notice that we asked subjects to relate their routine behavior, not how they behaved in the last few days. A possible effect of the DST on the sample would be the production of an even later chronotype for populations near the planet’s poles, amplifying a possible latitude effect. However, this was not shown on the data.
To balance the sample, a weighting procedure was applied to the data. The weights were calculated by cell weighting, using the sex, age group and Brazil’s state as reference This procedure can be found on Supplementary Material D.
More information about the sample can be found on Supplementary Material C.
B.5 Geographical Data
Geographic data were collected by the variables country
, state
, municipality
, and postal code
. The unique values of those data were manually inspected and adjusted using lookup tables. The municiplaity
values were first matched using string distance algorithms present in the stringdist R package (van der Loo, n.d.) and data from the Brazilian Institute of Geography and Statistics (IBGE) via the geobr R package (R. H. M. Pereira & Goncalves, n.d.), other processes were then performed manually.
This was a hard task involving crossing information from different sources, such as the QualoCEP (Qual o CEP, 2024), Google Geocoding [google], ViaCEP (ViaCEP, n.d.), and OpenStreetMap (OpenStreetMap contributors, n.d.) databases, along with the Brazilian postal service (Correios) postal code documentation. Hence, the data matching shown in the lookup table was gathered considering not only one variable, but the whole set of geographical information provided by the respondent. Special cases were coded in the lookup table named special_cases
All values were also checked for ambiguities, including the municipalities names (e.g., the name Aracoiba could refer to the municipality of Aracoiaba in the state of Ceará, but could also refer to the municipality of Araçoiaba da Serra in the state of São Paulo). All values that had a similarity or pattern matching with one or more municipalities were manually inspected to avoid errors (see get_more_than_one_geographical_match
function in the code repository).
B.5.1 Postal codes
After removing all non-numeric characters from the Brazilian postal codes (Código de Endereçamento Postal (CEP)), they were processed by the following rules:
- If they had 9 or more digits, they were truncated to 8 digits (the first 8 digits are the postal code).
- If they between 5 and 7 digits, they were complemented with
0
s at the ending. - If they had less than 5 digits, they were discarded.
In addition, a visual inspection was performed to check for inconsistencies.
After this process, the postal codes were matched with the QualoCEP database (Qual o CEP, 2024). Existing postal codes were than validated by the following rules:
- If the postal code had not been modified and the state or the municipality was the same, it was considered valid.
- If the postal code had been modified and the state and municipality were the same, it was considered valid.
- Else, it was considered invalid. Invalid CEPs were discarded in the processed dataset.
Invalid postal codes were then matched with data derived from geocoding using Google Geocoding API (Google, n.d.) via the tidygeocoder
R package (Cambon et al., 2021), which is stored in the thesis lookup tables. After that, the same process of validation was performed. The API data was able to reduce invalid postal codes from 563 ro 292, a 48.13% reduction. The postal codes that were not validated were discarded from the final data.
It’s important to note that some postal codes could not be evaluated because they were missing the state or municipality information. These postal codes were maintained, but no geocode data was associated with them.
Finally, the state
and municipality
variables were adjusted using the data from the valid postal codes.
Non-Brazilian postal codes were not validated, but they were cleaned by removing non-digit characters and codes with 3 digits or less. They also went through a process da cleaning via visual inspection. Their values can be found in the special_cases
lookup table.
B.5.2 Latitudes and Longitudes
Latitudes and longitudes values consider only the municipality
and postal_code
variables. They were extracted from the QualoCEP database, which is the result of a reverse geocoding using the Google Geocoding API. See Appendice 6 to side by side comparison of the latitudes and longitudes from QualoCEP and Google Geocoding API.
For respondents who did not provide a valid postal code, the latitude and longitude were extracted via the mean of the latitudes and longitudes associated with the municipality in the QualoCEP database.
Finally, Google Geocoding API, via the tidygeocoder
R package, was used on cases that didn’t had a match in the QualoCEP database.
B.6 Solar Irradiance Data
The solar irradiance data is based on Brazil’s National Institute for Space Research (INPE) 2017 Laboratory of Modeling and Studies of Renewable Energy Resources (LABREN) 2017 Solar Energy Atlas (E. B. Pereira et al., 2017). In particular, it was used the Global Horizontal Irradiance (GHI) data, which is the total amount of irradiance received from above by a surface horizontal to the ground.
The data comprise annual and monthly averages of the daily total irradiation in Wh/m².day with spatial resolution of 0.1° x 0.1° (latitude/longitude) (about 10km x 10km). It’s important to note that the sample was collected in the same year of the radiance data.
B.7 Solar Time Data
The suntools
R package (Bivand & Luque, n.d.) was used to calculate the local of time of sunrise, sunset, and daylight duration for a given date and location. suntools
functions are based on equations provided by Meeus (1991) and by the United States’s National Oceanic & Atmospheric Administration (NOAA).
The data and time of the equinox and solstices were gathered from the Time and Date AS service (Time and Date AS, n.d.). For validity, this data was checked with the equations from Meeus (1991) and the results of the National Aeronautics and Space Administration (NASA) ModelE AR5 Simulations (National Aeronautics and Space Administration & Goddard Institute for Space Studies, n.d.).
B.8 Data Management
A data management plan for this study was created and published using the California Digital Library’s DMP Tool (Vartanian, 2024). It is available at https://doi.org/10.17605/OSF.IO/3JZ7K.
All data are stored in the study’s research compendium on the Open Source Framework (OSF), hosted on Google Cloud servers in the USA. Access to the compendium is restricted, and data are encrypted using a 4096-bit RSA key pair (Rivest-Shamir-Adleman), along with a unique 32-byte project password.
Access to the data requires authorization from the author on the Open Science Framework (OSF) and the installation of the necessary encryption keys.
B.9 Data Wrangling
Data wrangling and analysis followed the data science program proposed by Hadley Wickham and Garrett Grolemund (Wickham et al., 2023) (Figure B.4). All processes were made with the help of the R programming language (R Core Team, n.d.), RStudio IDE (Posit Team, n.d.), and several R packages. The tidyverse and rOpenSci peer-reviewed package ecosystem and other R packages adherents of the tidy tools manifesto (Wickham, 2023) were prioritized. The MCTQ data was analyzed using the mctq
rOpenSci peer-reviewed package (Vartanian, n.d.). All processes were made in order to provide result reproducibility and to be in accordance with the FAIR principles (Wilkinson et al., 2016).
B.9.1 Pipeline
The pipeline for data processing is based on the framework outlined in the targets
R package manual. You can view and reproduce this pipeline in the _target.R
file located at the root of the thesis code repository.
[1] “{.r filename='_targets.R'}\n#| code-fold: false\n\n# See <https://books.ropensci.org/targets/> to learn more.\n\nlibrary(here)\nlibrary(tarchetypes)\nlibrary(targets)\n\n# source(here::here(\"R\", \"set_locale.R\"))\nsource(here::here(\"R\", \"get_raw_data.R\"))\nsource(here::here(\"R\", \"tidy_data_.R\"))\nsource(here::here(\"R\", \"validate_data.R\"))\nsource(here::here(\"R\", \"analyze_data.R\"))\nsource(here::here(\"R\", \"geocode_data.R\"))\nsource(here::here(\"R\", \"add_solar_data.R\"))\nsource(here::here(\"R\", \"anonymize_data.R\"))\nsource(here::here(\"R\", \"filter_data.R\"))\nsource(here::here(\"R\", \"weigh_data.R\"))\nsource(here::here(\"R\", \"lock_and_store_data.R\"))\n\ntargets::tar_option_set(\n packages = c(\n \"cli\",\n \"curl\",\n \"dplyr\",\n \"here\",\n \"hms\",\n \"lockr\", # github.com/danielvartan/lockr\n \"lubridate\", # For masking reasons.\n \"lubritime\", # github.com/danielvartan/lubritime\n \"methods\",\n \"mctq\",\n \"prettycheck\", # github.com/danielvartan/prettycheck\n \"osfr\",\n \"readr\",\n \"rlang\",\n \"rutils\", # github.com/danielvartan/rutils\n \"scaler\", # github.com/danielvartan/scaler\n \"stringr\",\n \"tidyr\",\n \"utils\"\n )\n)\n\n# tar_make_clustermq() is an older (pre-{crew}) way to do distributed computing\n# in {targets}, and its configuration for your machine is below.\noptions(clustermq.scheduler = \"multiprocess\")\n\n# tar_make_future() is an older (pre-{crew}) way to do distributed computing\n# in {targets}, and its configuration for your machine is below.\nfuture::plan(future.callr::callr)\n\n# Run the R scripts in the R/ folder with your custom functions:\n# targets::tar_source(files = here::here(\"R\"))\n# source(\"other_functions.R\") # Source other scripts as needed.\n\n# Replace the target list below with your own:\nlist(\n targets::tar_target(\n name = raw_data,\n command = get_raw_data()\n ),\n targets::tar_target(\n name = tidy_data,\n command = tidy_data_(raw_data)\n ),\n targets::tar_target(\n name = validated_data,\n command = validate_data(tidy_data)\n ),\n targets::tar_target(\n name = analyzed_data,\n command = analyze_data(validated_data)\n ),\n targets::tar_target(\n name = geocoded_data,\n command = geocode_data(analyzed_data)\n ),\n targets::tar_target(\n name = added_data,\n command = add_solar_data(geocoded_data)\n ),\n targets::tar_target(\n name = anonymized_data,\n command = anonymize_data(added_data)\n ),\n targets::tar_target(\n name = filtered_data,\n command = filter_data(anonymized_data)\n ),\n targets::tar_target(\n name = weighted_data,\n command = weigh_data(filtered_data)\n )\n # targets::tar_target(\n # name = locked_data,\n # command = lock_and_store_data(weighted_data)\n # )\n)\n
”
B.9.2 Lookup Tables
Along with the data cleaning procedures described in the pipeline, lookup tables were employed to clean text and character variables. These tables were created by manually inspecting the unique values of the raw data and correcting common misspellings, synonyms, and other inconsistencies. The tables are stored in the research compendium of this thesis.
Text/character variables: track
, names
, email
, country
, state
, municiplality
, postalcode
, sleep_drugs_which
, sleep_disorder_which
, medication_which
.
The lookup tables are available in the research compendium of this thesis. They had to be encrypted because of the sensitive information they contain. If you need access, please contact the author.
The matching of the variables sleep_drugs_which
, sleep_disorder_which
, medication_which
is not complete. This variables were not used in the analysis, but they are available in the research compendium.
Read the section about geographical information to learn more about the matching process.
B.9.3 Circular Statistics
MCTQ is based on local time data, which are circular in nature (i.e., it cycles every 24 hours). Performing statistics with this kind of variables is challenging, since it can have different values depending on each arc/interval of the circle is used.
For example, the distance between 23:00 (the values here are always in the 24 hour scale) and 01:00 can be 02:00 or 22:00 depending on the direction of the measurement. Hence, the analysis will always have to choose a direction (Figure B.5).
The analysis assumed as a method for dealing with this issue the adaptation of the values using the 12:00 hour as a reference point. This method is appropriated when dealing with sleep data.
Consider the local time of sleep onset. We can observe that some subjects start sleeping before midnight, while others sleep after midnight. By the context of the data, the circle arc/interval (Figure B.6) here is the shorter one, since the great majority of people don’t usually sleep in daytime and for more than 12 hours.
Using the 12:00 hour a threshold, it’s possible to calculate the correct distance between times during the calculations. This method link the values in a two-day timeline, with values equal or greater than 12:00 allocated on day 1, and values with less than 12:00 allocated on day 2. This way, the distance between 23:00 and 01:00 is 02:00, the distance between 01:00 and 23:00 is 22:00 (i.e., there is no gap between data points) (Figure B.7).
Code
# library(dplyr)
# library(ggplot2)
# library(here)
# library(lubritime)
# library(patchwork)
# library(tidyr)
source(here::here("R", "plot_hist.R"))
weighted_data <- targets::tar_read(
"weighted_data",
store = here::here("_targets")
)
plot_1 <-
weighted_data |>
dplyr::select(so_f) |>
tidyr::drop_na() |>
plot_hist(
col = "so_f",
x_label = "Local time of sleep onset",
print = FALSE
)
plot_2 <-
weighted_data |>
dplyr::select(so_f) |>
dplyr::mutate(
so_f =
so_f |>
lubritime:::link_to_timeline(threshold = hms::parse_hms("12:00:00"))
) |>
tidyr::drop_na() |>
plot_hist(
col = "so_f",
x_label = "Local time of sleep onset",
print = FALSE
)
patchwork::wrap_plots(
plot_1, plot_2,
ncol = 2
)
B.9.4 Round-Off Errors
The R programming language only stores values up to 53 binary bits, that’s about \(15.955\) digits of precision (\(x = 53 \log_{10}(2)\)). Since MCTQ deals with self-reported local time of day (e.g., 02:30) and duration (e.g., 15 minutes), a greater floating-point precision is unnecessary. Even with multiple computations, round-off errors wouldn’t significantly impact the phenomenon under study.
Even considering that POSIXct
objects are used in the models, there is still a good margin of precision – POSIXct objects are data-time objects measured by the amount of seconds since the UNIX epoch (1970-01-01).
B.10 Hypothesis Test
The study hypothesis was tested using nested multiple linear regressions. The primary concept of nested models is to evaluate the effect of adding one or more predictors on the model’s variance explanation (i.e., the \(\text{R}^{2}\)) (Allen, 1997; Maxwell et al., 2018). This is achieved by creating a restricted model (\(r\)) and comparing it with a full model (\(f\)). The hypothesis can be outlined as follows:
- Null hypothesis (\(\text{H}_{0}\))
- Adding latitude does not meaningfully improve the model’s fit. This implies that the change in adjusted \(\text{R}^{2}\) is negligible, or the F-test is not significant, given a type I error probability (\(\alpha\)) of \(0.05\).
- Alternative Hypothesis (\(\text{H}_{a}\))
- Adding latitude meaningfully improves the model’s fit. This implies that the change in adjusted \(\text{R}^{2}\) exceeds the expected Minimum Effect Size (MES), and the F-test is significant, given a type I error probability (\(\alpha\)) of \(0.05\).
Conjecture B.1 (Data Test – Full Version) \[ \begin{cases} \text{H}_{0}: \Delta \ \text{Adjusted} \ \text{R}^{2} \leq \text{MES} \quad \text{or} \quad \text{F-test is not significant} \ (\alpha \geq 0.05) \\ \text{H}_{a}: \Delta \ \text{Adjusted} \ \text{R}^{2} > \text{MES} \quad \text{and} \quad \text{F-test is significant} \ (\alpha < 0.05) \end{cases} \]
Where:
\[ \Delta \ \text{Adjusted} \ \text{R}^{2} = \text{Adjusted} \ \text{R}^{2}_{f} - \text{Adjusted} \ \text{R}^{2}_{r} \tag{B.1}\]
and \(\text{MES}\) is a Minimum Effect Size, a threshold for detecting significant changes.
\(\text{R}^{2}\) is a statistic that represents the proportion of variance explained in the relationship between two or more variables (goodness of fit) (Frey, 2022, p. 1339). It ranges from \(1\) (perfect prediction) to \(0\) (no prediction) and can be defined as (DeGroot & Schervish, 2012, p. 748):
\[ \text{R}^{2} = 1 - \cfrac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \cfrac{\sum \limits^{n}_{i = 1} (y_{i} - \hat{y}_{i})^{2}}{\sum \limits^{n}_{i = 1} (y_{i} - \overline{y})^{2}} = 1 - \cfrac{\text{Unexplained Variance}}{\text{Total Variance}} = \cfrac{\text{Explained Variance}}{\text{Total Variance}} \tag{B.2}\]
Where:
- \(y_{i}\) = Observed value of the dependent variable;
- \(\hat{y}_{i}\) = Predicted value of the dependent variable;
- \(\overline{y}\) = Mean of the dependent variable;
- \(\text{SS}_{\text{residual}}\) = Sum of the squared prediction errors (residuals) across all observations (DeGroot & Schervish, 2012, p. 748; Hair, 2019, p. 264);
- \(\text{SS}_{\text{total}}\) = Total sum of squares or the total amount of variation that exists to be explained by the independent variables. Is equivalent by the sum of the squared difference between the observed value and the mean of the dependent variable (baseline prediction) (DeGroot & Schervish, 2012, p. 748; Hair, 2019, p. 265).
Here, \(\text{R}^{2}\) is the proportion of the variance in the dependent variable that is predictable from the independent variables.
The adjusted \(\text{R}^{2}\) is a modified version of \(\text{R}^{2}\) that adjusts for the number of predictors in the model. It can be defined as:
\[ \text{Adjusted} \ \text{R}^{2} = 1 - \cfrac{\text{SS}_{\text{residual}} / \text{df}_{\text{residual}}}{\text{SS}_{\text{total}} / \text{df}_{\text{total}}} = 1 - \cfrac{(1 - \text{R}^{2}) \times (\text{n} - 1)}{\text{n} - k - 1} \tag{B.3}\]
Where:
- \(\text{n}\) = Number of observations in the sample;
- \(k\) = Number of independent variables in the model.
- \(\text{df}_{\text{residual}}\) = Degrees of freedom of the residual sum of squaress = \(n - k - 1\).
- \(\text{df}_{\text{total}}\) = Degrees of freedom of the total sum of squares = \(n - 1\).
The F-test serves as to determine wether the ratio of variances is different from zero (i.e., statistically significant) considering a baseline prediction (Hair, 2019, p. 300). In this case, the baseline prediction is the restricted model. The general equation for the F-test for nested models (Allen, 1997, p. 113) can be defined as:
\[ \text{F} = \cfrac{\text{R}^{2}_{f} - \text{R}^{2}_{r} / (k_{f} - k_{R})}{(1 - \text{R}^{2}_{f}) / (\text{n} - k_{f} - 1)} \tag{B.4}\]
Where:
- \(\text{R}^{2}_{F}\) = Coefficient of determination for the full model;
- \(\text{R}^{2}_{R}\) = Coefficient of determination for the restricted model;
- \(k_{f}\) = Number of independent variables in the full model;
- \(k_{r}\) = Number of independent variables in the restricted model;
- \(\text{n}\) = Number of observations in the sample.
\[ \text{F} = \cfrac{\text{Additional Var. Explained} / \text{Additional d.f. Expended}}{\text{Var. unexplained} / \text{d.f. Remaining}} \]
A MES must always be present in any data testing. The effect-size was present in the original Neyman and Pearson framework (Neyman & Pearson, 1928a, 1928b), but unfortunately this practice fade away with the indiscriminate use of p-values, one of the many issues that came with the Null Hypothesis Significance Testing (NHST) (Perezgonzalez, 2015). While p-values are estimates of type 1 error (in Neyman–Pearson’s approaches, or like-approaches), that’s not the main thing we are interested while doing a hypothesis test, what is really being test is the effect size (i.e., a practical significance). Another major issue to only relying on p-values is that the estimated p-value tends to decrease when the sample size is increased, hence, focusing just on p-values with large sample sizes results in the rejection of the null hypothesis, making it not meaningful in this specific situation (Gómez-de-Mariscal et al., 2021; Hair, 2019; Lin et al., 2013).
Neyman-Pearson’s approach considers, at least, two competing hypotheses, although it only tests data under one of them. The hypothesis which is the most important for the research (i.e., the one you do not want to reject too often) is the one tested (Neyman and Pearson, 1928; Neyman, 1942; Spielman, 1973). This hypothesis is better off written so as for incorporating the minimum expected effect size within its postulate (e.g., \(\text{HM} : \text{M1} – \text{M2} = 0 \pm \text{MES}\)), so that it is clear that values within such minimum threshold are considered reasonably probable under the main hypothesis, while values outside such minimum threshold are considered as more probable under the alternative hypothesis. (Perezgonzalez, 2015).
Although larger \(\text{R}^{2}\) values result in higher \(\text{F}\) values, the researcher must base any assessment of practical significance separate from statistical significance. Because statistical significance is really an assessment of the impact of sampling error, the researcher must be cautious of always assuming that statistically significant results are also practically significant. This caution is particularly relevant in the case of large samples where even small \(\text{R}^{2}\) values (e.g., \(5\%\) or \(10\%\)) can be statistically significant, but such levels of explanation would not be acceptable for further action on a practical basis. (Hair, 2019, p. 300)
Publications related to issues regarding the misuse of p-value are plenty. For more on the matter, I recommend Perezgonzalez (2015) review of Fisher’s and Neyman-Pearson’s data test proposals, Lin et al. (2013) and Gómez-de-Mariscal et al. (2021) studies about large Samples and the p-value problem, and Cohen’s essays on the subject (like Cohen (1990) and Cohen (1994)).
It’s important to note that, in addition to the F-test, it’s assumed that for \(\text{R}^{2}_{\text{r}}\) to differ significantly from \(\text{R}^{2}_{\text{f}}\), there must be a non-negligible effect size between them. This effect size can be calculated using Cohen’s \(f^{2}\) (Cohen, 1988, 1992):
\[ \text{Cohen's } f^2 = \cfrac{\text{R}^{2}}{1 - \text{R}^{2}} \]
For nested models, this can be adapted as follows:
\[ \text{Cohen's } f^2 = \cfrac{\text{R}^{2}_{f} - \text{R}^{2}_{r}}{1 - \text{R}^{2}_{f}} = \cfrac{\Delta \text{R}^{2}}{1 - \text{R}^{2}_{f}} \]
\[ f^{2} = \cfrac{\text{Additional Var. Explained}}{\text{Var. unexplained}} \]
Considering the particular emphasis that the solar zeitgeber has on the entrainment of biological rhythms (as demonstrated in many experiments), it would not be reasonable to assume that the latitude hypothesis could be supported without at least a non-negligible effect size. With this in mind, this analysis will use Cohen’s threshold for small/negligible effects, the Minimum Effect Size (MES) (\(\delta\)) is defined as 0.02 (Cohen, 1988, p. 413; 1992, p. 157).
In Cohen’s words:
What is really intended by the invalid affirmation of a null hypothesis is not that the population ES [Effect Size] is literally zero, but rather that it is negligible, or trivial (Cohen, 1988, p. 16).
SMALL EFFECT SIZE: \(f^2 = .02\). Translated into ^{2} or partial ^{2} for Case 1, this gives \(.02 / (1 + .02) = .0196\). We thus define a small effect as one that accounts for 2% of the \(\text{Y}\) variance (in contrast with 1% for \(r\)), and translate to an \(\text{R} = \sqrt{0196} = .14\) (compared to .10 for \(r\)). This is a modest enough amount, just barely escaping triviality and (alas!) all too frequently in practice represents the true order of magnitude of the effect being tested (Cohen, 1988, p. 413).
[…] in many circumstances, all that is intended by “proving” the null hypothesis is that the ES is not necessarily zero but small enough to be negligible, i.e., no larger than \(i\). How large \(i\) is will vary with the substantive context. Assume, for example, that ES is expressed as \(f^2\), and that the context is such as to consider \(f^2\) no larger than \(.02\) to be negligible; thus \(i\) = .02 (Cohen, 1988, p. 461).
\[ \text{MES} = \text{Cohen's } f^2 \text{small threshold} = 0.02 \\ \]
For comparison, Cohen’s threshold for medium effects is \(0.15\), and for large effects is \(0.35\) (Cohen, 1988, pp. 413–414; 1992, p. 157).
Knowing Cohen’s \(f^2\), is possible to calculated the equivalent \(\text{R}^{2}\):
\[ 0.02 = \cfrac{\text{R}^{2}}{1 - \text{R}^{2}} \quad \text{or} \quad \text{R}^{2} = \cfrac{0.02}{1.02} \eqsim 0.01960784 \]
In other words, the latitude must explain at least \(1.960784\%\) of the variance in the dependent variable to be considered non-negligible. This is the Minimum Effect Size (MES) for this analysis.
In summary, the decision rule for the hypothesis test is as follows:
-
Reject \(\text{H}_{0}\) if both of the following conditions are met:
- The F-test is significant.
- \(\Delta \ \text{Adjusted} \ \text{R}^{2} > 0.01960784\);
-
Fail to reject \(\text{H}_{0}\) if either of the following conditions are met:
- The F-test is not significant, or
- The F-test is significant, but \(\Delta \ \text{Adjusted} \ \text{R}^{2} \leq 0.01960784\).
As usual, the significance level (\(\alpha\)) was set at \(0.05\), allowing a 5% chance of a Type I error. A power analysis was performed to determine the necessary sample size for detecting a significant effect, targeting a power (\(1 - \beta\)) of \(0.99\).
It’s important to emphasize that this thesis is not investigating causality, only association. Predictive models alone should never be used to infer causal relationships (Arif & MacNeil, 2022).
B.11 Statistical Analyses
In addition to the analyses described in the Hypothesis Test subsection, several other evaluations were conducted to ensure the validity of the results, including: a power analysis, the visual inspection of variable distributions (e.g., Q-Q plots), assessment of residual normality, checks for multicollinearity, and examination of leverage and influential points.
All analyses were performed using computational notebooks and are fully reproducible. Detailed documentation is available in the supplementary materials.
B.12 Model Predictors
Two modeling approaches were used to assess the impact of latitude and related environmental variables on the outcome of interest. Each approach builds on predictors inspired by the methods in Leocadio-Miguel et al. (2017) while addressing methodological inconsistencies and data limitations.
B.12.1 Identified Inconsistencies
During the replication of Leocadio-Miguel et al. (2017), several inconsistencies were identified:
- Variable usage mismatch: The number of covariates listed in the results section does not align with those used in the \(\text{F}\)-test parameters for the restricted and full models.
- Multicollinearity issues: The inclusion of sunrise, sunset, and daylight duration (derived as sunset - sunrise) for the March equinox and the June and December solstices introduces multicollinearity due to the interdependence of these variables.
B.12.2 Selection of Predictors
The restricted model in Leocadio-Miguel et al. (2017) included age, longitude, and solar irradiation when the subjects filled the online questionnaire as covariates, with sex, daylight saving time (DST), and season as cofactors. The full model added, annual average solar irradiation, sunrise time, sunset time, and daylight duration for the March equinox and the June and December solstices. However, in this study:
- DST and season: Are excluded, as data were collected within a single week, rendering these variables redundant.
- Latitude proxies: Annual average solar irradiation and daylight duration for the March equinox and the June and December solstices were included. However, sunrise and sunset time coefficients were not estimable and had to be omitted. This was due to their high collinearity (\(r > 0.999\)). Centralization or standardization of the predictors did not resolve this issue.
As in Leocadio-Miguel et al. (2017), the measure of daylight duration in September equinox was not included due to the high correlation with the March equinox (\(r > 0.993\)), making it statistically indistinguishable from the latter (\(p > 0.05\)). This was expected, since the day at the equinoxes must be approximately the same length (from the Latin, aequĭnoctĭum, meaning the time of equal days and nights (Latinitium, n.d.)).
Daylight duration for the March equinox, June solstice, and December solstice exhibited high multicollinearity, with a variance inflation factor (VIF) exceeding \(1000\). However, since these variables are part of the same group (latitude proxies), this does not pose a significant issue for the analysis. The focus is on the collective effect of the group rather than the contributions of individual variables.
B.12.3 Tests and Predictors
To evaluate the hypotheses, two tests were conducted with distinct sets of predictors:
B.12.3.1 Test A
-
Restricted Model Predictors
- Age, sex, longitude, and Global Horizontal Irradiance (GHI) at the time of questionnaire completion (monthly GHI average for participants’ geographic coordinates).
-
Full Model Predictors
- Restricted model predictors + annual GHI average and daylight duration for the nearest March and September equinoxes, and the June and December solstices.
B.12.3.2 Test B
-
Restricted Model Predictors
- Age, sex, longitude, and GHI at the time of questionnaire completion (monthly GHI average for participants’ geographic coordinates).
-
Full Model Predictors
- Restricted model predictors + latitude.