Bootstrap Illustration

Author

Daniel Vartanian

Published

2025-05-20

Project Status: Inactive. The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows. License: CC0-1.0

Overview

This report contains an illustration of the bootstrap method, originally developed by Bradley Efron (1979a, 1979b, 1982).

Setting the Environment

Setting the Initial Parameters

n <- 1000
mean <- 0
sd <- 1

Theoretical Distribution

We start with a theoretical normal distribution with mean (\(\mu\)) 0 and standard deviation (\(\sigma\)) 1, representing the theoretical distribution of the population.

Definition 1 (Normal Distribution) The normal distribution has two parameters, usually denoted by \(\mu\) and \(\sigma^{2}\), which are its mean and variance. The pdf [probability density function] of the normal distribution with mean \(\mu\) and variance \(\sigma^{2}\) (usually denoted by \(\text{n}(\mu, \sigma^{2})\)) is given by: (Casella & Berger, 2002, p. 102)

\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \tag{1}\]

normal_dist <- function(x, mean = 0, sd = 1) {
  checkmate::assert_numeric(x)
  checkmate::assert_number(mean)
  checkmate::assert_number(sd)

  # Equation 1: the normalizing constant is 1 / (sd * sqrt(2 * pi)).
  (1 / (sd * sqrt(2 * pi))) * exp(-(x - mean)^2 / (2 * sd^2))
}
dplyr::tibble(
  x = seq(-5, 5, length.out = n),
  y = normal_dist(x, mean, sd)
) |>
  ggplot2::ggplot(ggplot2::aes(x, y)) +
  ggpattern::geom_area_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("primary"),
    fill = "transparent",
    linewidth = 2
  ) +
  ggplot2::labs(x = "Theoretical Normal", y = "Density")
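
As a quick sanity check on the density in Definition 1, the following base-R sketch compares a hand-written pdf against R's built-in `stats::dnorm()` and confirms that it integrates to 1. The name `normal_pdf` and the test values (`mu = 1`, `sigma = 2`) are assumptions made only for this illustration.

```r
# Hypothetical helper: the normal pdf written out term by term.
normal_pdf <- function(x, mu = 0, sigma = 1) {
  stopifnot(is.numeric(x), sigma > 0)
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-5, 5, length.out = 1000)

# Agreement with the built-in density, including a case with sigma != 1.
max_diff_std <- max(abs(normal_pdf(x) - stats::dnorm(x)))
max_diff_alt <- max(
  abs(normal_pdf(x, mu = 1, sigma = 2) - stats::dnorm(x, mean = 1, sd = 2))
)

# A density must integrate to 1 over the real line.
area <- stats::integrate(
  normal_pdf, lower = -Inf, upper = Inf, mu = 1, sigma = 2
)$value

c(max_diff_std = max_diff_std, max_diff_alt = max_diff_alt, area = area)
```

Checking a case with a standard deviation other than 1 matters because any mistake in the normalizing constant is invisible when the standard deviation is exactly 1.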

Sample

A non-random sample is drawn from a normally distributed population, intentionally biased toward higher extreme values to illustrate the effects of sampling bias.

Definition 2 (Random Sample) The random variables \(X_{1}, \ldots, X_{n}\) are called a random sample of size \(n\) from the population \(f(x)\) if \(X_{1}, \ldots, X_{n}\) are mutually independent random variables and the marginal pdf [probability density function] or pmf of each \(X_{i}\) is the same function \(f(x)\). Alternatively, \(X_{1}, \ldots, X_{n}\) are called independent and identically distributed random variables with pdf or pmf \(f(x)\). This is commonly abbreviated to iid random variables. (Casella & Berger, 2002, p. 207)

Definition 3 (Random Sample) A random sample is a collection of random variables \(X_{1}, X_{2}, \ldots, X_{n}\), that have the same probability distribution and are mutually independent. (Dekking et al., 2005, p. 246)
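
To make the definitions concrete, here is a minimal base-R sketch of an iid sample from a standard normal population. The seed, the sample size, and the variable names are assumptions chosen for illustration. It also shows why independence matters in practice: under iid sampling the standard error of the sample mean is \(\sigma / \sqrt{n}\).

```r
set.seed(2025) # arbitrary seed, only for reproducibility

n_draws <- 1000

# X_1, ..., X_n: mutually independent draws, each with the same N(0, 1) pdf.
iid_sample <- stats::rnorm(n_draws, mean = 0, sd = 1)

# Under iid sampling, the standard error of the sample mean is sigma / sqrt(n).
std_error <- 1 / sqrt(n_draws)

# Distance of the sample mean from the true mean, in standard-error units.
z_score <- (mean(iid_sample) - 0) / std_error

c(sample_mean = mean(iid_sample), std_error = std_error, z_score = z_score)
```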

Population Data

pop_data <- rnorm(n * 100, mean = mean, sd = sd)
pop_data |>
  summarytools::descr() |>
  as.data.frame() |>
  tibble::rownames_to_column("name") |>
  tibble::as_tibble() |>
  dplyr::rename(value = pop_data)
dplyr::tibble(x = pop_data) |>
  ggplot2::ggplot(ggplot2::aes(x, ggplot2::after_stat(density))) +
  ggpattern::geom_histogram_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_density(
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    fill = NA
  ) +
  ggplot2::xlim(-5, 5) +
  ggplot2::labs(
    x = "Population data",
    y = "Density"
  )

Bias Function

bias_fun <- function(x, shape_1 = 0.45, shape_2 = 0.5, max_rescale = 0.95) {
  checkmate::assert_numeric(x)
  checkmate::assert_number(shape_1)
  checkmate::assert_number(shape_2)
  checkmate::assert_number(max_rescale, lower = 0.01, upper = 1)

  x <- scales::rescale(x, to = c(0, max_rescale))

  dplyr::if_else(
    x <= 0.5,
    dbeta(0.5, shape1 = shape_1, shape2 = shape_2),
    dbeta(x, shape1 = shape_1, shape2 = shape_2)
  )
}
dplyr::tibble(
  x = seq(0, 1, length.out = n),
  y = bias_fun(x)
) |>
  ggplot2::ggplot(ggplot2::aes(x, y)) +
  ggpattern::geom_area_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("primary"),
    fill = "transparent",
    linewidth = 2
  ) +
  ggplot2::labs(x = "Quantiles", y = "Probability weight")

Sample Data

data <-
  pop_data |>
  sort() |>
  sample(
    n,
    replace = FALSE,
    prob =
      seq(0, 1, length.out = length(pop_data)) |>
      bias_fun()
  )
data |>
  summarytools::descr() |>
  as.data.frame() |>
  tibble::rownames_to_column("name") |>
  tibble::as_tibble() |>
  dplyr::rename(value = data)
dplyr::tibble(x = data) |>
  ggplot2::ggplot(ggplot2::aes(x, ggplot2::after_stat(density))) +
  ggpattern::geom_histogram_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_density(
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    fill = NA
  ) +
  ggplot2::xlim(-5, 5) +
  ggplot2::labs(
    x = "Sample data",
    y = "Density"
  )
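
The effect of the weighting can be seen by drawing an unweighted and a weighted sample from the same population and comparing their means. The sketch below is a base-R stand-in, not the report's own code: the population size, sample size, seed, and variable names are assumptions, and the weights are a simplified rewrite of the bias function (flat up to the midpoint of the rescaled range, then rising toward the upper tail).

```r
set.seed(2025) # arbitrary seed, only for reproducibility

pop <- sort(stats::rnorm(20000))         # simulated population, sorted
q <- seq(0, 1, length.out = length(pop)) # rank position of each unit in [0, 1]

# Assumed stand-in for the weighting: constant below 0.5 on the rescaled
# axis, then increasing, so larger values are more likely to be drawn.
weights <- stats::dbeta(pmax(0.95 * q, 0.5), shape1 = 0.45, shape2 = 0.5)

random_sample <- sample(pop, 2000, replace = FALSE)
biased_sample <- sample(pop, 2000, replace = FALSE, prob = weights)

c(random = mean(random_sample), biased = mean(biased_sample))
```

The unweighted mean should sit near 0, while the weighted mean is pulled upward, which is the sampling bias this section sets out to illustrate.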

Bootstrap-Based t-Test

Finally, we apply the bootstrap method to estimate a confidence interval for the sample mean and conduct a t-test (Student, 1908), treating the sample mean as an estimate of the population mean.

We compare these bootstrap-based results to those from the traditional theory-based t-test, which relies on the assumption that the sample is drawn from a normally distributed population.

The bootstrap is based on a simple, yet powerful, idea (whose mathematics can get quite involved).¹ In statistics, we learn about the characteristics of the population by taking samples. As the sample represents the population, analogous characteristics of the sample should give us information about the population characteristics. The bootstrap helps us learn about the sample characteristics by taking resamples (that is, we retake samples from the original sample) and use this information to infer to the population. The bootstrap was developed by Efron in the late 1970s, with the original ideas appearing in Efron (1979a; 1979b) and the monograph by Efron (1982). See also Efron (1998) for more recent thoughts and developments. (Casella & Berger, 2002, p. 478)

In Example 1.2.20 we calculated all possible averages of four numbers selected from 2, 4, 9, 12, where we drew the numbers with replacement. This is the simplest form of the bootstrap, sometimes referred to as the nonparametric bootstrap. (Casella & Berger, 2002, p. 478)

This kind of sampling is called with replacement because the value chosen at any stage is “replaced” in the population and is available for choice again at the next stage. (Casella & Berger, 2002, p. 209)
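
The four-number example quoted above is small enough to enumerate exhaustively. The sketch below follows only the sentence in the quotation (the details of Example 1.2.20 are not reproduced in this report), and the variable names are assumptions made for illustration.

```r
values <- c(2, 4, 9, 12)

# Every ordered way of drawing four numbers with replacement: 4^4 = 256 rows.
draws <- expand.grid(rep(list(values), 4))
averages <- rowMeans(draws)

# Ignoring order, many draws coincide (e.g., 2, 2, 4, 9 and 9, 4, 2, 2).
distinct_samples <- unique(t(apply(draws, 1, sort)))

nrow(draws)            # number of ordered resamples
nrow(distinct_samples) # number of distinct unordered resamples
mean(averages)         # equals mean(values) by symmetry
```

With larger samples, full enumeration is infeasible (a sample of size \(n\) has \(n^{n}\) ordered resamples), which is why the bootstrap is carried out in practice by drawing a finite number of random resamples.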

Theory-Based t-Test (Base R)

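\[ \begin{cases} \text{H}_{0}: \mu = 0 \\ \text{H}_{a}: \mu \neq 0 \end{cases} \]

In the code, the null value is the mean of the simulated population, mean(pop_data), which is approximately 0 (0.000324767383 in the run shown here).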

data |>
  stats::t.test(
    alternative = "two.sided",
    conf.level = 0.95,
    mu = mean(pop_data)
  )
#> 
#>  One Sample t-test
#> 
#> data:  data
#> t = 5.5329682, df = 999, p-value = 0.00000004021649
#> alternative hypothesis: true mean is not equal to 0.000324767383
#> 95 percent confidence interval:
#>  0.1242178757 0.2603959763
#> sample estimates:
#>   mean of x 
#> 0.192306926
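
For reference, the quantities reported by a one-sample t-test can be reproduced by hand. The sketch below uses simulated data as a stand-in for the sample (the seed, the true mean of 0.2, and the variable names are assumptions) and checks the manual results against `stats::t.test()`.

```r
set.seed(2025) # arbitrary seed, only for reproducibility

x <- stats::rnorm(1000, mean = 0.2, sd = 1) # simulated stand-in for the sample
mu_0 <- 0                                   # hypothesized mean
n_obs <- length(x)

# t = (sample mean - hypothesized mean) / standard error
std_error <- stats::sd(x) / sqrt(n_obs)
t_stat <- (mean(x) - mu_0) / std_error

# Two-sided p-value and 95% confidence interval from the t distribution.
p_value <- 2 * stats::pt(-abs(t_stat), df = n_obs - 1)
ci <- mean(x) + c(-1, 1) * stats::qt(0.975, df = n_obs - 1) * std_error

reference <- stats::t.test(x, mu = mu_0, conf.level = 0.95)

c(t = t_stat, p = p_value, lower = ci[1], upper = ci[2])
```

Note that the confidence interval does not depend on the hypothesized mean; only the t statistic and the p-value do.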

Theory-Based t-Test (infer)

dplyr::tibble(x = data) |>
  infer::t_test(
    response = x,
    alternative = "two.sided",
    mu = mean(pop_data),
    conf.level = 0.95
  ) |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.character)) |>
  tidyr::pivot_longer(dplyr::everything())

Bootstrap Sample Mean CI (infer)

observed_statistic <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::calculate(stat = "mean")
# No null hypothesis is specified here, so this is the bootstrap
# distribution of the sample mean (not a null distribution).
boot_dist <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::generate(reps = n, type = "bootstrap") |>
  infer::calculate(stat = "mean")

boot_dist |>
  infer::get_confidence_interval(
    level = 0.95,
    point_estimate = observed_statistic
  )

Bootstrap-Based t-Test (infer)

observed_statistic <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::calculate(stat = "mean")
null_dist <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::hypothesize(null = "point", mu = mean(pop_data)) |>
  infer::generate(reps = n, type = "bootstrap") |>
  infer::calculate(stat = "mean")
ci <- null_dist |>
  infer::get_confidence_interval(
    level = 0.95,
    point_estimate = observed_statistic
  )

ci
null_dist |>
  infer::get_p_value(
    obs_stat = observed_statistic,
    direction = "two-sided"
  )
#> Warning: Please be cautious in reporting a p-value of 0. This result is an
#> approximation based on the number of `reps` chosen in the `generate()` step.
#> ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
null_dist |>
  infer::visualize(bins = 30) +
  infer::shade_p_value(
    obs_stat = observed_statistic,
    direction = "two-sided",
    color = brandr::get_brand_color("primary"),
    fill = brandr::get_brand_color("light-orange")
  ) +
  ggplot2::geom_vline(
    xintercept = ci$lower_ci,
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = ci$upper_ci,
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::labs(
    title = NULL,
    x = "Null distribution of the hypothetical mean",
    y = "Frequency"
  )

Bootstrap Sample Mean CI (Independent)

means <- vapply(
  X = seq_len(n),
  FUN = function(x) {
    data |>
      sample(n, replace = TRUE) |>
      mean()
  },
  FUN.VALUE = numeric(1)
)
mean(means)
#> [1] 0.1919669725
quantile(means, 0.025)
#>         2.5% 
#> 0.1267206149
quantile(means, 0.975)
#>        97.5% 
#> 0.2608197494
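
One way to see why the percentile interval above lands close to the theory-based interval is to compare the bootstrap standard error with the textbook formula \(s / \sqrt{n}\). The sketch below uses simulated data as a stand-in for the sample; the seed, sample size, number of resamples, and variable names are all assumptions made for illustration.

```r
set.seed(2025) # arbitrary seed, only for reproducibility

x <- stats::rnorm(500, mean = 0.2, sd = 1) # simulated stand-in for the sample
n_boot <- 2000                             # number of bootstrap resamples

# Resample with replacement and record the mean of each resample.
boot_means <- vapply(
  seq_len(n_boot),
  function(i) mean(sample(x, length(x), replace = TRUE)),
  numeric(1)
)

boot_se <- stats::sd(boot_means)              # bootstrap standard error
analytic_se <- stats::sd(x) / sqrt(length(x)) # textbook standard error

# Two common bootstrap intervals for the mean.
percentile_ci <- stats::quantile(boot_means, c(0.025, 0.975))
normal_ci <- mean(x) + c(-1, 1) * stats::qnorm(0.975) * boot_se

c(boot_se = boot_se, analytic_se = analytic_se)
```

For a statistic as well behaved as the mean, the two standard errors and the two intervals should nearly coincide; the bootstrap earns its keep for statistics with no simple standard-error formula.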

Bootstrap-Based t-Test (Independent)

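means <- vapply(
  X = seq_len(n),
  FUN = function(x) {
    # Shift each bootstrap mean so the distribution is centered at the
    # null value, mean(pop_data), instead of at the sample mean.
    data |>
      sample(n, replace = TRUE) |>
      mean() |>
      magrittr::subtract(mean(data) - mean(pop_data))
  },
  FUN.VALUE = numeric(1)
)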
mean(means)
#> [1] -0.0007116986917
quantile(means, 0.025)
#>           2.5% 
#> -0.06702770832
quantile(means, 0.975)
#>         97.5% 
#> 0.06243307435
dplyr::tibble(x = means) |>
  ggplot2::ggplot(ggplot2::aes(x)) +
  ggpattern::geom_histogram_pattern(
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color("white"),
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_vline(
    xintercept = quantile(means, 0.975),
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = quantile(means, 0.025),
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = mean(data),
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    linetype = "solid"
  ) +
  ggplot2::labs(
    x = "Null distribution of the hypothetical mean",
    y = "Frequency"
  )
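
The independent version stops at the quantiles and the plot, but a two-sided p-value can be read off the recentered distribution as well. The following is a sketch under stated assumptions, not the report's method: the data are simulated, the names are hypothetical, and the `+ 1` correction is one common convention for keeping a resampling p-value from being reported as exactly 0.

```r
set.seed(2025) # arbitrary seed, only for reproducibility

x <- stats::rnorm(1000, mean = 0.2, sd = 1) # simulated stand-in for the sample
mu_0 <- 0                                   # hypothesized mean
n_boot <- 2000                              # number of bootstrap resamples

boot_means <- vapply(
  seq_len(n_boot),
  function(i) mean(sample(x, length(x), replace = TRUE)),
  numeric(1)
)

# Recenter the bootstrap means so the distribution is centered at mu_0.
null_means <- boot_means - mean(x) + mu_0

# Count resampled means at least as far from mu_0 as the observed mean.
n_extreme <- sum(abs(null_means - mu_0) >= abs(mean(x) - mu_0))

# Adding 1 to numerator and denominator keeps the estimate above 0.
p_value <- (n_extreme + 1) / (n_boot + 1)

p_value
```

The smallest p-value this procedure can return is `1 / (n_boot + 1)`, so the resolution of the test is set by the number of resamples.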

License

License: CC0-1.0

The content is licensed under CC0 1.0 Universal, placing these materials in the public domain. You may freely copy, modify, distribute, and use this work, even for commercial purposes, without permission or attribution.

Other References

Books

  • Efron & Tibshirani (1993)
  • DeGroot & Schervish (2012)
  • Dekking et al. (2005)

Bootstrap for Dummies

  • Fife (2025)
  • Pascual (2023)
  • Starmer (2021a)
  • Starmer (2021b)

References

Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Duxbury.
DeGroot, M. H., & Schervish, M. J. (2012). Probability and statistics (4th ed.). Addison-Wesley.
Dekking, M., Kraaikamp, C., Lopuhaä, H. P., & Meester, L. E. (2005). A modern introduction to probability and statistics: Understanding why and how. Springer. https://doi.org/10.1007/1-84628-168-7
Efron, B. (1979a). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1), 1–26. https://doi.org/10.1214/aos/1176344552
Efron, B. (1979b). Computers and the theory of statistics: Thinking the unthinkable. SIAM Review, 21(4), 460–480. https://doi.org/10.1137/1021092
Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611970319
Efron, B. (1998). R. A. Fisher in the 21st century. Statistical Science, 13(2), 95–114. https://www.jstor.org/stable/2676745
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall.
Fife, D. (2025, April 2). What is bootstrapping? [Video recording]. Simplistics (QuantPsych). https://www.youtube.com/watch?v=ADY4k8y41hI
Lehmann, E. L. (1999). Elements of large-sample theory. Springer.
Pascual, C. (2023, August 18). Statistical inception: The bootstrap (#SoME3) [Video recording]. Very Normal. https://www.youtube.com/watch?v=BiNcdYbyiWw
Starmer, J. (2021a, July 6). Bootstrapping main ideas!!! [Video recording]. StatQuest. https://www.youtube.com/watch?v=Xz0x-8-cgaQ
Starmer, J. (2021b, July 13). Using bootstrapping to calculate p-values!!! [Video recording]. StatQuest. https://www.youtube.com/watch?v=N4ZQQqyIf6k
Student. (1908). The probable error of a mean. Biometrika, 6(1), 1–25. https://doi.org/10.2307/2331554

Footnotes

  1. See Lehmann (1999, Section 6.5) for a most readable introduction.