Bootstrap Illustration

Daniel Vartanian

Published

2025-05-20

Overview

This report illustrates the bootstrap method, originally developed by Bradley Efron (1979a, 1979b, 1982).

Setting the Environment

library(brandr)
library(checkmate)
library(dplyr)
library(ggpattern)
library(ggplot2)
library(infer)
library(magrittr)
library(scales)
library(stats)
library(summarytools)
library(tibble)
library(tidyr)

Setting the Initial Parameters

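n <- 1000 # Sample size and number of bootstrap resamples

mean <- 0 # Population mean (calls such as `mean(x)` still find the function)

sd <- 1 # Population standard deviation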

Theoretical Distribution

We start with a theoretical normal distribution with mean \(\mu = 0\) and standard deviation \(\sigma = 1\) (i.e., variance \(\sigma^{2} = 1\)), representing the theoretical distribution of the population.

Definition 1 (Normal Distribution) The normal distribution has two parameters, usually denoted by \(\mu\) and \(\sigma^{2}\), which are its mean and variance. The pdf [probability density function] of the normal distribution with mean \(\mu\) and variance \(\sigma^{2}\) (usually denoted by \(\text{n}(\mu, \sigma^{2})\)) is given by: (Casella & Berger, 2002, p. 102)

\[ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \tag{1}\]
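As a quick sanity check of Equation 1, the sketch below codes the density term by term and compares it with `stats::dnorm()` for a non-unit standard deviation, where a misplaced \(\sigma\) in the normalizing constant would show up. The function name `pdf_eq1` and the evaluation grid are illustrative choices, not part of the original analysis.

```r
# Density of n(mu, sigma^2) written term by term from Equation 1
pdf_eq1 <- function(x, mu = 0, sigma = 1) {
  stopifnot(is.numeric(x), sigma > 0)
  (1 / (sqrt(2 * pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x_grid <- seq(-5, 5, by = 0.5)

# Compare with R's built-in normal density using sigma = 2
all.equal(pdf_eq1(x_grid, mu = 1, sigma = 2), dnorm(x_grid, mean = 1, sd = 2))
#> [1] TRUE

# A valid density should integrate to (approximately) 1
integrate(pdf_eq1, -Inf, Inf, mu = 1, sigma = 2)$value
```

With \(\sigma = 1\) a misplaced \(\sigma\) would go unnoticed, which is why the check uses \(\sigma = 2\).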

normal_dist <- function(x, mean = 0, sd = 1) {
  checkmate::assert_numeric(x)
  checkmate::assert_number(mean)
  checkmate::assert_number(sd, lower = 0)

  # Normal pdf from Equation 1 (equivalent to stats::dnorm(x, mean, sd))
  (1 / (sqrt(2 * pi) * sd)) * exp(-(x - mean)^2 / (2 * sd^2))
}

dplyr::tibble(
  x = seq(-5, 5, length.out = n),
  y = normal_dist(x, mean, sd)
) |>
  ggplot2::ggplot(ggplot2::aes(x, y)) +
  ggpattern::geom_area_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("primary"),
    fill = "transparent",
    linewidth = 2
  ) +
  ggplot2::labs(x = "Theoretical Normal", y = "Density")

Sample

A non-random sample is drawn from a normally distributed population, intentionally biased toward higher extreme values to illustrate the effects of sampling bias.

Definition 2 (Random Sample) The random variables \(X_{1}, \ldots, X_{n}\) are called a random sample of size \(n\) from the population \(f(x)\) if \(X_{1}, \ldots, X_{n}\) are mutually independent random variables and the marginal pdf [probability density function] or pmf of each \(X_{i}\) is the same function \(f(x)\). Alternatively, \(X_{1}, \ldots, X_{n}\) are called independent and identically distributed random variables with pdf or pmf \(f(x)\). This is commonly abbreviated to iid random variables. (Casella & Berger, 2002, p. 207)

Definition 3 (Random Sample) A random sample is a collection of random variables \(X_{1}, X_{2}, \ldots, X_{n}\), that have the same probability distribution and are mutually independent. (Dekking et al., 2005, p. 246)

Population Data

pop_data <- rnorm(n * 100, mean = mean, sd = sd)

pop_data |>
  summarytools::descr() |>
  as.data.frame() |>
  tibble::rownames_to_column("name") |>
  tibble::as_tibble() |>
  dplyr::rename(value = pop_data)

dplyr::tibble(x = pop_data) |>
  ggplot2::ggplot(ggplot2::aes(x, ggplot2::after_stat(density))) +
  ggpattern::geom_histogram_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_density(
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    fill = NA
  ) +
  ggplot2::xlim(-5, 5) +
  ggplot2::labs(
    x = "Population data",
    y = "Density"
  )

Bias Function

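bias_fun <- function(x, shape_1 = 0.45, shape_2 = 0.5, max_rescale = 0.95) {
  checkmate::assert_numeric(x)
  checkmate::assert_number(shape_1)
  checkmate::assert_number(shape_2)
  checkmate::assert_number(max_rescale, lower = 0.01, upper = 1)

  # Map the input onto [0, max_rescale]; capping below 1 keeps the Beta
  # density finite at the upper end
  x <- scales::rescale(x, to = c(0, max_rescale))

  # Flat weight for the lower half, Beta(shape_1, shape_2) density above it,
  # so that upper quantiles become increasingly likely to be sampled
  dplyr::if_else(
    x <= 0.5,
    dbeta(0.5, shape1 = shape_1, shape2 = shape_2),
    dbeta(x, shape1 = shape_1, shape2 = shape_2)
  )
}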

dplyr::tibble(
  x = seq(0, 1, length.out = n),
  y = bias_fun(x)
) |>
  ggplot2::ggplot(ggplot2::aes(x, y)) +
  ggpattern::geom_area_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("primary"),
    fill = "transparent",
    linewidth = 2
  ) +
  ggplot2::labs(x = "Quantiles", y = "Probability weight")

Sample Data

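data <-
  pop_data |>
  sort() |> # Order values so that position matches the population quantile
  sample(
    n,
    replace = FALSE,
    # Unnormalized weights by quantile; sample() rescales them internally
    prob =
      seq(0, 1, length.out = length(pop_data)) |>
      bias_fun()
  )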
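Because only \(n = 1{,}000\) of the \(100{,}000\) population values are drawn, sampling without replacement behaves almost like sampling with replacement, so the expected sample mean is roughly the weight-averaged population value. The base-R sketch below re-derives the weights on a fresh simulated population, mirroring `bias_fun()` with its default arguments but rewritten without `scales` and `dplyr`; the seed and object names are illustrative assumptions.

```r
set.seed(2025) # Illustrative seed, not the one used in this report

pop <- sort(rnorm(1e5))                 # Sorted N(0, 1) population
q <- seq(0, 1, length.out = length(pop)) # Quantile position of each value

# Mirror of bias_fun() defaults: rescale to [0, 0.95], flat weight up to 0.5,
# Beta(0.45, 0.5) density above it
x <- 0.95 * q
w <- ifelse(x <= 0.5, dbeta(0.5, 0.45, 0.5), dbeta(x, 0.45, 0.5))

# Approximate expected mean of the biased sample (weighted population mean)
weighted.mean(pop, w)

# Unweighted population mean, for comparison (close to 0)
mean(pop)
```

The positive weighted mean anticipates the upward shift visible in the sample summary that follows.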

data |>
  summarytools::descr() |>
  as.data.frame() |>
  tibble::rownames_to_column("name") |>
  tibble::as_tibble() |>
  dplyr::rename(value = data)

dplyr::tibble(x = data) |>
  ggplot2::ggplot(ggplot2::aes(x, ggplot2::after_stat(density))) +
  ggpattern::geom_histogram_pattern(
    pattern = "stripe",
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color_tint(750, "black"),
    pattern_spacing = 0.015,
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_density(
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    fill = NA
  ) +
  ggplot2::xlim(-5, 5) +
  ggplot2::labs(
    x = "Sample data",
    y = "Density"
  )

Bootstrap-Based t-Test

Finally, we apply the bootstrap method to estimate a confidence interval for the population mean, using the sample mean as its estimate, and to run a bootstrap analogue of the one-sample t-test (Student, 1908) against the population mean.

We compare these bootstrap-based results with those from the traditional theory-based t-test, which assumes that the sample is drawn from a normally distributed population.

The bootstrap is based on a simple, yet powerful, idea (whose mathematics can get quite involved)¹. In statistics, we learn about the characteristics of the population by taking samples. As the sample represents the population, analogous characteristics of the sample should give us information about the population characteristics. The bootstrap helps us learn about the sample characteristics by taking resamples (that is, we retake samples from the original sample) and use this information to infer to the population. The bootstrap was developed by Efron in the late 1970s, with the original ideas appearing in Efron (1979a; 1979b) and the monograph by Efron (1982). See also Efron (1998) for more recent thoughts and developments. (Casella & Berger, 2002, p. 478)

In Example 1.2.20 we calculated all possible averages of four numbers selected from 2, 4, 9, 12, where we drew the numbers with replacement. This is the simplest form of the bootstrap, sometimes referred to as the nonparametric bootstrap. (Casella & Berger, 2002, p. 478)

This kind of sampling is called with replacement because the value chosen at any stage is “replaced” in the population and is available for choice again at the next stage. (Casella & Berger, 2002, p. 209)
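The example cited above is small enough to enumerate. The sketch below lists all \(4^{4} = 256\) ordered resamples of 2, 4, 9, 12 drawn with replacement and averages each one. The mean of these bootstrap averages equals the original sample mean (6.75), and their variance equals the plug-in variance of the data divided by \(n = 4\). This is our own illustration of the idea, not code from Casella & Berger (2002).

```r
x <- c(2, 4, 9, 12)

# Every ordered resample of size 4 drawn with replacement: 4^4 = 256 rows
resamples <- expand.grid(x, x, x, x)
boot_means <- rowMeans(resamples)

length(boot_means)
#> [1] 256

mean(boot_means) # Equals mean(x)
#> [1] 6.75

# Exact bootstrap variance of the mean: plug-in variance / n
plugin_var <- mean((x - mean(x))^2)
all.equal(mean((boot_means - mean(boot_means))^2), plugin_var / length(x))
#> [1] TRUE

# Number of distinct averages among the 256 resamples
length(unique(boot_means))
```

With larger samples, enumerating all \(n^{n}\) resamples is infeasible, which is why the sections below draw a random subset of resamples instead.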

Theory-Based t-Test (Base R)

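\[ \begin{cases} \text{H}_{0}: \mu = \mu_{0} \\ \text{H}_{a}: \mu \neq \mu_{0} \end{cases} \]

Here \(\mu_{0}\) is the mean of the simulated population (`mean(pop_data)`), which is close to, but not exactly equal to, the theoretical value of 0.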

data |>
  stats::t.test(
    alternative = "two.sided",
    conf.level = 0.95,
    mu = mean(pop_data)
  )
#> 
#>  One Sample t-test
#> 
#> data:  data
#> t = 5.5329682, df = 999, p-value = 0.00000004021649
#> alternative hypothesis: true mean is not equal to 0.000324767383
#> 95 percent confidence interval:
#>  0.1242178757 0.2603959763
#> sample estimates:
#>   mean of x 
#> 0.192306926
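The statistic and interval reported by `stats::t.test()` follow from \(t = (\bar{x} - \mu_{0}) / (s / \sqrt{n})\) with \(n - 1\) degrees of freedom and \(\bar{x} \pm t_{0.975,\, n-1}\, s / \sqrt{n}\). The sketch below recomputes them by hand on simulated data (the seed, data, and object names are illustrative assumptions) and checks them against `t.test()`.

```r
set.seed(42) # Illustrative data, not this report's sample
x <- rnorm(1000, mean = 0.2, sd = 1)
mu_0 <- 0

n_x <- length(x)
se <- sd(x) / sqrt(n_x)                       # Standard error of the mean
t_stat <- (mean(x) - mu_0) / se               # One-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df = n_x - 1) # Two-sided p-value
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n_x - 1) * se

ref <- t.test(x, mu = mu_0, conf.level = 0.95)
all.equal(unname(ref$statistic), t_stat)
all.equal(unname(ref$conf.int[1:2]), ci_manual)
all.equal(ref$p.value, p_value)
```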

Theory-Based t-Test (`infer`)

dplyr::tibble(x = data) |>
  infer::t_test(
    response = x,
    alternative = "two-sided",
    mu = mean(pop_data),
    conf_level = 0.95
  ) |>
  dplyr::mutate(dplyr::across(dplyr::everything(), as.character)) |>
  tidyr::pivot_longer(dplyr::everything())

Bootstrap Sample Mean CI (`infer`)

observed_statistic <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::calculate(stat = "mean")

# No hypothesize() step: this is a bootstrap distribution, not a null one
boot_dist <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::generate(reps = n, type = "bootstrap") |>
  infer::calculate(stat = "mean")

boot_dist |>
  infer::get_confidence_interval(
    level = 0.95,
    point_estimate = observed_statistic
  )

Bootstrap-Based t-Test (`infer`)

observed_statistic <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::calculate(stat = "mean")

null_dist <-
  dplyr::tibble(x = data) |>
  infer::specify(response = x) |>
  infer::hypothesize(null = "point", mu = mean(pop_data)) |>
  infer::generate(reps = n, type = "bootstrap") |>
  infer::calculate(stat = "mean")

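# Because `null_dist` is centered on the null value, these bounds mark the
# central 95% of bootstrap means expected under H0 (an acceptance region),
# not a confidence interval for the population mean
ci <- null_dist |>
  infer::get_confidence_interval(
    level = 0.95,
    point_estimate = observed_statistic
  )

ci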

null_dist |>
  infer::get_p_value(
    obs_stat = observed_statistic,
    direction = "two-sided"
  )
#> Warning: Please be cautious in reporting a p-value of 0. This result is an
#> approximation based on the number of `reps` chosen in the `generate()` step.
#> ℹ See `get_p_value()` (`?infer::get_p_value()`) for more information.
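The warning arises because a Monte Carlo p-value can only take multiples of \(1/B\), where \(B\) is the number of resamples (here `reps = n`, i.e., 1,000). When no null statistic is as extreme as the observed one, the estimate is exactly 0, which only says that the true p-value is small relative to \(1/B\). A common alternative convention counts the observed statistic as one of the draws, giving \((b + 1)/(B + 1)\), where \(b\) is the number of draws at least as extreme. The base-R sketch below, with the hypothetical helper `mc_p_value()` and simulated null statistics, contrasts the two; it does not reproduce `infer`'s internals.

```r
# Two-sided Monte Carlo p-value from a vector of null statistics.
# `add_one = TRUE` applies the (b + 1) / (B + 1) convention, which never returns 0.
mc_p_value <- function(null_stats, obs_stat, add_one = FALSE) {
  B <- length(null_stats)
  b_upper <- sum(null_stats >= obs_stat)
  b_lower <- sum(null_stats <= obs_stat)
  one_sided <- if (add_one) {
    (c(b_upper, b_lower) + 1) / (B + 1)
  } else {
    c(b_upper, b_lower) / B
  }
  min(1, 2 * min(one_sided)) # Double the smaller tail, capped at 1
}

set.seed(1) # Illustrative null distribution, not this report's
null_stats <- rnorm(1000, mean = 0, sd = 0.035)

mc_p_value(null_stats, obs_stat = 0.19)                 # No draw is as extreme
mc_p_value(null_stats, obs_stat = 0.19, add_one = TRUE) # 2 / 1001
```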

null_dist |>
  infer::visualize(bins = 30) +
  infer::shade_p_value(
    obs_stat = observed_statistic,
    direction = "two-sided",
    color = brandr::get_brand_color("primary"),
    fill = brandr::get_brand_color("light-orange")
  ) +
  ggplot2::geom_vline(
    xintercept = ci$lower_ci,
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = ci$upper_ci,
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::labs(
    title = NULL,
    x = "Null distribution of the hypothetical mean",
    y = "Frequency"
  )

Bootstrap Sample Mean CI (Independent)

means <- vapply(
  X = seq_len(n),
  FUN = function(x) {
    data |>
      sample(n, replace = TRUE) |>
      mean()
  },
  FUN.VALUE = numeric(1)
)

mean(means)
#> [1] 0.1919669725

quantile(means, 0.025)
#>         2.5% 
#> 0.1267206149

quantile(means, 0.975)
#>        97.5% 
#> 0.2608197494
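The interval above is the percentile interval. Two other textbook constructions can be read off the same bootstrap means: the normal (standard-error) interval \(\bar{x} \pm z_{0.975}\,\widehat{\text{se}}_{\text{boot}}\) and the basic interval \((2\bar{x} - q_{0.975},\ 2\bar{x} - q_{0.025})\), where \(q_{p}\) are quantiles of the bootstrap means. The self-contained sketch below computes all three on simulated data; the seed, data, and object names are illustrative assumptions.

```r
set.seed(123) # Illustrative data, not this report's sample
x <- rnorm(1000, mean = 0.2, sd = 1)
B <- 1000

# Nonparametric bootstrap of the sample mean
boot_means <- vapply(
  seq_len(B),
  function(i) mean(sample(x, length(x), replace = TRUE)),
  numeric(1)
)

q <- unname(quantile(boot_means, c(0.025, 0.975)))
se_boot <- sd(boot_means) # Bootstrap estimate of the standard error

ci_percentile <- q
ci_normal <- mean(x) + c(-1, 1) * qnorm(0.975) * se_boot
ci_basic <- 2 * mean(x) - rev(q) # Reflect the quantiles around the estimate

rbind(percentile = ci_percentile, normal = ci_normal, basic = ci_basic)
```

When the bootstrap distribution of the mean is close to symmetric, as is expected here, the three intervals should nearly coincide; they diverge when it is skewed.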

Bootstrap-Based t-Test (Independent)

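means <- vapply(
  X = seq_len(n),
  FUN = function(x) {
    data |>
      sample(n, replace = TRUE) |>
      mean() |>
      # Shift each resampled mean so that the null distribution is centered
      # on the null value: mean(resample) - mean(data) + mean(pop_data)
      magrittr::subtract(mean(data) - mean(pop_data))
  },
  FUN.VALUE = numeric(1)
)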

mean(means)
#> [1] -0.0007116986917

quantile(means, 0.025)
#>           2.5% 
#> -0.06702770832

quantile(means, 0.975)
#>         97.5% 
#> 0.06243307435
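The test above resamples the mean itself. A variant closer in spirit to a t-test, discussed in the bootstrap literature (e.g., Efron & Tibshirani, 1993), resamples the studentized statistic instead: shift the data so that \(\text{H}_{0}\) holds (\(x_{i} - \bar{x} + \mu_{0}\)), recompute \(t^{*} = (\bar{x}^{*} - \mu_{0}) / (s^{*} / \sqrt{n})\) on each resample, and compare the observed \(t\) with that distribution. The sketch below is a minimal base-R version under those assumptions; `boot_t_test()`, the symmetric two-sided rule, and the simulated data are illustrative choices rather than this report's method.

```r
# Bootstrap test of H0: mu = mu_0 that resamples the studentized (t) statistic.
# Data are first shifted so that H0 holds: x - mean(x) + mu_0.
boot_t_test <- function(x, mu_0, B = 1000) {
  n_x <- length(x)
  t_stat <- function(v) (mean(v) - mu_0) / (sd(v) / sqrt(n_x))

  t_obs <- t_stat(x)
  x_null <- x - mean(x) + mu_0 # Shifted data with mean exactly mu_0

  t_null <- vapply(
    seq_len(B),
    function(i) t_stat(sample(x_null, n_x, replace = TRUE)),
    numeric(1)
  )

  # Two-sided p-value: share of null t statistics at least as extreme as |t_obs|
  list(t_obs = t_obs, p_value = mean(abs(t_null) >= abs(t_obs)))
}

set.seed(7) # Illustrative data, not this report's sample
boot_t_test(rnorm(1000, mean = 0.2), mu_0 = 0) # H0 false
boot_t_test(rnorm(1000, mean = 0), mu_0 = 0)   # H0 true
```

Like the versions above, this test treats the sample as representative of the population, so it cannot correct for the deliberate sampling bias introduced earlier.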

dplyr::tibble(x = means) |>
  ggplot2::ggplot(ggplot2::aes(x)) +
  ggpattern::geom_histogram_pattern(
    pattern_color = "transparent",
    pattern_fill = brandr::get_brand_color("white"),
    color = brandr::get_brand_color("gray"),
    fill = "transparent",
    linewidth = 0.5,
    bins = 30
  ) +
  ggplot2::geom_vline(
    xintercept = quantile(means, 0.975),
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = quantile(means, 0.025),
    color = brandr::get_brand_color("gray"),
    linewidth = 0.5,
    linetype = "dashed"
  ) +
  ggplot2::geom_vline(
    xintercept = mean(data),
    color = brandr::get_brand_color("primary"),
    linewidth = 2,
    linetype = "solid"
  ) +
  ggplot2::labs(
    x = "Null distribution of the hypothetical mean",
    y = "Frequency"
  )

License

The content is licensed under CC0 1.0 Universal, placing these materials in the public domain. You may freely copy, modify, distribute, and use this work, even for commercial purposes, without permission or attribution.

Other References

Books

Efron & Tibshirani (1993)
DeGroot & Schervish (2012)
Dekking et al. (2005)

Bootstrap for Dummies

Fife (2025)
Starmer (2021a)
Starmer (2021b)
Pascual (2023)

References

Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). Duxbury.

DeGroot, M. H., & Schervish, M. J. (2012). Probability and statistics (4th ed.). Addison-Wesley.

Dekking, M., Kraaikamp, C., Lopuhaä, H. P., & Meester, L. E. (Eds.). (2005). A modern introduction to probability and statistics: Understanding why and how. Springer. https://doi.org/10.1007/1-84628-168-7

Efron, B. (1979a). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1). https://doi.org/10.1214/aos/1176344552

Efron, B. (1979b). Computers and the theory of statistics: Thinking the unthinkable. SIAM Review, 21(4), 460–480. https://doi.org/10.1137/1021092

Efron, B. (1982). The jackknife, the bootstrap and other resampling plans. Society for Industrial and Applied Mathematics. https://doi.org/10.1137/1.9781611970319

Efron, B. (1998). R. A. Fisher in the 21st century. Statistical Science, 13(2), 95–114. https://www.jstor.org/stable/2676745

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. Chapman & Hall.

Fife, D. (2025, April 2). What is bootstrapping? [Video recording]. Simplistics (QuantPsych). https://www.youtube.com/watch?v=ADY4k8y41hI

Lehmann, E. L. (1999). Elements of large-sample theory. Springer.

Pascual, C. (2023, August 18). Statistical inception: The bootstrap (#SoME3) [Video recording]. Very Normal. https://www.youtube.com/watch?v=BiNcdYbyiWw

Starmer, J. (2021a, July 6). Bootstrapping main ideas!!! [Video recording]. StatQuest. https://www.youtube.com/watch?v=Xz0x-8-cgaQ

Starmer, J. (2021b, July 13). Using bootstrapping to calculate p-values!!! [Video recording]. StatQuest. https://www.youtube.com/watch?v=N4ZQQqyIf6k

Student. (1908). The probable error of a mean. Biometrika, 6(1). https://doi.org/10.2307/2331554

Footnotes

¹ See Lehmann (1999, Section 6.5) for a most readable introduction.
