split_files_by_size() splits a vector of file paths into chunks based on
their size. It is useful for managing large files or datasets that need to
be processed in smaller parts.
The function groups files into chunks so that the total size of the files in each chunk does not exceed the specified limit. A file that is larger than the limit is placed in its own chunk. By default, the files are sorted by size in increasing order before chunking (order_by_size = TRUE, decreasing_size = FALSE); set decreasing_size = TRUE to sort the largest files first.
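The grouping rule described above can be sketched in base R as a greedy, sequential pass over sizes that are already sorted. This is only an illustration under stated assumptions: chunk_by_size() is a hypothetical helper, not part of the package, and the actual implementation of split_files_by_size() may differ.

```r
# Hypothetical helper (not the package's code): greedy sequential chunking.
# `sizes` is a named numeric vector; `max_size` is the per-chunk limit.
chunk_by_size <- function(sizes, max_size) {
  chunk_id <- integer(length(sizes))
  current <- 1L  # index of the chunk being filled
  running <- 0   # total size already placed in the current chunk
  for (i in seq_along(sizes)) {
    # Open a new chunk when this item would push the total over the limit.
    # An oversized item landing in an empty chunk stays there on its own.
    if (running > 0 && running + sizes[i] > max_size) {
      current <- current + 1L
      running <- 0
    }
    chunk_id[i] <- current
    running <- running + sizes[i]
  }
  unname(split(names(sizes), chunk_id))
}

# Illustrative sizes in KB, sorted in increasing order.
sizes <- c(a = 148, b = 156, c = 268, d = 301, e = 398)
chunks <- chunk_by_size(sizes, max_size = 635)
chunks
```

With these sizes the first three items total 572 and share a chunk, while the last two each start a new chunk because adding them would exceed 635.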
Usage
split_files_by_size(
  files,
  max_size = fs::fs_bytes("1GB"),
  order_by_size = TRUE,
  decreasing_size = FALSE,
  root = NULL
)
Arguments
- files
 A character vector of file paths.
- max_size
 (optional) An integer or fs_bytes value specifying the maximum total size (in bytes) allowed for each chunk (default: fs_bytes("1GB")).
- order_by_size
 (optional) A logical flag indicating whether to sort the files by size before chunking (default: TRUE).
- decreasing_size
 (optional) A logical flag indicating whether to sort the files in decreasing order of size. This is only relevant if order_by_size is TRUE (default: FALSE).
- root
 (optional) A string specifying the root directory of the files. If NULL, the function will treat the paths as absolute (default: NULL).
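To make the roles of root and the two ordering flags more concrete, the sketch below resolves file names against an optional root and then sorts them by size. resolve_sizes() is a hypothetical helper written with base R only. It assumes that root is simply prepended to each path, which may not match how the package resolves paths internally.

```r
# Hypothetical helper (an assumption, not the package's code):
# prepend `root` when it is given, then look up file sizes in bytes.
resolve_sizes <- function(files, root = NULL) {
  paths <- if (is.null(root)) files else file.path(root, files)
  stats::setNames(file.size(paths), files)
}

# Create two small files of different sizes in a temporary directory.
root <- tempfile("dir")
dir.create(root)
writeLines(rep("a", 10), file.path(root, "small.txt"))
writeLines(rep("a", 100), file.path(root, "large.txt"))

sizes <- resolve_sizes(c("large.txt", "small.txt"), root = root)

# Analogous to order_by_size = TRUE with decreasing_size = FALSE or TRUE.
increasing <- names(sort(sizes))
decreasing <- names(sort(sizes, decreasing = TRUE))
increasing
decreasing
```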
Value
A list of character vectors, where each vector
contains file paths that fit within the specified size limit.
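Because the result is a plain list, each chunk can be handled in turn with the usual list tools. The sketch below uses a hand-written list as a stand-in for a returned value, so the file names are illustrative only.

```r
# Stand-in for a returned value: a list of character vectors of file paths.
chunks <- list(
  c("file2.txt", "file1.txt", "file3.txt"),
  "file5.txt",
  "file4.txt"
)

# Number of files in each chunk.
n_files <- lengths(chunks)

# Process one chunk at a time; here each chunk is only summarised as a label.
labels <- vapply(
  seq_along(chunks),
  function(i) sprintf("chunk %d: %s", i, paste(chunks[[i]], collapse = ", ")),
  character(1)
)
labels
```

In practice the body of the anonymous function would read, compress, or upload the files of one chunk.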
See also
Other file functions:
identify_blank_line_neighbors(),
normalize_hashtags(),
normalize_names(),
peek_csv_file(),
remove_blank_line_dups(),
replace_in_file(),
sort_files_by_size(),
split_file(),
zip_files_by_pattern()
Examples
library(fs)
library(readr)
# Create five text files of random length in a temporary directory
files <- c("file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt")
dir <- tempfile("dir")
dir.create(dir)
for (i in files) {
  write_lines(rep(letters, sample(1000:10000, 1)), file.path(dir, i))
}
# Sort the files by size and show the size of each one
files <- sort_files_by_size(files, root = dir)
sizes <- file_size(file.path(dir, files)) |> as.character() |> trimws()
names(sizes) <- files
sizes
#> file2.txt file1.txt file3.txt file5.txt file4.txt 
#>    "148K"    "156K"    "268K"    "301K"    "398K" 
# Use half of the total size as the per-chunk limit
total_size <- file_size(file.path(dir, files)) |> sum()
max_size <- fs::fs_bytes(total_size / 2)
max_size
#> 635K
# Split the files so that no chunk exceeds the limit
split_files_by_size(
  files,
  max_size = max_size,
  root = dir
)
#> [[1]]
#> file2.txt file1.txt file3.txt 
#> 
#> [[2]]
#> file5.txt
#> 
#> [[3]]
#> file4.txt
#>