split_files_by_size() splits a vector of file paths into chunks based on their size. It is useful for managing large files or datasets that need to be processed in smaller parts.
The function groups files into chunks so that the total size of the files in each chunk does not exceed the specified limit. If an individual file is larger than the limit, it is placed in its own chunk. By default, the files are sorted by size in increasing order before chunking; set decreasing_size = TRUE to sort from largest to smallest, or order_by_size = FALSE to skip sorting.
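The grouping rule described above can be sketched in base R as a single pass over the files. This is only an illustration under stated assumptions: chunk_by_size_sketch() is a hypothetical helper, it takes a named numeric vector of sizes in bytes rather than file paths, and it closes a chunk as soon as the next file would push the total over the limit. The package's actual implementation may differ.

```r
# Hypothetical helper, not part of the package: greedy, single-pass chunking.
# `sizes` is a named numeric vector (names = file paths, values = bytes).
chunk_by_size_sketch <- function(sizes, max_size,
                                 order_by_size = TRUE,
                                 decreasing_size = FALSE) {
  if (order_by_size) {
    sizes <- sort(sizes, decreasing = decreasing_size)
  }
  chunks <- list()
  current <- character(0)
  current_total <- 0
  for (i in seq_along(sizes)) {
    # Close the current chunk if adding this file would exceed the limit.
    if (length(current) > 0 && current_total + sizes[[i]] > max_size) {
      chunks[[length(chunks) + 1]] <- current
      current <- character(0)
      current_total <- 0
    }
    # A file larger than max_size still lands here and ends up alone,
    # because the next iteration closes its chunk immediately.
    current <- c(current, names(sizes)[i])
    current_total <- current_total + sizes[[i]]
  }
  # Flush the last open chunk.
  if (length(current) > 0) {
    chunks[[length(chunks) + 1]] <- current
  }
  chunks
}

# Made-up sizes for illustration; b.txt alone exceeds the 500-byte limit.
sizes <- c(a.txt = 100, b.txt = 700, c.txt = 200, d.txt = 250)
chunks <- chunk_by_size_sketch(sizes, max_size = 500)
# chunks is list(c("a.txt", "c.txt"), "d.txt", "b.txt")
```

With the default increasing order, small files are packed together first and the oversized b.txt ends up in a chunk of its own.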
Usage
split_files_by_size(
  files,
  max_size = fs::fs_bytes("1GB"),
  order_by_size = TRUE,
  decreasing_size = FALSE,
  root = NULL
)
Arguments
- files
A character vector of file paths.
- max_size
(optional) An integer or fs_bytes value specifying the maximum total size (in bytes) allowed for each chunk (default: fs_bytes("1GB")).
- order_by_size
(optional) A logical flag indicating whether to sort the files by size before chunking (default: TRUE).
- decreasing_size
(optional) A logical flag indicating whether to sort the files in decreasing order of size. This is only relevant if order_by_size is TRUE (default: FALSE).
- root
(optional) A string specifying the root directory of the files. If NULL, the function will treat the paths as absolute (default: NULL).
Value
A list of character vectors, where each vector contains file paths that fit within the specified size limit.
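Because the result is a plain list, each chunk can be processed with lapply() or vapply(). The sketch below uses a hand-written list as a stand-in for the return value (the file names and byte counts are made up for illustration) and sums the on-disk size of each chunk with base R's file.size().

```r
# Create three small files of known size in a temporary directory.
tmp_dir <- tempfile("chunks")
dir.create(tmp_dir)
byte_counts <- c(a.txt = 10, b.txt = 20, c.txt = 40)
for (f in names(byte_counts)) {
  # raw(n) is n zero bytes, so each file is exactly n bytes on disk.
  writeBin(raw(byte_counts[[f]]), file.path(tmp_dir, f))
}

# Stand-in for the return value: a list of character vectors of file paths
# relative to tmp_dir (assumed shape, written by hand for this sketch).
chunks <- list(c("a.txt", "b.txt"), "c.txt")

# Total size in bytes of each chunk.
chunk_totals <- vapply(
  chunks,
  function(chunk) sum(file.size(file.path(tmp_dir, chunk))),
  numeric(1)
)
chunk_totals
```

The same pattern applies to any per-chunk work, such as archiving or uploading each group of files separately.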
See also
Other file functions: identify_blank_line_neighbors(), normalize_hashtags(), normalize_names(), peek_csv_file(), remove_blank_line_dups(), replace_in_file(), sort_files_by_size(), split_file(), zip_files_by_pattern()
Examples
library(fs)
library(readr)
# Create five text files of random size in a temporary directory
files <- c("file1.txt", "file2.txt", "file3.txt", "file4.txt", "file5.txt")
dir <- tempfile("dir")
dir.create(dir)
for (i in files) {
  write_lines(rep(letters, sample(1000:10000, 1)), file.path(dir, i))
}
files <- sort_files_by_size(files, root = dir)
sizes <- file_size(file.path(dir, files)) |> as.character() |> trimws()
names(sizes) <- files
sizes
#> file1.txt file3.txt file5.txt file4.txt file2.txt
#> "126K" "165K" "263K" "285K" "465K"
total_size <- file_size(file.path(dir, files)) |> sum()
max_size <- fs::fs_bytes(total_size / 2)
max_size
#> 652K
# Split into chunks of at most half the total size
split_files_by_size(
  files,
  max_size = max_size,
  root = dir
)
#> [[1]]
#> file1.txt file3.txt file5.txt
#>
#> [[2]]
#> file4.txt
#>
#> [[3]]
#> file2.txt
#>