match_strings() performs approximate string matching between two
character vectors using amatch() from the stringdist package.
Unlike amatch(), which returns only the indices of the matched
strings, match_strings() returns a tibble that pairs each original
string with its matched counterpart.
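The relationship between the two functions can be sketched in a few lines: amatch() yields indices into the reference vector, and those indices are used to look up the matched strings. The helper name match_strings_sketch() is hypothetical and is not the package's implementation. In particular, plain amatch() with maxDist = 1 would not pair "SAO PULO" with "São Paulo" as in the Examples below, so match_strings() presumably applies some preprocessing that this sketch omits.

```r
library(stringdist)
library(tibble)

# Hypothetical sketch, not the package's code: no case or accent
# normalisation is done here, so results can differ from match_strings().
match_strings_sketch <- function(raw, reference, ...) {
  # amatch() returns, for each element of raw, the index of its closest
  # element in reference, or NA when nothing lies within maxDist
  idx <- stringdist::amatch(raw, reference, ...)
  # Pair each original string with the reference string at that index;
  # an NA index yields an NA value
  tibble::tibble(key = raw, value = reference[idx])
}

raw <- c("rio de janeiro", "rio de janiro", "curitiba")
reference <- c("sao paulo", "rio de janeiro")
match_strings_sketch(raw, reference, maxDist = 1)
```

Here "rio de janiro" is one insertion away from "rio de janeiro", so it is matched, while "curitiba" has no reference string within maxDist = 1 and gets NA.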
Arguments
- raw
  A character vector with the data to be matched.
- reference
  A character vector with the reference strings against which raw is matched.
- one_by_one
  A logical value indicating whether to perform one-by-one matching. If TRUE,
  the function matches each string in raw to the corresponding string in
  reference. If FALSE, the function matches all strings in raw against all
  strings in reference (default: FALSE).
- ...
  Additional arguments passed to amatch().
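The difference between the two one_by_one modes can be illustrated with a hypothetical helper, match_pairwise_sketch(), which compares raw[i] only with reference[i]. It assumes raw and reference have the same length and returns NA when a pair is not within maxDist; both are assumptions for illustration, not documented behaviour of match_strings().

```r
library(stringdist)
library(tibble)

# Hypothetical sketch of one-by-one (positional) matching; the length
# check and the NA-on-no-match behaviour are assumptions.
match_pairwise_sketch <- function(raw, reference, ...) {
  stopifnot(length(raw) == length(reference))
  # Call amatch() once per pair; against a one-element table it returns
  # 1 when raw[i] is within maxDist of reference[i], NA otherwise
  hit <- mapply(stringdist::amatch, raw, reference,
                MoreArgs = list(...), USE.NAMES = FALSE)
  tibble::tibble(key = raw,
                 value = ifelse(is.na(hit), NA_character_, reference))
}

raw <- c("rio de janiro", "rio de janeiro")
reference <- c("rio de janeiro", "curitiba")
match_pairwise_sketch(raw, reference, maxDist = 1)
```

The second row gets NA even though "rio de janeiro" appears in reference: in positional mode it is only compared with "curitiba", whereas all-against-all matching would find the exact match.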
Value
A tibble with one row per element of raw: the key column holds the
original string and the value column its matched counterpart.
See also
Other match functions:
find_closest_match_dbl_2()
Examples
raw <- c("sao paulo", "rio de janeiro", "SAO PULO", "RiO de Janiro")
reference <- c("São Paulo", "Rio de Janeiro")
match_strings(raw, reference, maxDist = 1)
#> # A tibble: 4 × 2
#> key value
#> <chr> <chr>
#> 1 sao paulo São Paulo
#> 2 rio de janeiro Rio de Janeiro
#> 3 SAO PULO São Paulo
#> 4 RiO de Janiro Rio de Janeiro