match_strings() performs approximate string matching between two character vectors using the amatch() function from the stringdist package.

The difference between this function and amatch() is that match_strings() returns a tibble with the original strings and their matched counterparts, while amatch() returns only the indices of the matched strings.
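
The contrast between indices and matched strings can be illustrated with a minimal sketch. It calls stringdist::amatch() directly and builds the table by hand; it is not the implementation of match_strings(). The data are hypothetical, and the column names key and value are borrowed from the Examples section.

```r
# Hypothetical ASCII-only data, chosen so the edit distances are easy to verify.
raw <- c("aple", "bananna")
reference <- c("apple", "banana", "cherry")

# amatch() returns the position in `reference` of the closest match
# within `maxDist`, or NA when nothing is close enough.
idx <- stringdist::amatch(raw, reference, maxDist = 1)

# Pairing each original string with its match gives a table of the
# kind described above (column names are an assumption for illustration).
matched <- tibble::tibble(key = raw, value = reference[idx])
```

Indexing reference with idx is what turns the positions returned by amatch() into readable strings; unmatched entries stay NA.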
Arguments
- raw
  A character vector with the data to be matched.
- reference
  A character vector with the reference strings to match the raw data against.
- one_by_one
  A logical value indicating whether to perform one-by-one matching. If TRUE, the function will match each string in raw to the corresponding string in reference. If FALSE, the function will match all strings in raw to all strings in reference (Default: FALSE).
- ...
  Additional arguments to be passed to amatch().
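
One way to picture one-by-one matching is as an element-wise comparison, where raw[i] is checked only against reference[i]. The sketch below is an assumption about what that could look like, written with stringdist::stringdist(); the data and the max_dist name are hypothetical, and the package may implement one_by_one = TRUE differently.

```r
# Hypothetical data: raw[i] is compared only with reference[i].
raw <- c("aple", "bananna", "chery")
reference <- c("apple", "banana", "grape")

# stringdist() is vectorised, so this yields one distance per pair.
d <- stringdist::stringdist(raw, reference, method = "osa")

# Keep the reference string only when the pair is within the threshold
# (max_dist is an illustrative name, not an argument of match_strings()).
max_dist <- 1
value <- ifelse(d <= max_dist, reference, NA_character_)
```

Here the third pair is left unmatched because "chery" is compared only with "grape", whereas all-to-all matching would search the whole reference vector for a close candidate.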
Value
A tibble with the original strings and their matched counterparts.
See also
Other match functions: find_closest_match_dbl_2()
Examples
raw <- c("sao paulo", "rio de janeiro", "SAO PULO", "RiO de Janiro")
reference <- c("São Paulo", "Rio de Janeiro")
match_strings(raw, reference, maxDist = 1)
#> # A tibble: 4 × 2
#>   key            value
#>   <chr>          <chr>
#> 1 sao paulo      São Paulo
#> 2 rio de janeiro Rio de Janeiro
#> 3 SAO PULO       São Paulo
#> 4 RiO de Janiro  Rio de Janeiro