[Maturing]

match_strings() performs approximate string matching between two character vectors using the amatch() function from the stringdist package.

Unlike amatch(), which returns an integer vector of positions in reference (NA where no match is found), match_strings() returns a tibble that pairs each original string with its matched counterpart.
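The difference can be illustrated with stringdist::amatch() directly. The sketch below is not the source of match_strings(). It lowercases both vectors before matching, which is an assumption made only for this illustration. It also uses a base data.frame in place of the tibble that match_strings() returns, and the strings "Sao Paulo" and "curitiba" are made-up sample data.

```r
# Illustrative sketch only; not match_strings()'s actual implementation.
library(stringdist)

raw <- c("sao paulo", "SAO PULO", "rio de janiro", "curitiba")
reference <- c("Sao Paulo", "Rio de Janeiro")

# Assumption for this sketch: lowercase both vectors so that case
# differences do not count toward the edit distance.
idx <- amatch(tolower(raw), tolower(reference), maxDist = 1)
idx
#> [1]  1  1  2 NA

# Turn the positions into key/value pairs (a data.frame stands in for a tibble).
out <- data.frame(key = raw, value = reference[idx], stringsAsFactors = FALSE)
```

amatch() only reports where each match lives in reference. "curitiba" has no reference string within distance 1, so its position and value are both NA.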

Usage

match_strings(raw, reference, one_by_one = FALSE, ...)

Arguments

raw

A character vector of strings to be matched.

reference

A character vector of reference strings against which raw is matched.

one_by_one

A logical value indicating whether to perform one-by-one matching. If TRUE, each string in raw is matched to the corresponding string in reference. If FALSE, each string in raw is matched against all strings in reference (Default: FALSE).

...

Additional arguments passed to stringdist::amatch() (e.g., method or maxDist).
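The two one_by_one modes can be pictured with stringdist's own helpers. This sketch assumes that "corresponding" means "at the same position". stringdist() compares its inputs element by element, while stringdistmatrix() compares every pair. It illustrates the idea only; match_strings() may implement the modes differently.

```r
# Illustrative sketch; match_strings()'s internals may differ.
library(stringdist)

raw <- c("sao pulo", "rio de janiro")
reference <- c("sao paulo", "rio de janeiro")

# One-by-one (assumed positional): raw[i] is compared only with reference[i].
pairwise <- stringdist(raw, reference)
pairwise
#> [1] 1 1

# All-vs-all: each raw string is compared with every reference string,
# and the closest reference string is kept for each row.
d <- stringdistmatrix(raw, reference)
best <- reference[apply(d, 1, which.min)]
```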

Value

A tibble with two columns: key, containing the strings from raw, and value, containing their matched counterparts from reference.

See also

Other match functions: find_closest_match_dbl_2()

Examples

raw <- c("sao paulo", "rio de janeiro", "SAO PULO", "RiO de Janiro")
reference <- c("São Paulo", "Rio de Janeiro")
match_strings(raw, reference, maxDist = 1)
#> # A tibble: 4 × 2
#>   key            value         
#>   <chr>          <chr>         
#> 1 sao paulo      São Paulo     
#> 2 rio de janeiro Rio de Janeiro
#> 3 SAO PULO       São Paulo     
#> 4 RiO de Janiro  Rio de Janeiro