word_frequency() maps the terms of a vector using the tm package and
returns a tibble summarizing word frequencies.
Please note that word_frequency() is required by the other iramuteqlike
functions.
word_frequency(
  x,
  language = "en",
  stopwords = language,
  other_stopwords = NULL
)
x
: An atomic vector (usually a character vector) containing a corpus (a collection of text documents). Each element of x represents one document (e.g., a response from a survey).
language
: (optional) a string indicating the corpus language. See tm::SimpleCorpus() to learn more (default: "en").
stopwords
: (optional) a string indicating the language to be passed to tm::stopwords(). tm::stopwords() returns a set of common words for a given language that are usually removed before a text analysis (default: language, which is "en" unless language is changed).
other_stopwords
: (optional) a string indicating additional stop words to be removed from the text analysis (default: NULL).
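To get a feel for what the stopwords and other_stopwords arguments work with, it can help to inspect tm::stopwords() directly. The snippet below is a minimal sketch that only uses tm. The extra words and the combined vector custom are illustrative assumptions, not values used by word_frequency() itself.

```r
# List the stop words that tm ships for a language.
en <- tm::stopwords("en")
head(en)

# Common function words are part of the English list.
"the" %in% en

# Other languages can be requested by their language code.
pt <- tm::stopwords("pt")
head(pt)

# Illustrative only: extra, analysis-specific words (hypothetical
# choices) combined with the English list.
extra <- c("okay", "well")
custom <- union(en, extra)
length(custom)
```

Words such as those in extra are the kind of value one would presumably pass through other_stopwords, while the language code selects the base list.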
A tibble object with two columns:
word
: a unique set of words mapped from x.
freq
: the absolute frequency of each word in word.
if (requireNamespace("friends", quietly = TRUE)) {
  word_frequency(head(friends::friends[[1]], 100))
}
#> # A tibble: 348 × 2
#> word freq
#> <chr> <dbl>
#> 1 just 19
#> 2 youre 10
#> 3 dont 10
#> 4 okay 8
#> 5 guy 7
#> 6 well 7
#> 7 paul 7
#> 8 like 6
#> 9 really 6
#> 10 theres 5
#> # ℹ 338 more rows
if (requireNamespace("stringi", quietly = TRUE)) {
  word_frequency(stringi::stri_rand_lipsum(5))
}
#> # A tibble: 156 × 2
#> word freq
#> <chr> <dbl>
#> 1 sed 16
#> 2 amet 8
#> 3 nisl 8
#> 4 non 8
#> 5 pellentesque 8
#> 6 diam 8
#> 7 nulla 8
#> 8 mauris 7
#> 9 nec 7
#> 10 turpis 6
#> # ℹ 146 more rows
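For readers curious about how a word/freq tibble like the ones above can be produced with tm, here is a rough sketch. It is not the implementation of word_frequency(): the helper sketch_word_frequency() is hypothetical, and the preprocessing steps (lowercasing, punctuation removal, tm's default minimum word length of three characters) are assumptions made for illustration.

```r
# Hypothetical helper: approximates the documented behavior with tm.
# It is NOT the package's implementation.
sketch_word_frequency <- function(x, language = "en", other_stopwords = NULL) {
  # One document per element of `x`.
  corpus <- tm::SimpleCorpus(
    tm::VectorSource(as.character(x)),
    control = list(language = language)
  )

  # Count terms after lowercasing, stripping punctuation, and dropping
  # stop words. tm's default also drops words shorter than 3 characters.
  tdm <- tm::TermDocumentMatrix(
    corpus,
    control = list(
      tolower = TRUE,
      removePunctuation = TRUE,
      stopwords = c(tm::stopwords(language), other_stopwords)
    )
  )

  # Sum each term's counts over all documents, most frequent first.
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

  tibble::tibble(word = names(freq), freq = unname(freq))
}

docs <- c(
  "The cat sat on the mat.",
  "The dog chased the cat!",
  "A cat and a dog."
)

out <- sketch_word_frequency(docs)
out
```

The sketch sums term counts across documents, which matches the "absolute frequency" described for the freq column. Whether word_frequency() applies exactly the same cleaning steps is not stated here, so results may differ.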