Map terms from a corpus and compute their frequencies

word_frequency() maps terms from a vector using the tm package and outputs a tibble summarizing word frequencies.

Please note that word_frequency() is required for other iramuteqlike functions.

word_frequency(
  x,
  language = "en",
  stopwords = language,
  other_stopwords = NULL
)

Arguments

x: An atomic vector (usually a character object) with a corpus (a collection of documents containing text). Each element of x will represent a document (e.g., a response from a survey).
language: (optional) a string indicating the corpus language. See tm::SimpleCorpus() to learn more (default: "en").
stopwords: (optional) a string indicating the language to be passed to tm::stopwords(). tm::stopwords() returns the stop words of a specific language, which are removed to help clean the text analysis (default: language).
other_stopwords: (optional) a string indicating other stop words to be removed from the text analysis.
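
The arguments above describe a pipeline of tokenizing, removing stop words, and counting. The block below is a minimal base-R sketch of that idea, not the implementation of word_frequency(), which relies on tm. The documents, the stop_words vector (a stand-in for tm::stopwords("en")), and the other_stop_words vector (playing the role of other_stopwords) are hypothetical values chosen for illustration.

```r
# Base-R sketch only; word_frequency() itself relies on tm.
docs <- c(
  "The cat sat on the mat.",
  "The dog chased the cat!"
)

# Hypothetical stop-word list standing in for tm::stopwords("en").
stop_words <- c("the", "on")
# Hypothetical extra terms, playing the role of other_stopwords.
other_stop_words <- c("mat")

# Lowercase, strip punctuation, and split on whitespace.
clean <- gsub("[[:punct:]]", "", tolower(docs))
tokens <- unlist(strsplit(clean, "[[:space:]]+"))

# Drop empty strings and stop words.
tokens <- tokens[nzchar(tokens) & !tokens %in% c(stop_words, other_stop_words)]

# Count occurrences and sort from most to least frequent.
counts <- sort(table(tokens), decreasing = TRUE)
freq <- data.frame(
  word = names(counts),
  freq = as.numeric(counts),
  stringsAsFactors = FALSE
)
print(freq)
```

Each element of docs is treated as one document, mirroring how x is described above. The real function may clean the text differently (for example, in how it handles numbers or apostrophes).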

Value

A tibble object with two columns:

word: with a unique set of words mapped from x.
freq: with the absolute frequency of each word in word.
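
Because the result has just these two columns, it can be filtered and extended like any data frame. The sketch below uses a plain data frame as a stand-in for the returned tibble, filled with the first four rows of the first example's output. The share column and the threshold of 10 are illustrative assumptions, not part of the function.

```r
# Stand-in for a word_frequency() result (a data frame instead of a tibble).
res <- data.frame(
  word = c("just", "youre", "dont", "okay"),
  freq = c(19, 10, 10, 8),
  stringsAsFactors = FALSE
)

# Share of each word among the rows shown here (not the whole corpus).
res$share <- res$freq / sum(res$freq)

# Keep words that occur at least 10 times.
frequent <- res[res$freq >= 10, ]
print(frequent)
```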

Examples

if (requireNamespace("friends", quietly = TRUE)) {
    word_frequency(head(friends::friends[[1]], 100))
}
#> # A tibble: 348 × 2
#>    word    freq
#>    <chr>  <dbl>
#>  1 just      19
#>  2 youre     10
#>  3 dont      10
#>  4 okay       8
#>  5 guy        7
#>  6 well       7
#>  7 paul       7
#>  8 like       6
#>  9 really     6
#> 10 theres     5
#> # ℹ 338 more rows

if (requireNamespace("stringi", quietly = TRUE)) {
    word_frequency(stringi::stri_rand_lipsum(5))
}
#> # A tibble: 156 × 2
#>    word          freq
#>    <chr>        <dbl>
#>  1 sed             16
#>  2 amet             8
#>  3 nisl             8
#>  4 non              8
#>  5 pellentesque     8
#>  6 diam             8
#>  7 nulla            8
#>  8 mauris           7
#>  9 nec              7
#> 10 turpis           6
#> # ℹ 146 more rows