
Read in or write dictionary files in Comma-Separated Values (.csv; weighted) or Linguistic Inquiry and Word Count (.dic; non-weighted) format.

Usage


read.dic(path, cats = NULL, type = "asis", as.weighted = FALSE,
  dir = getOption("lingmatch.dict.dir"), ..., = "term", = "category", raw = FALSE)

write.dic(dict, filename = NULL, type = "asis", as.weighted = FALSE,
  save = TRUE)

Arguments



Path to a file, a name corresponding to a file in getOption('lingmatch.dict.dir') (or '~/Dictionaries') or one of the dictionaries available at, a matrix-like object to be categorized, or a list to be formatted.


A character vector of category names to be returned. All categories are returned by default.


A character indicating whether and how terms should be altered. Unspecified or matching 'asis' leaves terms as they are. Other options change wildcards to regular expressions: 'pattern' (or any value matching '^[poi]') replaces initial asterisks with '\\b\\w*' and terminal asterisks with '\\w*\\b', to match terms within raw text; any other value pads terms with ^ and $, then removes those bounding marks where an asterisk is present, to match tokenized terms.


Logical; if TRUE, prevents weighted dictionaries from being converted to unweighted versions, or converts unweighted dictionaries to a binary weighted version: a data.frame with a "term" column of unique terms and a column for each category.


Path to a folder containing dictionaries, or where you would like dictionaries to be downloaded; passed to select.dict and/or download.dict.


Passes arguments to readLines.

Strings identifying column names in path containing terms and categories respectively.


Logical or character. If logical, indicates whether path should be treated as a raw dictionary (as might be read in from a .dic file). If character, it is used in place of path, as if it were the contents read in from a file.


A list with a named entry of terms for each category, or a data.frame with terms in one column, and categories or weights in the rest.


The name of the file to be saved.


Logical; if FALSE, does not write a file.
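The wildcard handling described for the type argument can be sketched in base R. This is a minimal illustration, not lingmatch's internal code: glob_to_regex is a hypothetical helper, and two details are assumptions: only the exact string 'asis' is treated as the pass-through value, and in the tokenized case the bounding mark is dropped (together with the asterisk) only on the side where the asterisk sits.

```r
# Hypothetical helper (not part of lingmatch): a base R sketch of how the
# `type` argument is described as converting wildcard terms.
glob_to_regex <- function(terms, type = "asis") {
  # Assumption: only the exact string "asis" leaves terms unchanged
  if (identical(type, "asis")) return(terms)
  if (grepl("^[poi]", type)) {
    # 'pattern'-like types: match terms within raw text
    terms <- sub("^\\*", "\\\\b\\\\w*", terms)  # initial * -> \b\w*
    terms <- sub("\\*$", "\\\\w*\\\\b", terms)  # terminal * -> \w*\b
    return(terms)
  }
  # Any other type: match whole tokens by padding with ^ and $, then
  # (assumption) drop the anchor and asterisk on each side that has one
  gsub("^\\^\\*|\\*\\$$", "", paste0("^", terms, "$"))
}

glob_to_regex(c("kill*", "*ing", "dying"), "pattern")
glob_to_regex(c("kill*", "*ing", "dying"), "term")
```

With the tokenized form, grepl(glob_to_regex("kill*", "term"), c("kill", "killer", "skill")) gives TRUE TRUE FALSE. The 'pattern' form relies on \b and \w, so it is safest to apply with perl = TRUE.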


Value

read.dic: A list (unweighted) with an entry for each category containing character vectors of terms, or a data.frame (weighted) with columns for terms (first, "term") and weights (all subsequent, with category labels as names).

write.dic: A version of the written dictionary: a raw character vector for unweighted dictionaries, or a data.frame for weighted dictionaries.
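The conversion between the list (unweighted) and binary weighted data.frame formats described above can be sketched in base R. list_to_weighted and weighted_to_list are hypothetical helpers, not lingmatch functions; treating any nonzero weight as category membership when converting back is an assumption.

```r
# Hypothetical helpers (not lingmatch functions): a base R sketch of the
# list <-> binary weighted conversion described for `as.weighted`.
list_to_weighted <- function(dict) {
  terms <- unique(unlist(dict, use.names = FALSE))
  weighted <- data.frame(term = terms, stringsAsFactors = FALSE)
  for (category in names(dict)) {
    # 1 if the term is listed under this category, 0 otherwise
    weighted[[category]] <- as.numeric(terms %in% dict[[category]])
  }
  weighted
}

weighted_to_list <- function(weighted, term_col = "term") {
  categories <- setdiff(names(weighted), term_col)
  # Assumption: any nonzero weight counts as category membership
  dict <- lapply(categories, function(category) {
    weighted[[term_col]][weighted[[category]] != 0]
  })
  names(dict) <- categories
  dict
}

dict <- list(
  kill = c("kill*", "murd*", "wound*", "die*"),
  death = c("death*", "dying", "die*", "kill*")
)
(dict_weighted <- list_to_weighted(dict))
weighted_to_list(dict_weighted)
```

With the murder-related dictionary from the examples, these helpers reproduce the term order and 0/1 pattern of the weighted output shown there, and converting back lists each category's terms in the weighted table's row order.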

See also

Examples


# make a small murder-related dictionary
dict <- list(
  kill = c("kill*", "murd*", "wound*", "die*"),
  death = c("death*", "dying", "die*", "kill*")
)

# convert it to a weighted format
(dict_weighted <- read.dic(dict, as.weighted = TRUE))
#>     term kill death
#> 1  kill*    1     1
#> 2  murd*    1     0
#> 3 wound*    1     0
#> 4   die*    1     1
#> 5 death*    0     1
#> 6  dying    0     1

# categorize it back
read.dic(dict_weighted)
#> $kill
#> [1] "kill*"  "murd*"  "wound*" "die*"  
#> $death
#> [1] "kill*"  "die*"   "death*" "dying" 

# convert it to a string without writing to a file
cat(raw_dict <- write.dic(dict, save = FALSE))
#> %
#> 1	kill
#> 2	death
#> %
#> kill*	1	2
#> murd*	1
#> wound*	1
#> die*	1	2
#> death*	2
#> dying	2

# parse it back in
read.dic(raw = raw_dict)
#> $kill
#> [1] "kill*"  "murd*"  "wound*" "die*"  
#> $death
#> [1] "kill*"  "die*"   "death*" "dying" 

if (FALSE) {

# save it as a .dic file
write.dic(dict, "murder")

# read it back in as a list

# read in the Moral Foundations or LUSI dictionaries from URLs
moral_dict <- read.dic("")
lusi_dict <- read.dic("")

# save and read in a version of the General Inquirer dictionary
inquirer <- read.dic("inquirer", dir = "~/Dictionaries")
}
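The raw .dic text printed by write.dic(dict, save = FALSE) in the examples has a simple tab-separated layout: numbered category names between two % lines, then one line per term followed by the numbers of the categories that contain it. A base R sketch of producing and parsing that layout follows; dict_to_dic_lines and dic_lines_to_dict are hypothetical helpers, not lingmatch's implementation, and they assume well-formed input with exactly two % lines.

```r
# Hypothetical helpers (not lingmatch functions): a base R sketch of the
# tab-separated .dic layout shown in the write.dic() example output.
dict_to_dic_lines <- function(dict) {
  categories <- names(dict)
  terms <- unique(unlist(dict, use.names = FALSE))
  # Header: category numbers and names between two "%" lines
  header <- c("%", paste(seq_along(categories), categories, sep = "\t"), "%")
  # Body: each term followed by the numbers of the categories that list it
  body <- vapply(terms, function(term) {
    ids <- which(vapply(dict, function(x) term %in% x, logical(1)))
    paste(c(term, ids), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
  c(header, body)
}

dic_lines_to_dict <- function(dic_lines) {
  # Assumes exactly two "%" lines delimiting the category block
  bounds <- which(dic_lines == "%")
  category_fields <- strsplit(dic_lines[(bounds[1] + 1):(bounds[2] - 1)], "\t")
  ids <- vapply(category_fields, `[`, character(1), 1)
  categories <- vapply(category_fields, `[`, character(1), 2)
  entries <- strsplit(dic_lines[-seq_len(bounds[2])], "\t")
  terms <- vapply(entries, `[`, character(1), 1)
  # A term belongs to a category if that category's number follows it
  dict <- lapply(ids, function(id) {
    terms[vapply(entries, function(fields) id %in% fields[-1], logical(1))]
  })
  names(dict) <- categories
  dict
}

dict <- list(
  kill = c("kill*", "murd*", "wound*", "die*"),
  death = c("death*", "dying", "die*", "kill*")
)
dic_lines <- dict_to_dic_lines(dict)
writeLines(dic_lines)
dic_lines_to_dict(dic_lines)
```

Because parsing walks the term lines in file order, the recovered death category comes back as "kill*" "die*" "death*" "dying", the same order that read.dic(raw = raw_dict) returns in the examples.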