Assess Dictionary Categories Within a Latent Semantic Space
Source: R/dictionary_meta.R
Usage
dictionary_meta(dict, space = "auto", n_spaces = 5, suggest = FALSE,
suggestion_terms = 10, suggest_stopwords = FALSE,
suggest_discriminate = TRUE, expand_cutoff_freq = 0.98,
expand_cutoff_spaces = 10, dimension_prop = 1, pairwise = TRUE,
glob = TRUE, space_dir = getOption("lingmatch.lspace.dir"),
verbose = TRUE)
Arguments
- dict
A vector of terms, list of such vectors, or a matrix-like object to be categorized by read.dic.
- space
A vector space used to calculate similarities between terms: names of spaces (see select.lspace), a matrix with terms as row names, or "auto" to auto-select a space based on matched terms. This can also be "multi" to use multiple spaces, which are combined after similarities are calculated.
- n_spaces
Number of spaces to draw from if space is "multi".
- suggest
Logical; if TRUE, will search space for other terms to suggest as additions to each dictionary category.
- suggestion_terms
Number of terms to use when selecting suggested additions.
- suggest_stopwords
Logical; if TRUE, will suggest function words.
- suggest_discriminate
Logical; if TRUE, will adjust for similarity to other categories when finding suggestions.
- expand_cutoff_freq
Proportion of mapped terms to include when expanding dictionary terms. Applies when space is a character string naming a space to be loaded.
- expand_cutoff_spaces
Number of spaces in which a term has to appear to be considered for expansion. Applies when space is a character string naming a space to be loaded.
- dimension_prop
Proportion of dimensions to use when searching for suggested additions; values less than 1 calculate similarities to the category core using fewer dimensions of the space.
- pairwise
Logical; if FALSE, will compare candidate suggestion terms with a single, averaged category vector rather than with all category terms separately.
- glob
Logical; if TRUE, converts globs (asterisk wildcards) to regular expressions.
- space_dir
Directory from which space should be loaded.
- verbose
Logical; if FALSE, will not show status messages.
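With glob = TRUE, a term such as "desk*" stands for every term that begins with "desk". The base-R sketch below illustrates that idea only: the helper expand_globs() and the toy vocabulary are hypothetical, the conversion uses utils::glob2rx() rather than lingmatch's own code, and in dictionary_meta the vocabulary would instead come from the loaded space.

```r
# Hypothetical sketch (not lingmatch's code): expand glob-style dictionary
# terms against a vocabulary using base R's utils::glob2rx().
expand_globs <- function(terms, vocabulary) {
  lapply(setNames(terms, terms), function(term) {
    if (grepl("*", term, fixed = TRUE)) {
      # glob2rx("desk*") returns the regular expression "^desk"
      grep(utils::glob2rx(term), vocabulary, value = TRUE)
    } else {
      # Terms without a wildcard only match themselves
      intersect(term, vocabulary)
    }
  })
}

# Made-up vocabulary, for illustration only
vocabulary <- c("table", "chair", "desk", "desks", "desktop",
                "standingdesk", "sofa", "sofas")
expanded <- expand_globs(c("table", "desk*", "sofa*"), vocabulary)
expanded
```

Because the pattern is anchored at the start, "standingdesk" is not picked up by "desk*", while "desktop" is; the example output further down shows the same effect on a real space, where "desk*" also pulls in unrelated terms such as "deskjet".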
Value
A list:
- expanded
A version of dict with fuzzy terms expanded.
- summary
A summary of each dictionary category.
- terms
Match (expanded term) similarities within terms and categories.
- suggested
If suggest is TRUE, a list with suggested additions for each dictionary category. Each entry is a named numeric vector with similarities for each suggested term.
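The pairwise and dimension_prop arguments describe two choices made when scoring candidate terms against a category. The sketch below illustrates both under stated assumptions: the space is a small made-up matrix with terms as row names, cosine similarity is the measure, dimension_prop is read as keeping the leading share of columns, and the helpers cosine() and score_candidates() are hypothetical. It is not lingmatch's implementation.

```r
# Hypothetical sketch (not lingmatch's code): score candidate terms against a
# category in a toy space, i.e., a matrix with terms as row names.
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

score_candidates <- function(space, category, candidates,
                             pairwise = TRUE, dimension_prop = 1) {
  # Assumption: dimension_prop keeps the leading share of columns
  k <- max(1, floor(ncol(space) * dimension_prop))
  space <- space[, seq_len(k), drop = FALSE]
  category_vectors <- space[category, , drop = FALSE]
  vapply(candidates, function(term) {
    if (pairwise) {
      # Average of the candidate's similarity to each category term
      mean(apply(category_vectors, 1, cosine, b = space[term, ]))
    } else {
      # Similarity to one averaged category vector
      cosine(colMeans(category_vectors), space[term, ])
    }
  }, numeric(1))
}

# Made-up three-dimensional vectors, for illustration only
space <- rbind(
  table = c(1.0, 0.0, 0.0),
  chair = c(0.8, 0.2, 0.0),
  sofa  = c(0.9, 0.1, 0.1),
  happy = c(0.0, 1.0, 0.2)
)
score_candidates(space, c("table", "chair"), c("sofa", "happy"))
score_candidates(space, c("table", "chair"), c("sofa", "happy"), pairwise = FALSE)
```

In this sketch, the averaged variant needs one similarity per candidate instead of one per category term, but it can blur a category whose terms point in different directions of the space.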
See also
To just expand fuzzy terms, see report_term_matches().
Similar information is provided in the dictionary builder web tool.
Other Dictionary functions: download.dict(), lma_patcat(), lma_termcat(), read.dic(), report_term_matches(), select.dict()
Examples
if (dir.exists("~/Latent Semantic Spaces")) {
dict <- list(
furniture = c("table", "chair", "desk*", "couch*", "sofa*"),
well_adjusted = c("happy", "bright*", "friend*", "she", "he", "they")
)
dictionary_meta(dict, space_dir = "~/Latent Semantic Spaces")
}
#> preparing terms (0)
#> expanding terms (2.69)
#> loading space (2.84)
#> calculating term similarities (17.2)
#> preparing results (17.2)
#> done (17.2)
#> $expanded
#> $expanded$furniture
#> [1] "table" "chair" "desk-top" "desk" "desking"
#> [6] "deskpro" "deskilled" "desktop" "desktops" "desks"
#> [11] "deskjet" "deskbound" "deskins" "deskilling" "desker"
#> [16] "deskside" "deskstar" "couchdb" "couchant" "couchsurfing"
#> [21] "couche" "couchette" "couchman" "couching" "couched"
#> [26] "coucher" "couches" "couch" "sofaer" "sofabed"
#> [31] "sofala" "sofas" "sofar" "sofa"
#>
#> $expanded$well_adjusted
#> [1] "happy" "bright-eyed" "brightmail" "brightcove"
#> [5] "brightling" "brightnesses" "brightwater" "brightpoint"
#> [9] "brightstar" "brightwell" "brighter" "brighton"
#> [13] "brightfield" "brightest" "brightwork" "brighten"
#> [17] "brightside" "brightly" "brightness" "brightening"
#> [21] "brightened" "brightwood" "brighthouse" "brightman"
#> [25] "brightlingsea" "bright" "brights" "brightens"
#> [29] "friendz" "friendlies" "friend" "friendly"
#> [33] "friendfeed" "friendswood" "friendliness" "friendster"
#> [37] "friendship" "friends" "friendlier" "friendliest"
#> [41] "friendless" "friendships" "friended" "friending"
#> [45] "she" "he" "they"
#>
#>
#> $summary
#> category n_terms n_expanded sim.space sim.min
#> furniture furniture 5 34 glove_crawl -0.03352657
#> well_adjusted well_adjusted 6 47 glove_crawl -0.01389545
#> sim.q1 sim.median sim.mean sim.q3 sim.max
#> furniture 0.01484553 0.03518088 0.05212709 0.08518134 0.1520343
#> well_adjusted 0.01743533 0.07671837 0.08098433 0.13679356 0.1828150
#>
#> $terms
#> category term match sim.term sim.category
#> 1 furniture table table 1.000000e+00 0.520288056
#> 2 furniture chair chair 1.000000e+00 0.643952092
#> 3 furniture desk* desk-top 2.175893e-01 0.019619600
#> 3.1 furniture desk* desk 1.000000e+00 0.543019756
#> 3.2 furniture desk* desking 2.288528e-01 0.214574317
#> 3.3 furniture desk* deskpro 6.674267e-02 -0.029996732
#> 3.4 furniture desk* deskilled -4.967378e-02 -0.139120978
#> 3.5 furniture desk* desktop 4.811437e-01 0.196892375
#> 3.6 furniture desk* desktops 2.660376e-01 0.120308960
#> 3.7 furniture desk* desks 7.055135e-01 0.488119728
#> 3.8 furniture desk* deskjet 6.666372e-02 0.014096537
#> 3.9 furniture desk* deskbound -3.295457e-02 0.040480064
#> 3.10 furniture desk* deskins -7.416633e-02 -0.093441679
#> 3.11 furniture desk* deskilling -1.111884e-01 -0.127046688
#> 3.12 furniture desk* desker -8.782659e-02 -0.101063485
#> 3.13 furniture desk* deskside 1.018355e-01 -0.018340077
#> 3.14 furniture desk* deskstar 3.228949e-02 -0.077787116
#> 4 furniture couch* couchdb 8.856923e-02 0.047978957
#> 4.1 furniture couch* couchant 4.002281e-02 0.002155885
#> 4.2 furniture couch* couchsurfing 1.159810e-01 0.046172189
#> 4.3 furniture couch* couche 5.873038e-02 0.043086733
#> 4.4 furniture couch* couchette 1.557117e-01 0.138566945
#> 4.5 furniture couch* couchman 3.607044e-03 0.020909713
#> 4.6 furniture couch* couching 9.778941e-02 0.025603836
#> 4.7 furniture couch* couched 8.911655e-03 -0.023141308
#> 4.8 furniture couch* coucher 5.448386e-02 0.136451238
#> 4.9 furniture couch* couches 6.149996e-01 0.642776428
#> 4.10 furniture couch* couch 1.000000e+00 0.777768308
#> 5 furniture sofa* sofaer -1.746807e-01 -0.174680728
#> 5.1 furniture sofa* sofabed 4.930534e-01 0.493053388
#> 5.2 furniture sofa* sofala -1.683102e-02 -0.016831021
#> 5.3 furniture sofa* sofas 7.376998e-01 0.737699759
#> 5.4 furniture sofa* sofar -9.499185e-02 -0.094991854
#> 5.5 furniture sofa* sofa 1.000000e+00 1.000000000
#> 6 well_adjusted happy happy 1.000000e+00 0.668987316
#> 7 well_adjusted bright* bright-eyed 2.423866e-01 0.007577284
#> 7.1 well_adjusted bright* brightmail -7.953414e-02 -0.103623466
#> 7.2 well_adjusted bright* brightcove -2.638364e-02 -0.020390810
#> 7.3 well_adjusted bright* brightling -4.155437e-02 -0.049855680
#> 7.4 well_adjusted bright* brightnesses 9.780009e-02 -0.107159578
#> 7.5 well_adjusted bright* brightwater -1.177716e-01 -0.073448621
#> 7.6 well_adjusted bright* brightpoint -1.143870e-01 -0.124154393
#> 7.7 well_adjusted bright* brightstar 4.630735e-02 -0.071935486
#> 7.8 well_adjusted bright* brightwell -3.234350e-03 -0.043479127
#> 7.9 well_adjusted bright* brighter 6.335578e-01 0.203012712
#> 7.10 well_adjusted bright* brighton 2.579316e-01 0.319441300
#> 7.11 well_adjusted bright* brightfield 4.749668e-02 -0.059751944
#> 7.12 well_adjusted bright* brightest 5.378596e-01 0.223565471
#> 7.13 well_adjusted bright* brightwork -2.986911e-02 -0.020519883
#> 7.14 well_adjusted bright* brighten 4.917114e-01 0.258820016
#> 7.15 well_adjusted bright* brightside 8.110498e-02 0.011232105
#> 7.16 well_adjusted bright* brightly 7.098957e-01 0.173942116
#> 7.17 well_adjusted bright* brightness 5.076152e-01 0.113206864
#> 7.18 well_adjusted bright* brightening 2.915817e-01 0.048249708
#> 7.19 well_adjusted bright* brightened 3.989190e-01 0.144605826
#> 7.20 well_adjusted bright* brightwood -5.574753e-02 -0.023598381
#> 7.21 well_adjusted bright* brighthouse 1.314844e-02 0.001678741
#> 7.22 well_adjusted bright* brightman -7.511471e-03 0.064558972
#> 7.23 well_adjusted bright* brightlingsea -9.688023e-02 -0.031468955
#> 7.24 well_adjusted bright* bright 1.000000e+00 0.357651252
#> 7.25 well_adjusted bright* brights 3.541581e-01 0.092125277
#> 7.26 well_adjusted bright* brightens 3.503756e-01 0.066159802
#> 8 well_adjusted friend* friendz 1.395804e-01 0.224113174
#> 8.1 well_adjusted friend* friendlies -5.471498e-05 0.135094430
#> 8.2 well_adjusted friend* friend 1.000000e+00 0.815147799
#> 8.3 well_adjusted friend* friendly 4.840347e-01 0.518355797
#> 8.4 well_adjusted friend* friendfeed 1.858389e-01 0.252301316
#> 8.5 well_adjusted friend* friendswood 4.390444e-04 0.043916231
#> 8.6 well_adjusted friend* friendliness 1.435103e-01 0.207111231
#> 8.7 well_adjusted friend* friendster 2.560383e-01 0.302085908
#> 8.8 well_adjusted friend* friendship 5.350424e-01 0.613310672
#> 8.9 well_adjusted friend* friends 8.151478e-01 1.000000000
#> 8.10 well_adjusted friend* friendlier 1.157377e-01 0.184342733
#> 8.11 well_adjusted friend* friendliest 1.380033e-01 0.215251090
#> 8.12 well_adjusted friend* friendless 1.296303e-01 0.162746245
#> 8.13 well_adjusted friend* friendships 3.297935e-01 0.518973621
#> 8.14 well_adjusted friend* friended 1.958729e-01 0.259071536
#> 8.15 well_adjusted friend* friending 1.462934e-01 0.225425703
#> 9 well_adjusted she she 1.000000e+00 0.565640744
#> 10 well_adjusted he he 1.000000e+00 0.534147512
#> 11 well_adjusted they they 1.000000e+00 0.607023614
#>
#> $suggested
#> NULL
#>
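The sim.* columns of the $summary table above follow the layout of R's six-number summary (minimum, first quartile, median, mean, third quartile, maximum). Assuming each row summarizes a vector of similarity values for its category, comparable columns could be computed as in this sketch; summarize_sims() and its input values are hypothetical and are not lingmatch output.

```r
# Hypothetical sketch: six summary columns named like sim.min ... sim.max,
# computed from a vector of similarity values (made-up numbers below).
summarize_sims <- function(sims) {
  # Default (type 7) quantiles, the definition base R's summary() uses
  q <- stats::quantile(sims, c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(sim.min = q[1], sim.q1 = q[2], sim.median = q[3],
    sim.mean = mean(sims), sim.q3 = q[4], sim.max = q[5])
}

summarize_sims(c(-0.1, 0.2, 0.4, 0.6, 0.9))
```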