A wrapper to other pre-processing functions, potentially from read.segments, to lma_dtm or lma_patcat, to lma_weight, then lma_termcat or lma_lspace, and optionally including lma_meta output.
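For orientation, a minimal sketch of the roughly equivalent manual sequence (using the texts vector defined in the Examples below; object names here are only illustrative, and the defaults lma_process applies at each step may differ):
# tokenize the texts into a document-term matrix
dtm <- lma_dtm(texts)
# apply term weighting (default weighting; lma_process passes weight arguments through)
weighted <- lma_weight(dtm)
# score dictionary categories with the default function-word dictionary
scores <- lma_termcat(weighted, lma_dict())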
Arguments
- input
A vector of text, or path to a text file or folder.
- ...
Arguments to be passed to lma_dtm, lma_patcat, lma_weight, lma_termcat, and/or lma_lspace. All arguments must be named.
- meta
Logical; if FALSE, metastatistics are not included. Only applies when raw text is available. If included, meta categories are added as the last columns, with names starting with "meta_".
- coverage
Logical; if TRUE and a dictionary is provided (dict), will calculate the coverage (number of unique term matches) of each dictionary category; see the sketch after this list.
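For example, a minimal sketch combining these arguments (using the texts vector from the Examples below; output not shown):
# with a dictionary, coverage = TRUE should append per-category coverage counts,
# and meta = FALSE should drop the "meta_" columns
lma_process(texts, dict = lma_dict(), coverage = TRUE, meta = FALSE)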
Value
A matrix with texts represented by rows, and features in columns, unless there are multiple rows per output (e.g., when a latent semantic space is applied without terms being mapped) in which case only the special output is returned (e.g., a matrix with terms as rows and latent dimensions in columns).
See also
If you just want to compare texts, see the lingmatch() function.
Examples
# starting with some texts in a vector
texts <- c(
"Firstly, I would like to say, and with all due respect...",
"Please, proceed. I hope you feel you can speak freely...",
"Oh, of course, I just hope to be clear, and not cause offense...",
"Oh, no, don't monitor yourself on my account..."
)
# by default, term counts and metastatistics are returned
lma_process(texts)
#> text account all
#> 1 Firstly, I would like to say, and with all due respect... 0 1
#> 2 Please, proceed. I hope you feel you can speak freely... 0 0
#> 3 Oh, of course, I just hope to be clear, and not cause offense... 0 0
#> 4 Oh, no, don't monitor yourself on my account... 1 0
#> and be can cause clear course don't due feel firstly freely hope i just like
#> 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1
#> 2 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0
#> 3 1 1 0 1 1 1 0 0 0 0 0 1 1 1 0
#> 4 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
#> monitor my no not of offense oh on please proceed respect say speak to with
#> 1 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1
#> 2 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0
#> 3 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0
#> 4 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0
#> would you yourself meta_characters meta_syllables meta_words
#> 1 1 0 0 42 12 11
#> 2 0 2 0 42 11 10
#> 3 0 0 0 46 14 13
#> 4 0 0 1 35 12 8
#> meta_unique_words meta_clauses meta_sentences meta_words_per_clause
#> 1 11 3 1 3.666667
#> 2 9 3 2 3.333333
#> 3 13 4 1 3.250000
#> 4 8 3 1 2.666667
#> meta_words_per_sentence meta_sixltr meta_characters_per_word
#> 1 11 2 3.818182
#> 2 5 3 4.200000
#> 3 13 2 3.538462
#> 4 8 3 4.375000
#> meta_syllables_per_word meta_type_token_ratio meta_reading_grade meta_numbers
#> 1 1.090909 1.0 1.572727 0
#> 2 1.100000 0.9 -0.660000 0
#> 3 1.076923 1.0 2.187692 0
#> 4 1.500000 1.0 5.230000 0
#> meta_puncts meta_periods meta_commas meta_qmarks meta_exclams meta_quotes
#> 1 5 3 2 0 0 0
#> 2 5 4 1 0 0 0
#> 3 6 3 3 0 0 0
#> 4 5 3 2 0 0 0
#> meta_apostrophes meta_brackets meta_orgmarks
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 1 0 0
# add dictionary and percent arguments for standard dictionary-based results
lma_process(texts, dict = lma_dict(), percent = TRUE)
#> text ppron
#> 1 Firstly, I would like to say, and with all due respect... 9.090909
#> 2 Please, proceed. I hope you feel you can speak freely... 30.000000
#> 3 Oh, of course, I just hope to be clear, and not cause offense... 7.692308
#> 4 Oh, no, don't monitor yourself on my account... 25.000000
#> ipron article adverb conj prep auxverb negate quant
#> 1 0 0 0.00000 9.090909 18.18182 9.090909 0.000000 9.090909
#> 2 0 0 10.00000 0.000000 0.00000 10.000000 0.000000 0.000000
#> 3 0 0 15.38462 7.692308 15.38462 7.692308 7.692308 0.000000
#> 4 0 0 12.50000 0.000000 12.50000 12.500000 25.000000 0.000000
#> interrog number interjection meta_characters meta_syllables meta_words
#> 1 0 9.090909 0.000000 42 12 11
#> 2 0 0.000000 0.000000 42 11 10
#> 3 0 0.000000 7.692308 46 14 13
#> 4 0 0.000000 12.500000 35 12 8
#> meta_unique_words meta_clauses meta_sentences meta_words_per_clause
#> 1 11 3 1 3.666667
#> 2 9 3 2 3.333333
#> 3 13 4 1 3.250000
#> 4 8 3 1 2.666667
#> meta_words_per_sentence meta_sixltr meta_characters_per_word
#> 1 11 18.18182 3.818182
#> 2 5 30.00000 4.200000
#> 3 13 15.38462 3.538462
#> 4 8 37.50000 4.375000
#> meta_syllables_per_word meta_type_token_ratio meta_reading_grade meta_numbers
#> 1 1.090909 1.0 1.572727 0
#> 2 1.100000 0.9 -0.660000 0
#> 3 1.076923 1.0 2.187692 0
#> 4 1.500000 1.0 5.230000 0
#> meta_puncts meta_periods meta_commas meta_qmarks meta_exclams meta_quotes
#> 1 45.45455 27.27273 18.18182 0 0 0
#> 2 50.00000 40.00000 10.00000 0 0 0
#> 3 46.15385 23.07692 23.07692 0 0 0
#> 4 62.50000 37.50000 25.00000 0 0 0
#> meta_apostrophes meta_brackets meta_orgmarks
#> 1 0.0 0 0
#> 2 0.0 0 0
#> 3 0.0 0 0
#> 4 12.5 0 0
# add space and weight arguments for standard word-centroid vectors
lma_process(texts, space = lma_lspace(texts), weight = "tfidf")
#> text V1
#> 1 Firstly, I would like to say, and with all due respect... -0.07509574
#> 2 Please, proceed. I hope you feel you can speak freely... -0.08118897
#> 3 Oh, of course, I just hope to be clear, and not cause offense... -0.09709788
#> 4 Oh, no, don't monitor yourself on my account... -0.01913927
#> V2 meta_characters meta_syllables meta_words meta_unique_words
#> 1 0.05737681 42 12 11 11
#> 2 -0.16696471 42 11 10 9
#> 3 0.03907020 46 14 13 13
#> 4 0.01993145 35 12 8 8
#> meta_clauses meta_sentences meta_words_per_clause meta_words_per_sentence
#> 1 3 1 3.666667 11
#> 2 3 2 3.333333 5
#> 3 4 1 3.250000 13
#> 4 3 1 2.666667 8
#> meta_sixltr meta_characters_per_word meta_syllables_per_word
#> 1 0.1818182 3.818182 1.090909
#> 2 0.3000000 4.200000 1.100000
#> 3 0.1538462 3.538462 1.076923
#> 4 0.3750000 4.375000 1.500000
#> meta_type_token_ratio meta_reading_grade meta_numbers meta_puncts
#> 1 1.0 1.572727 0 0.4545455
#> 2 0.9 -0.660000 0 0.5000000
#> 3 1.0 2.187692 0 0.4615385
#> 4 1.0 5.230000 0 0.6250000
#> meta_periods meta_commas meta_qmarks meta_exclams meta_quotes
#> 1 0.2727273 0.1818182 0 0 0
#> 2 0.4000000 0.1000000 0 0 0
#> 3 0.2307692 0.2307692 0 0 0
#> 4 0.3750000 0.2500000 0 0 0
#> meta_apostrophes meta_brackets meta_orgmarks
#> 1 0.000 0 0
#> 2 0.000 0 0
#> 3 0.000 0 0
#> 4 0.125 0 0
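# input can also be a path to a text file or folder; a sketch of the file-based
# route (segmentation of the file is handled by the read.segments step, so
# results may differ from the in-memory vector above; output not shown)
tmp <- tempfile(fileext = ".txt")
writeLines(texts, tmp)
lma_process(tmp)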