Enter a numerical matrix, set of vectors, or set of matrices to calculate similarity per vector.

## Usage

```
lma_simets(a, b = NULL, metric = NULL, group = NULL, lag = 0,
  agg = TRUE, agg.mean = TRUE, pairwise = TRUE, symmetrical = FALSE,
  mean = FALSE, return.list = FALSE)
```

## Arguments

- a
A vector or matrix. If a vector, `b` must also be provided. If a matrix and `b` is missing, each row will be compared. If a matrix and `b` is not missing, each row will be compared with `b` or each row of `b`.
- b
A vector or matrix to be compared with `a` or rows of `a`.
- metric
A character or vector of characters at least partially matching one of the available metric names (or 'all' to explicitly include all metrics), or a number or vector of numbers indicating the metric by index:
  - `jaccard`: `sum(a & b) / sum(a | b)`
  - `euclidean`: `1 / (1 + sqrt(sum((a - b) ^ 2)))`
  - `canberra`: `mean(1 - abs(a - b) / (a + b))`
  - `cosine`: `sum(a * b) / sqrt(sum(a ^ 2 * sum(b ^ 2)))`
  - `pearson`: `(mean(a * b) - (mean(a) * mean(b))) / sqrt(mean(a ^ 2) - mean(a) ^ 2) / sqrt(mean(b ^ 2) - mean(b) ^ 2)`
- group
If `b` is missing and `a` has multiple rows, this will be used to make comparisons between rows of `a`, as modified by `agg` and `agg.mean`.
- lag
Amount to adjust the `b` index; either rows if `b` has multiple rows (e.g., for `lag = 1`, `a[1, ]` is compared with `b[2, ]`), or values otherwise (e.g., for `lag = 1`, `a[1]` is compared with `b[2]`). If `b` is not supplied, `b` is a copy of `a`, resulting in lagged self-comparisons or autocorrelations.
- agg
Logical: if `FALSE`, only the boundary rows between groups will be compared, see example.
- agg.mean
Logical: if `FALSE` aggregated rows are summed instead of averaged.
- pairwise
Logical: if `FALSE` and `a` and `b` are matrices with the same number of rows, only paired rows are compared. Otherwise (and if only `a` is supplied), all pairwise comparisons are made.
- symmetrical
Logical: if `TRUE` and pairwise comparisons between `a` rows were made, the results in the lower triangle are copied to the upper triangle.
- mean
Logical: if `TRUE`, a single mean for each metric is returned per row of `a`.
- return.list
Logical: if `TRUE`, a list-like object will always be returned, with an entry for each metric, even when only one metric is requested.
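
The metric formulas listed under `metric` can be tried directly in base R. The following is a minimal sketch rather than the package's implementation: the `sim_*` function names are hypothetical, and the handling of positions where `a + b` is 0 in the Canberra metric (counted as a perfect match) is an assumption chosen so that the result agrees with the output in the Examples section. The cosine denominator is written as `sqrt(sum(a^2) * sum(b^2))`, which is algebraically the same as the listed form.

```r
# Base-R sketches of the five listed metric formulas.
# The function names are hypothetical and not part of lingmatch.
sim_jaccard <- function(a, b) sum(a & b) / sum(a | b)

sim_euclidean <- function(a, b) 1 / (1 + sqrt(sum((a - b)^2)))

sim_canberra <- function(a, b) {
  # Assumption: where a + b is 0, the ratio is taken to be 0 (a perfect
  # match); the bare formula would produce NaN at those positions.
  ratio <- ifelse(a + b == 0, 0, abs(a - b) / (a + b))
  mean(1 - ratio)
}

sim_cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

sim_pearson <- function(a, b) {
  (mean(a * b) - mean(a) * mean(b)) /
    sqrt(mean(a^2) - mean(a)^2) / sqrt(mean(b^2) - mean(b)^2)
}

# Rows 1 and 2 of the document-term matrix from the Examples section.
a <- c(1, 0, 0, 0, 1, 1, 1)
b <- c(1, 0, 1, 1, 0, 1, 1)

sims <- c(
  jaccard = sim_jaccard(a, b),
  euclidean = sim_euclidean(a, b),
  canberra = sim_canberra(a, b),
  cosine = sim_cosine(a, b),
  pearson = sim_pearson(a, b)
)
print(round(sims, 7))
```

Each value should agree with the `[2,]` row, first column, of the corresponding matrix printed in the Examples section.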

## Value

Output varies based on the dimensions of `a` and `b`:

- **Out:** A vector with a value per metric.
  **In:** Only when `a` and `b` are both vectors.
- **Out:** A vector with a value per row.
  **In:** Any time a single value is expected per row: `a` or `b` is a vector, `a` and `b` are matrices with the same number of rows and `pairwise = FALSE`, a group is specified, or `mean = TRUE`, and only one metric is requested.
- **Out:** A data.frame with a column per metric.
  **In:** When multiple metrics are requested in the previous case.
- **Out:** A sparse matrix with a `metric` attribute with the metric name.
  **In:** Pairwise comparisons within an `a` matrix or between an `a` and `b` matrix, when only 1 metric is requested.
- **Out:** A list with a sparse matrix per metric.
  **In:** When multiple metrics are requested in the previous case.

## Details

Use `setThreadOptions` to change parallelization options; e.g., run `RcppParallel::setThreadOptions(4)` before a call to `lma_simets` to set the number of CPU threads to 4.
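
As a rough sketch of that workflow, the snippet below sets the thread count, runs a comparison, and then restores the default. It assumes the lingmatch and RcppParallel packages are installed (the call is skipped otherwise), and the matrix `m` is made-up illustration data rather than anything from this page.

```r
# Hypothetical 50 x 20 numeric matrix, used only for illustration.
set.seed(1)
m <- matrix(runif(50 * 20), nrow = 50)

# Assumes lingmatch and RcppParallel are installed; skipped otherwise.
if (requireNamespace("lingmatch", quietly = TRUE) &&
    requireNamespace("RcppParallel", quietly = TRUE)) {
  # Limit parallel work to 4 CPU threads before calling lma_simets.
  RcppParallel::setThreadOptions(numThreads = 4)
  sims <- lingmatch::lma_simets(m, metric = "cosine")
  # Restore the automatic thread count afterwards.
  RcppParallel::setThreadOptions(numThreads = "auto")
}
```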

## Examples

```
text <- c(
  "words of speaker A", "more words from speaker A",
  "words from speaker B", "more words from speaker B"
)
(dtm <- lma_dtm(text))
#> 4 x 7 sparse Matrix of class "dgCMatrix"
#>      a b from more of speaker words
#> [1,] 1 .    .    .  1       1     1
#> [2,] 1 .    1    1  .       1     1
#> [3,] . 1    1    .  .       1     1
#> [4,] . 1    1    1  .       1     1
# compare each entry
lma_simets(dtm)
#> $jaccard
#> 4 x 4 sparse Matrix of class "dtCMatrix" (unitriangular)
#>
#> [1,] I . . .
#> [2,] 0.5000000 I . .
#> [3,] 0.3333333 0.5000000 I .
#> [4,] 0.2857143 0.6666667 0.8 I
#>
#> $euclidean
#> 4 x 4 sparse Matrix of class "dtCMatrix" (unitriangular)
#>
#> [1,] I . . .
#> [2,] 0.3660254 I . .
#> [3,] 0.3333333 0.3660254 I .
#> [4,] 0.3090170 0.4142136 0.5 I
#>
#> $canberra
#> 4 x 4 sparse Matrix of class "dtCMatrix" (unitriangular)
#>
#> [1,] I . . .
#> [2,] 0.5714286 I . .
#> [3,] 0.4285714 0.5714286 I .
#> [4,] 0.2857143 0.7142857 0.8571429 I
#>
#> $cosine
#> 4 x 4 sparse Matrix of class "dtCMatrix" (unitriangular)
#>
#> [1,] I . . .
#> [2,] 0.6708204 I . .
#> [3,] 0.5000000 0.6708204 I .
#> [4,] 0.4472136 0.8000000 0.8944272 I
#>
#> $pearson
#> 4 x 4 sparse Matrix of class "dtCMatrix" (unitriangular)
#>
#> [1,] I . . .
#> [2,] 0.09128709 I . .
#> [3,] -0.16666667 0.09128709 I .
#> [4,] -0.54772256 0.30000000 0.7302967 I
#>
#> attr(,"time")
#> simets
#> 0
# compare each entry with the mean of all entries
lma_simets(dtm, colMeans(dtm))
#>     jaccard euclidean  canberra    cosine   pearson
#> 1 0.5714286 0.4220645 0.4380952 0.7484552 0.1964186
#> 2 0.7142857 0.5166852 0.5986395 0.9128709 0.6454972
#> 3 0.5714286 0.5166852 0.5034014 0.8845380 0.7463905
#> 4 0.7142857 0.5166852 0.5986395 0.9128709 0.6454972
# compare by group (corresponding to speakers and turns in this case)
speaker <- c("A", "A", "B", "B")
## by default, consecutive rows from the same group are averaged:
lma_simets(dtm, group = speaker)
#>                 jaccard euclidean  canberra    cosine    pearson
#> 1, 2 <-> 3, 4 0.5714286 0.3874259 0.5238095 0.6888467 -0.1324532
## with agg = FALSE, only the rows at the boundary between
## groups (rows 2 and 3 in this case) are used:
lma_simets(dtm, group = speaker, agg = FALSE)
#>         jaccard euclidean  canberra    cosine    pearson
#> 2 <-> 3     0.5 0.3660254 0.5714286 0.6708204 0.09128709
```
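
The grouped comparisons in the last two calls can be approximated by hand in base R, which may help clarify what `agg` changes. This is a sketch under stated assumptions, not the package's code: the `cosine` helper is hypothetical, the matrix is retyped from the printed `dtm`, and each speaker's rows are treated as a single block (which holds here because each speaker has exactly one run of consecutive rows).

```r
# The document-term matrix printed above, rebuilt as a dense base-R matrix.
dtm <- rbind(
  c(1, 0, 0, 0, 1, 1, 1),
  c(1, 0, 1, 1, 0, 1, 1),
  c(0, 1, 1, 0, 0, 1, 1),
  c(0, 1, 1, 1, 0, 1, 1)
)
colnames(dtm) <- c("a", "b", "from", "more", "of", "speaker", "words")
speaker <- c("A", "A", "B", "B")

# Hypothetical helper implementing the listed cosine formula.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# agg = TRUE with agg.mean = TRUE: average each speaker's rows, then compare.
mean_a <- colMeans(dtm[speaker == "A", , drop = FALSE])
mean_b <- colMeans(dtm[speaker == "B", , drop = FALSE])
agg_cosine <- cosine(mean_a, mean_b)

# agg = FALSE: compare only the rows on either side of the speaker change.
boundary <- which(speaker[-1] != speaker[-length(speaker)])
boundary_cosine <- cosine(dtm[boundary, ], dtm[boundary + 1, ])

print(round(c(aggregated = agg_cosine, boundary = boundary_cosine), 7))
```

The two values should line up with the `cosine` column of the `1, 2 <-> 3, 4` and `2 <-> 3` rows above.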