You must run setup_word2vec() at the beginning of every session; otherwise you will encounter errors and be prompted to do so.

library(word2vec.r)

# setup word2vec Julia dependency
setup_word2vec()
#> Julia version 1.1.1 at location /home/jp/Downloads/julia-1.1.1-linux-x86_64/julia-1.1.1/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.

The package comes with a dataset: Macbeth, by Shakespeare. However, at 17,319 words the corpus is not lazily loaded and needs to be imported manually with the data function. Note that the dataset is mildly preprocessed: all words are lowercase and punctuation has been removed.

data("macbeth", package = "word2vec.r")
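To get a feel for the corpus after loading it, one might inspect the object directly. The sketch below assumes macbeth is stored as a single character string of space-separated words; adjust if the object is shaped differently.

```r
# check how the corpus is stored (assumption: a single character string)
class(macbeth)

# rough word count, assuming whitespace-separated tokens
length(strsplit(macbeth, "\\s+")[[1]])
```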

Functions

With the data we can train a model and extract the word vectors.

model_path <- word2vec(macbeth) # train model
model <- word_vectors(model_path) # get word vectors

A multitude of functions can then be used on the model:

  • get_vector
  • vocabulary
  • in_vocabulary
  • size
  • index
  • cosine
  • cosine_similar_words
  • similarity
  • analogy
  • analogy_words

All are well documented and come with examples; visit their respective man pages with, e.g., ?get_vector. Note that all the functions listed above require the output of word_vectors (the model object in our case) as their first argument, so a convenient reference class also exists.
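A common pattern is to guard a query with in_vocabulary before looking up a vector, since querying a word absent from the (small) corpus will fail. A minimal sketch, assuming in_vocabulary returns a logical:

```r
# only query words that are actually in the model's vocabulary
word <- "macbeth"
if (in_vocabulary(model, word)) {
  vec <- get_vector(model, word)
} else {
  message(word, " is not in the vocabulary")
}
```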

Functional

# words similar to king
cosine_similar_words(model, "king", 5L)
#> [1] "king"  "yet"   "rosse" "and"   "from"

# size of model
size(model)
#> # A tibble: 1 x 2
#>   length words
#>    <int> <int>
#> 1    100   511

# get vocabulary
vocab <- vocabulary(model)
head(vocab)
#> [1] "</s>" "the"  "and"  "to"   "i"    "of"

# index of word macbeth
idx <- index(model, "macbeth")
vocab[idx]
#> [1] "macbeth"
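The analogy functions listed above can be exercised in the same way. This is an illustrative sketch only: it assumes analogy_words takes vectors of positive and negative words plus the number of results (mirroring the underlying Julia API), and on a corpus this small the output will be noisy.

```r
# words close to "macbeth" - "king" + "lady" (results unreliable on 17,319 words)
analogy_words(model, c("macbeth", "lady"), c("king"), 5L)
```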

Reference Class

Because everything depends on the word vectors (the model object in our case), the package provides a reference class that avoids repetitively passing said model as the first argument to every function.

wv <- WordVectors$new(model)
wv$get_vector("macbeth")
#>   [1]  0.060261002 -0.099035711  0.089729781 -0.131802295  0.054377451
#>   [6] -0.037143736  0.034620966  0.013240862 -0.046540684  0.216212103
#>  [11]  0.062457392  0.184753333 -0.145580993 -0.073077572  0.003581891
#>  [16]  0.196514459  0.120471438  0.053800082  0.140348820  0.136948506
#>  [21] -0.031166868  0.050116140  0.124824226 -0.164362478 -0.005901479
#>  [26] -0.092760047  0.007204235  0.018347540  0.167392284  0.069425808
#>  [31] -0.069325596 -0.015448745 -0.162950776  0.053471405 -0.045633260
#>  [36] -0.001947240  0.099237974 -0.089840106  0.039227503 -0.065056930
#>  [41] -0.008485846  0.145537322 -0.139205575 -0.139850518  0.056980666
#>  [46]  0.111228944 -0.021939085 -0.019801994 -0.056312279 -0.061632252
#>  [51] -0.020469921 -0.082446020  0.063011317 -0.054398137 -0.014775302
#>  [56] -0.034342855  0.161132708 -0.078148394 -0.020963626 -0.218176811
#>  [61]  0.083799342 -0.134379308 -0.029002656 -0.061294381 -0.073210881
#>  [66]  0.175620705  0.110194186 -0.070679837  0.158414571  0.084456696
#>  [71]  0.005258834 -0.020719532  0.093608631 -0.102572553 -0.169927925
#>  [76] -0.009947655 -0.173633013  0.014916886 -0.194387940 -0.147128763
#>  [81]  0.028671680  0.025153226 -0.046986122 -0.079825796  0.074098080
#>  [86]  0.187334483  0.049447294 -0.055605278  0.227896461 -0.056259874
#>  [91] -0.039628351  0.077658366  0.016895844  0.038136662  0.047202176
#>  [96]  0.088280384  0.065880691  0.166309718  0.009127571 -0.059123733
wv$cosine("rosse")
#> # A tibble: 10 x 2
#>    index cosine
#>    <int>  <dbl>
#>  1    67  1    
#>  2   106  1.000
#>  3    91  1.000
#>  4    51  1.000
#>  5    54  1.000
#>  6    56  1.000
#>  7   115  1.000
#>  8    13  1.000
#>  9     3  1.000
#> 10    10  1.000
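Following the same pattern, the other functions listed earlier should be available as methods of the reference class, minus the model argument. A sketch, assuming the class mirrors the full functional API:

```r
# same queries as the functional interface, without passing the model
wv$cosine_similar_words("king", 5L)
wv$vocabulary()
```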