You must run setup_word2vec at the beginning of every session; otherwise you will encounter errors and be prompted to do so.

library(word2vec.r)

# setup word2vec Julia dependency
setup_word2vec()
#> Julia version 1.1.1 at location /home/jp/Downloads/julia-1.1.1-linux-x86_64/julia-1.1.1/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.

The package comes with a dataset, Macbeth by Shakespeare. However, because it is a corpus of 17,319 words, it is not lazily loaded and needs to be imported manually with the data function. Note that the dataset is mildly preprocessed: all words are lowercase and punctuation has been removed.

data("macbeth", package = "word2vec.r")

Functions

You can also cluster words.

model_path <- word2clusters(macbeth, classes = 50L) # train model
model <- word_clusters(model_path)

We provide both a functional API and a reference class. The following functions are available:

  • vocabulary
  • in_vocabulary
  • index
  • get_cluster
  • clusters
  • get_words

Functional

get_cluster(model, "king")
#> [1] 5
get_cluster(model, "macbeth")
#> [1] 44

Reference Class

We provide a reference class because it is tedious to specify the vectors (the model object in this example) as the first argument to every function.

wc <- WordClusters$new(model)
wc$get_words(4L)
#>  [1] "to"       "i"        "thy"      "know"     "too"      "comes"   
#>  [7] "stand"    "himselfe" "daggers"  "power"    "fall"     "hearke"  
#> [13] "chance"
wc$clusters()
#>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
#> [24] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
#> [47] 46 47 48 49
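
The difference between the two styles can be illustrated with a small, self-contained sketch in base R. It does not use word2vec.r or Julia, and it is not how the package is implemented: toy_model, toy_get_cluster, toy_get_words and ToyClusters are hypothetical names made up for this illustration, and the "model" is just a named integer vector. The reference class stores the model once, so each call no longer needs it as a first argument.

```r
library(methods)  # provides setRefClass; attached by default in most R sessions

# Hypothetical stand-in for a trained model: words mapped to cluster ids.
# This is invented data, not output from word2vec.r.
toy_model <- c(king = 1L, queen = 1L, dagger = 0L, sword = 0L)

# Functional style: the model is passed as the first argument of every call.
toy_get_cluster <- function(model, word) unname(model[word])
toy_get_words <- function(model, cluster) names(model)[model == cluster]

# Reference-class style: the model is stored in a field at construction time.
ToyClusters <- setRefClass(
  "ToyClusters",
  fields = list(model = "integer"),
  methods = list(
    get_cluster = function(word) unname(model[word]),
    get_words = function(cluster) names(model)[model == cluster]
  )
)

tc <- ToyClusters$new(model = toy_model)

# Both styles return the same result for the same query.
toy_get_cluster(toy_model, "king")
tc$get_cluster("king")
tc$get_words(0L)
```

The functional style composes well with apply-style helpers, while the class style is shorter when many queries are made against the same model.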