clustering.Rmd
You must run setup_word2vec at the beginning of every session; otherwise you will encounter errors and be prompted to do so.
library(word2vec.r)
# setup word2vec Julia dependency
setup_word2vec()
#> Julia version 1.1.1 at location /home/jp/Downloads/julia-1.1.1-linux-x86_64/julia-1.1.1/bin will be used.
#> Loading setup script for JuliaCall...
#> Finish loading setup script for JuliaCall.
The package comes with a dataset, Macbeth by Shakespeare. However, as a corpus of 17,319 words, it is not lazily loaded and needs to be imported manually with the data function. Note that the dataset is lightly preprocessed: all words are lowercase and punctuation has been removed.
data("macbeth", package = "word2vec.r")
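To bring your own text into a comparable shape before training, a minimal base R sketch might look like the following. The raw_text vector and the exact cleaning rules are assumptions made for illustration; how macbeth itself was preprocessed is not documented here and may differ.

```r
# Hypothetical raw input; not part of the package.
raw_text <- c(
  "Double, double toil and trouble;",
  "Fire burn, and cauldron bubble!"
)

# Lowercase everything, then strip punctuation characters.
clean_text <- tolower(raw_text)
clean_text <- gsub("[[:punct:]]", "", clean_text)

# Collapse repeated whitespace and trim both ends.
clean_text <- trimws(gsub("[[:space:]]+", " ", clean_text))

clean_text
```

Only base R functions are used, so this runs without the Julia setup.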
You can also cluster words.
model_path <- word2clusters(macbeth, classes = 50L) # train model
model <- word_clusters(model_path)
We provide both a functional API and a reference class. The following functions are available:
vocabulary
in_vocabulary
index
get_cluster
clusters
get_words
get_cluster(model, "king")
#> [1] 5
get_cluster(model, "macbeth")
#> [1] 44
We provide a reference class because it is tedious to specify the vectors (the model object in this example) as the first argument to every function.
wc <- WordClusters$new(model)
wc$get_words(4L)
#> [1] "to" "i" "thy" "know" "too" "comes"
#> [7] "stand" "himselfe" "daggers" "power" "fall" "hearke"
#> [13] "chance"
wc$clusters()
#> [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
#> [24] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
#> [47] 46 47 48 49
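The difference between the two styles can be illustrated with a self-contained toy sketch. Everything in it (toy_model, toy_get_cluster, toy_get_words, ToyClusters) is hypothetical and built with base R's setRefClass; it is not how word2vec.r implements WordClusters, only an illustration of why binding the model to an object saves repeating it on every call.

```r
library(methods)

# Toy stand-in for a trained model: a hypothetical word-to-cluster table.
toy_model <- data.frame(
  word = c("king", "queen", "dagger", "sword"),
  cluster = c(0L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Functional style: the model is passed explicitly on every call.
toy_get_cluster <- function(model, word) {
  model$cluster[match(word, model$word)]
}
toy_get_words <- function(model, cluster) {
  model$word[model$cluster == cluster]
}

# Reference class style: the model is stored once in a field,
# and the methods read it from there.
ToyClusters <- setRefClass(
  "ToyClusters",
  fields = list(model = "data.frame"),
  methods = list(
    get_cluster = function(word) toy_get_cluster(model, word),
    get_words = function(cluster) toy_get_words(model, cluster)
  )
)

tc <- ToyClusters$new(model = toy_model)
tc$get_cluster("king")
tc$get_words(1L)
```

Both styles return the same results; the object form simply avoids threading the model through each call.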