word2clusters.Rd
Gives each word a class ID number.
word2clusters(train, output = NULL, classes = 0L, size = 100L, window = 5L,
  sample = 1e-05, hs = 0L, negative = 5L, threads = 1L, iter = 5L,
  min_count = 5L, alpha = 0.025, debug = 2L, binary = 0L, cbow = 1L,
  verbose = FALSE)
Argument | Description |
---|---|
train | Use text data from file to train the model. |
output | Use file to save the resulting word vectors / word clusters. |
classes | Number of word classes; default is 0L. |
size | Set size of word vectors; default is 100L. |
window | Set max skip length between words; default is 5L. |
sample | Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-05. |
hs | Use Hierarchical Softmax; default is 0L. |
negative | Number of negative examples; default is 5L. |
threads | Use \(n\) threads; default is 1L. |
iter | Run more training iterations; default is 5L. |
min_count | This will discard words that appear less than \(n\) times; default is 5L. |
alpha | Set the starting learning rate; default is 0.025. |
debug | Set the debug mode; default is 2L. |
binary | Save the resulting vectors in binary mode; default is 0L. |
cbow | Use the continuous bag-of-words model; default is 1L. |
verbose | Whether to print output from training; default is FALSE. |
Invisibly returns the path to the output file.
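Because the path is returned invisibly, assign the call to a variable to keep it. The snippet below is only a sketch: the output file name and the non-default parameter values are illustrative, and the macbeth corpus is the package's sample data (also used in the example that follows).

# sketch: explicit output file and a few non-default tuning parameters;
# assigning the call captures the invisibly returned path
setup_word2vec()                          # load the Julia dependency
data("macbeth", package = "word2vec.r")   # sample corpus
clusters_path <- word2clusters(
  macbeth,
  output  = "macbeth_clusters.txt",       # illustrative file name
  classes = 50L,
  size    = 200L,
  window  = 8L
)
clusters_path                             # path to the clusters file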
# NOT RUN {
# setup word2vec Julia dependency
setup_word2vec()

# sample corpus
data("macbeth", package = "word2vec.r")

# train model
model_path <- word2clusters(macbeth, classes = 50L)
# }
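The clusters file itself is plain text. Assuming it follows the one "word class-ID" pair per line layout written by the underlying word2vec classes output (an assumption, not documented above), it can be read back with base R:

# sketch: read the clusters written by the example above back into R
# (assumes one "word class_id" pair per line, space separated)
clusters <- read.table(model_path, header = FALSE,
                       col.names = c("word", "class"))
head(clusters)

# words sharing a class id belong to the same cluster
head(split(clusters$word, clusters$class), 3)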