word2clusters.Rd
Gives each word a class ID number.
word2clusters(train, output = NULL, classes = 0L, size = 100L, window = 5L,
  sample = 1e-05, hs = 0L, negative = 5L, threads = 1L, iter = 5L,
  min_count = 5L, alpha = 0.025, debug = 2L, binary = 0L, cbow = 1L,
  verbose = FALSE)
Argument | Description |
---|---|
train | Use text data from file to train the model. |
output | Use file to save the resulting word vectors / word clusters. |
classes | Number of word classes; default is 0L. |
size | Set size of word vectors; default is 100L. |
window | Set max skip length between words; default is 5L. |
sample | Set threshold for occurrence of words. Those that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-05. |
hs | Use Hierarchical Softmax; default is 0L. |
negative | Number of negative examples; default is 5L. |
threads | Use \(n\) threads; default is 1L. |
iter | Run more training iterations; default is 5L. |
min_count | This will discard words that appear less than \(n\) times; default is 5L. |
alpha | Set the starting learning rate; default is 0.025. |
debug | Set the debug mode; default is 2L. |
binary | Save the resulting vectors in binary mode; default is 0L. |
cbow | Use the continuous bag-of-words model; default is 1L. |
verbose | Whether to print output from training; default is FALSE. |
Invisibly returns the path to the output file.
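Because the path is returned invisibly, assign the call to a variable to keep it. The snippet below is only a sketch: the output file name and the non-default parameter values are illustrative, and the macbeth corpus is the package's sample data (also used in the example that follows).

# sketch: explicit output file and a few non-default tuning parameters;
# assigning the call captures the invisibly returned path
setup_word2vec()                          # load the Julia dependency
data("macbeth", package = "word2vec.r")   # sample corpus
clusters_path <- word2clusters(
  macbeth,
  output  = "macbeth_clusters.txt",       # illustrative file name
  classes = 50L,
  size    = 200L,
  window  = 8L
)
clusters_path                             # path to the clusters file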
# NOT RUN {
# setup word2vec Julia dependency
setup_word2vec()

# sample corpus
data("macbeth", package = "word2vec.r")

# train model
model_path <- word2clusters(macbeth, classes = 50L)
# }
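The clusters file itself is plain text. Assuming it follows the one "word class-ID" pair per line layout written by the underlying word2vec classes output (an assumption, not documented above), it can be read back with base R:

# sketch: read the clusters written by the example above back into R
# (assumes one "word class_id" pair per line, space separated)
clusters <- read.table(model_path, header = FALSE,
                       col.names = c("word", "class"))
head(clusters)

# words sharing a class id belong to the same cluster
head(split(clusters$word, clusters$class), 3)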