R/topic_modelling.R
Calculates the perplexity of every model in the list against its own training data (TODO: add a holdout option) and draws the results as a line plot. Perplexity gives a single numeric value per model (each fitted with a different k or alpha) that represents how well the generative model can generate the documents; lower is better. In the topicmodels journal article, perplexity is used to select k, taking the value at which the curve reaches its minimum. A continuously decreasing curve may suggest that the data contain too many topics, in which case it may help to slice the data into smaller subsets before fitting the LDA model, though this interpretation is tentative.
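For reference, the quantity being plotted can be written as follows. This is a sketch of the usual definition of perplexity for a fitted topic model, not a statement about this function's internals; the symbols $w$ (the corpus), $D$ (number of documents), $V$ (vocabulary size) and $n^{(jd)}$ (count of term $j$ in document $d$) are notation introduced here for illustration.

```latex
\mathrm{Perplexity}(w) = \exp\left\{ -\frac{\log p(w)}{\sum_{d=1}^{D} \sum_{j=1}^{V} n^{(jd)}} \right\}
```

The denominator is the total number of tokens in the corpus, so perplexity is the exponential of the negative average log-likelihood per token. A model that assigns higher likelihood to the documents therefore gets a lower perplexity.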
PlotLDAModelsPerplexity(lda.model.list)
lda.model.list | A list of topic models. See |
---|
A perplexity plot.
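The comparison described above can be sketched with the topicmodels package directly. This is a minimal illustration, not the implementation of PlotLDAModelsPerplexity: it assumes topicmodels is installed, uses the AssociatedPress data set shipped with that package, and the names dtm, k.values and perplexities, as well as the chosen values of k, are hypothetical.

```r
# Minimal sketch; assumes the 'topicmodels' package is installed.
library(topicmodels)

# Small document-term matrix: the first 50 articles of the
# AssociatedPress data set that ships with topicmodels.
data("AssociatedPress", package = "topicmodels")
dtm <- AssociatedPress[1:50, ]

# Hypothetical candidate values for the number of topics.
k.values <- c(2, 4, 6)

# Fit one LDA model per k (default VEM estimation, fixed seed).
lda.model.list <- lapply(k.values, function(k) {
  LDA(dtm, k = k, control = list(seed = 1))
})

# Perplexity of each model against its own training data (no holdout).
perplexities <- sapply(lda.model.list, perplexity)

# Line plot of perplexity against k; lower is better.
plot(k.values, perplexities, type = "b",
     xlab = "Number of topics (k)", ylab = "Perplexity")

# With this package loaded, the equivalent call would be:
# PlotLDAModelsPerplexity(lda.model.list)
```

Because each model is scored on the same documents it was fitted to, the curve tends to keep falling as k grows, which is one reason a holdout option would make the plot easier to interpret.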