R/topic_modelling.R
Loads the corpus files into memory. Assumes the layout produced by the new crawler, where each year
of a mailing list is in its own folder and each month is in a sub-folder.
See rawToLDA for an example of its usage.
loadFiles(parsed.corpus.folder.path, corpus_setup = "/**/*.reply.title_body.txt")
| Argument | Description |
|---|---|
| raw.corpus.folder.path | The path to the corpus folder (e.g. 2012.parsed). Returns a folder used by |
TODO: Parameterize the file extension (currently assumes reply.body.txt)
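
The folder layout and file pattern described above can be illustrated with a minimal base R sketch. It does not call loadFiles and is not its implementation: the temporary directory, the month folder names, the message file names, and the variables corpus_dir, corpus_files and documents are all hypothetical. It also swaps the "/**/*.reply.title_body.txt" glob for a recursive list.files() call with an equivalent regular expression, which matches files at any depth rather than only inside sub-folders.

```r
# Hypothetical layout: <year folder>/<month folder>/<message>.reply.title_body.txt
corpus_dir <- file.path(tempdir(), "2012.parsed")
month_dirs <- file.path(corpus_dir, c("01", "02"))
for (d in month_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# Two files that follow the naming convention and one that does not
writeLines("Re: patch review", file.path(month_dirs[1], "msg1.reply.title_body.txt"))
writeLines("Re: release plan", file.path(month_dirs[2], "msg2.reply.title_body.txt"))
writeLines("not part of the corpus", file.path(month_dirs[2], "notes.txt"))

# Base R stand-in for the "/**/*.reply.title_body.txt" pattern (an assumption,
# not necessarily how loadFiles resolves corpus_setup)
corpus_files <- list.files(
  corpus_dir,
  pattern = "\\.reply\\.title_body\\.txt$",
  recursive = TRUE,
  full.names = TRUE
)

# Read each matching file into a single string, keyed by file name
documents <- vapply(
  corpus_files,
  function(path) paste(readLines(path, warn = FALSE), collapse = "\n"),
  character(1)
)
names(documents) <- basename(corpus_files)
```

Only the two files ending in .reply.title_body.txt end up in documents; notes.txt is skipped because it does not match the pattern.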