Load Corpus Files

Loads the corpus files into memory. Assumes the format produced by the new crawler, where each year of the mailing list is in its own folder and each month is in a sub-folder. See rawToLDA for an example of its usage.

loadFiles(parsed.corpus.folder.path,
  corpus_setup = "/**/*.reply.title_body.txt")

Arguments

parsed.corpus.folder.path	The path to the corpus folder (e.g. 2012.parsed). Returns a folder used by rawToLDA.

Details

TODO: Parameterize the file extension (currently assumes reply.body.txt)
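
Examples

The folder layout described above can be illustrated with a small base R sketch. This is not the package's implementation of loadFiles: the helper name load_files_sketch, its suffix_glob argument, and the month folder names are hypothetical, and the sketch only assumes that files ending in .reply.title_body.txt sit in month sub-folders under the corpus folder.

```r
# Hypothetical helper, NOT the package's loadFiles(): it only illustrates
# how files in a <year folder>/<month>/ layout could be found and read.
load_files_sketch <- function(corpus_path,
                              suffix_glob = "*.reply.title_body.txt") {
  # Convert the glob to a regular expression (dots are escaped for us)
  pattern <- utils::glob2rx(suffix_glob)
  # list.files() matches the pattern against file names, at any depth
  paths <- list.files(corpus_path, pattern = pattern,
                      recursive = TRUE, full.names = TRUE)
  # Read each file into one string
  docs <- vapply(paths,
                 function(p) paste(readLines(p, warn = FALSE), collapse = "\n"),
                 character(1))
  names(docs) <- basename(paths)
  docs
}

# Build a throwaway corpus in the session's temporary directory
corpus <- file.path(tempdir(), "2012.parsed")
dir.create(file.path(corpus, "Jan"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(corpus, "Feb"), recursive = TRUE, showWarnings = FALSE)
writeLines("Hello list", file.path(corpus, "Jan", "1.reply.title_body.txt"))
writeLines("Re: Hello list", file.path(corpus, "Feb", "2.reply.title_body.txt"))
writeLines("ignored", file.path(corpus, "Feb", "notes.txt"))

docs <- load_files_sketch(corpus)
length(docs)  # 2: notes.txt does not match the suffix
```

Passing the suffix as an argument, as suffix_glob does here, is one possible way to address the TODO above; whether loadFiles should do it this way is left open.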
