.......................... * vocabulary_generator now writes a temporary bash script, which srilm calls, that invokes nltkbasedsegmentertokeniserrunner.
                           * The intermediate directory 'code/' has been removed.
.......................... * vocabulary_cutter now falls back to including all words if n is greater than the original vocabulary size.
                           * There is now a script called nltkbasedsegmentertokeniserrunner that simply imports and runs the corresponding module.
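
A minimal sketch of the fallback described in the vocabulary_cutter entry above, assuming the cutter keeps the n most frequent words from a word-to-count mapping. The name `cut_vocabulary` and its signature are hypothetical and are not taken from recluse itself.

```python
from collections import Counter


def cut_vocabulary(counts, n):
    """Return the n most frequent words (hypothetical helper).

    If n is at least the vocabulary size, every word is returned,
    mirroring the fallback described in the changelog entry.
    """
    # Rank words by descending frequency; ties keep insertion order.
    ranked = [word for word, _ in Counter(counts).most_common()]
    if n >= len(ranked):
        # Fallback: the requested size exceeds what is available.
        return ranked
    return ranked[:n]
```

With counts for three words, asking for ten returns all three instead of raising an error.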
.......................... * Changed the interface and functionality of vocabulary_generator. It no longer splits large files itself. Instead it takes a list of file names, and the caller decides whether or not to split.
.......................... * Added versioneer to deal with git+pypi package management.
                           * Moved the split_file_into_chunks function that had been in vocabulary_generator into utils.
                           * Added unit tests for utils.py.
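
As a rough illustration of what a helper like split_file_into_chunks might do, here is a sketch that splits by line count. The signature (`path`, `lines_per_chunk`, `out_dir`) and the chunk file naming are assumptions made for this example; the actual function in utils may differ.

```python
import os


def split_file_into_chunks(path, lines_per_chunk, out_dir):
    """Write consecutive chunks of at most lines_per_chunk lines.

    Hypothetical signature; returns the list of chunk file paths in order.
    """
    chunk_paths = []
    chunk = None
    with open(path, encoding="utf-8") as source:
        for i, line in enumerate(source):
            if i % lines_per_chunk == 0:
                # Start a new chunk file, closing the previous one first.
                if chunk is not None:
                    chunk.close()
                chunk_path = os.path.join(
                    out_dir, "chunk_%04d.txt" % len(chunk_paths)
                )
                chunk = open(chunk_path, "w", encoding="utf-8")
                chunk_paths.append(chunk_path)
            chunk.write(line)
    if chunk is not None:
        chunk.close()
    return chunk_paths
```

Under the new interface, a caller could pass the returned list of paths to vocabulary_generator, or pass the original file name unsplit.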
.......................... * Fixed pathnames in tests to match the new packaging structure.
.......................... * Fixed packaging error in which the package was named 'code' instead of 'recluse'.