Adapted a few functions from Neil Shepperd's fork:
* Nucleus Sampling (`top_p`) when generating text, which can produce surprisingly different output (setting `top_p=0.9` works well). Supersedes `top_k` when used; a usage sketch follows this list. (51)
* An `encode_dataset()` function to pre-encode and compress a large dataset before loading it for finetuning. (19, 54)
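
A minimal sketch of how these might be used together, assuming the package is imported as `gpt2`, a plain-text `dataset.txt`, and an existing finetuned run named `run1` (the file name and run name are placeholders):

```python
import gpt_2_simple as gpt2

# Pre-encode and compress a large plain-text dataset (placeholder file name)
# so it does not have to be re-tokenized on every finetuning run.
gpt2.encode_dataset('dataset.txt')

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

# Nucleus sampling: sample only from the smallest set of tokens whose
# cumulative probability exceeds top_p; this overrides top_k when set.
gpt2.generate(sess, run_name='run1', top_p=0.9)
```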
Improvements to continuing model training:
* `overwrite` argument for `finetune`: with `restore_from="latest"`, this continues model training without creating a duplicate copy of the model, making it useful for transfer learning across multiple datasets (see the example after this list). (20)
* You can continue to `finetune` a model without having the original GPT-2 model present.
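
A rough sketch of this workflow, assuming an existing run named `run1` and a new dataset file (both names are placeholders):

```python
import gpt_2_simple as gpt2

sess = gpt2.start_tf_sess()

# Resume training from the existing 'run1' checkpoint and write the updated
# weights back into the same checkpoint folder rather than creating a copy.
gpt2.finetune(sess,
              dataset='second_dataset.txt',  # placeholder: new dataset for transfer learning
              run_name='run1',
              restore_from='latest',
              overwrite=True,
              steps=500)
```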
Improvements with I/O involving Colaboratory:
* Checkpoint folders are now packaged into a `.tar` file when copying to Google Drive, and when copying from Google Drive, the `.tar` file is automatically unpackaged into the correct checkpoint format. (You can pass `copy_folder=True` to the `copy_checkpoint` function to revert to the old behavior.) (37: thanks woctezuma!)
* `copy_checkpoint_to_gdrive` and `copy_checkpoint_from_gdrive` now take a `run_name` argument instead of a `checkpoint_folder` argument; see the sketch after this list.
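
A sketch of the updated Colaboratory round trip, assuming Google Drive is mounted via `gpt2.mount_gdrive()` and the run is named `run1` (a placeholder):

```python
import gpt_2_simple as gpt2

gpt2.mount_gdrive()

# Package checkpoint/run1 as a .tar file and copy it to Google Drive.
gpt2.copy_checkpoint_to_gdrive(run_name='run1')

# In a later Colaboratory session: copy the .tar back from Google Drive and
# unpackage it into the checkpoint/run1 folder automatically.
gpt2.copy_checkpoint_from_gdrive(run_name='run1')
```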
Miscellaneous
* Added CLI arguments for `top_k`, `top_p`, `overwrite`.
* Cleaned up redundant function parameters. (39)