✨ Major features and improvements
* Add `Model.to_bytes()` and `Model.from_bytes()` methods, to support serialization that's compatible between Python versions.
* Remove code depending on [Chainer](http://chainer.org), and instead depend explicitly on the new `cupy` subpackage, for simpler GPU installation.
* Improve accuracy of the `HashEmbed` table by using 4 conditionally independent keys.
* Support padding in the `flatten` and `with_flatten` ops.
* Use the same hash function on both CPU and GPU, for model compatibility.
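
To illustrate the multi-key `HashEmbed` change, here is a minimal sketch of the general technique, not the library's implementation. It assumes each ID is hashed once into 4 keys, that each key selects one row of the table, and that the 4 rows are summed. The helper names `hash_keys` and `hash_embed` are hypothetical, and BLAKE2 from the standard library stands in for the library's own hash function.

```python
import hashlib
import struct


def hash_keys(token_id, seed=0, n_keys=4):
    """Derive n_keys 32-bit keys from a single ID (hypothetical helper)."""
    # One digest of 4 * n_keys bytes is split into n_keys unsigned 32-bit ints.
    payload = struct.pack("<QQ", token_id, seed)
    digest = hashlib.blake2b(payload, digest_size=4 * n_keys).digest()
    return struct.unpack("<%dI" % n_keys, digest)


def hash_embed(token_id, table, seed=0):
    """Sum the table rows selected by each key of token_id."""
    n_rows = len(table)
    width = len(table[0])
    vector = [0.0] * width
    for key in hash_keys(token_id, seed):
        row = table[key % n_rows]
        for i in range(width):
            vector[i] += row[i]
    return vector


# A small table: 8 rows, 3 columns.
table = [[float(r * 3 + c) for c in range(3)] for r in range(8)]
vector = hash_embed(42, table)
```

Two IDs produce the same vector only if all 4 of their rows coincide, which is much less likely than a collision on a single key. Because the keys depend only on the ID and the seed, any device that runs the same hash function selects the same rows.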
🔴 Bug fixes
* `HashEmbed` now returns correct results for arrays of length not divisible by 16.
* Provide `.cu` source files in the source distribution.
* Remove unnecessary allocations from the CPU maxout op.
* Fix issue 27: Remove Python 2-specific code from `setup.py`.