✨ Major features and improvements
* Add new `foreach` layer combinator, which maps a layer across elements of a sequence.
* Add support for `predict` methods to more layers, for use during decoding.
* Improve correctness of batch normalization. Previously, some layers would force batch normalization to run in training mode, even during prediction. This led to decreased accuracy in some situations.
* Improved efficiency of `Maxout` layer.
🔴 Bug fixes
* Fix compiler flags for MSVC
* Remove unnecessary Chainer dependency. Now depends on Chainer's `cupy` package.
* Fix LSTM layer.
* Small bug fixes to beam search