- Add new models: MVLSTM, MatchPyramid, HBMP, DUET, MatchSRNN, aNMM and DIIN
- Add activation to classification models
- Add Ngram callbacks for `Dataset`
- Remove `DSSMPreprocessor`, `CDSSMPreprocessor` and `CDSSMPadding`
- Add `attention_mask` and `token_type_ids` inputs for BERT
- Support specifying GPUs for DataParallel, and remove the `data_parallel` parameter from `trainer`
- Fix other bugs