This release features major improvements to the memory efficiency and speed of the neural network pipeline in stanfordnlp, along with various bugfixes. Changes include:
- The downloadable pretrained neural network models are now substantially smaller (due to the use of smaller pretrained vocabularies) with comparable performance. Notably, the default English model is now ~9x smaller, German ~11x, French ~6x and Chinese ~4x. As a result, the memory efficiency of the neural pipelines for most languages is substantially improved.
- Substantial speedup of the neural lemmatizer via reduced neural sequence-to-sequence operations.
- The neural network pipeline can now take in a Python list of strings representing pre-tokenized text. (https://github.com/stanfordnlp/stanfordnlp/issues/58)
- A requirements checking framework has been added to the neural pipeline, ensuring the proper processors are specified for a given pipeline configuration. The pipeline will now raise an exception when a requirement is not satisfied. (https://github.com/stanfordnlp/stanfordnlp/issues/42)
- Bugfix related to alignment between tokens and words after the multi-word expansion processor. (https://github.com/stanfordnlp/stanfordnlp/issues/71)
- More options are added for customizing the Stanford CoreNLP server at start time, including specifying properties for the default pipeline, and setting all server options such as username/password. For more details on the different options, please check out the [client documentation page](https://stanfordnlp.github.io/stanfordnlp/corenlp_client.html#customizing-properties-for-server-start-and-requests).
- A `CoreNLPClient` instance can now be created with the CoreNLP default properties for a language:
  ```python
  client = CoreNLPClient(properties='chinese')
  ```
- Alternatively, a properties file can now be used during the creation of a `CoreNLPClient`:
  ```python
  client = CoreNLPClient(properties='/path/to/corenlp.props')
  ```
- All specified CoreNLP annotators are now preloaded by default when a `CoreNLPClient` instance is created. (https://github.com/stanfordnlp/stanfordnlp/issues/56)
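
As an illustration of the pre-tokenized input mentioned above, here is a minimal sketch. It assumes the pipeline accepts one list of token strings per sentence and that tokenization is bypassed with a `tokenize_pretokenized=True` option; both are assumptions that should be checked against the pipeline documentation. The helper names `to_pretokenized` and `annotate_pretokenized` are hypothetical and not part of stanfordnlp.

```python
from typing import List


def to_pretokenized(text: str) -> List[List[str]]:
    """Turn newline-separated sentences of whitespace-separated tokens
    into one list of token strings per sentence (hypothetical helper)."""
    sentences = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens:  # skip blank lines
            sentences.append(tokens)
    return sentences


def annotate_pretokenized(sentences: List[List[str]], lang: str = "en"):
    """Run the neural pipeline on already-tokenized sentences (hypothetical helper)."""
    # Imported lazily so the helper above works without stanfordnlp installed.
    import stanfordnlp

    # Assumption: this option makes the tokenizer keep the tokens as given.
    nlp = stanfordnlp.Pipeline(lang=lang, tokenize_pretokenized=True)
    return nlp(sentences)


sentences = to_pretokenized("This is token.ization done my way!\nSentence split , too !")
```

Calling `annotate_pretokenized(sentences)` requires the models for the chosen language to be downloaded first; the conversion helper itself has no such dependency.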