
Latest version: v0.0.6


Page 6 of 12


Robin Cooper, Pablo Duboue, Christian Federmann, Dan Garrette, Ewan Klein,
Pierre-François Laquerre, Max Leonov, Peter Ljunglöf, Nitin Madnani, Ceri Stagg


Daniel Blanchard, Mikhail Korobov, Nitin Madnani, Duncan McGreggor,
Morten Neergaard, Nathan Schneider, Rico Sennrich.


* added interface to the Stanford POS Tagger
* updates to sem.Boxer, sem.drt.DRS
* allow unicode strings in grammars
* allow non-string features in classifiers
* modifications to HunposTagger
* issues with DRS printing
* fixed bigram collocation finder for window_size > 2
* doctest paths no longer presume unix-style pathname separators
* fixed issue with NLTK's tokenize module colliding with the Python tokenize module
* fixed issue with stemming Unicode strings
* changed ViterbiParser.nbest_parse to parse
* ChaSen and KNBC Japanese corpus readers
* preserve case in concordance display
* fixed bug in simplification of Brown tags
* a version of IBM Model 1 as described in Koehn 2010
* new class AlignedSent for aligned sentence data and evaluation metrics
* new nltk.util.set_proxy to allow easy configuration of HTTP proxy
* improvements to downloader user interface to catch URL and HTTP errors
* added CHILDES corpus reader
* created special exception hierarchy for Prover9 errors
* significant changes to the underlying code of the boxer interface
* path-based wordnet similarity metrics use a fake root node for verbs, following the Perl version
* added ability to handle multi-sentence discourses in Boxer
* added the 'english' Snowball stemmer
* simplifications and corrections of Earley Chart Parser rules
* several changes to the feature chart parsers for correct unification
* bugfixes: FreqDist.plot, FreqDist.max, NgramModel.entropy, CategorizedCorpusReader, DecisionTreeClassifier
* removal of Python >2.4 language features for 2.4 compatibility
* removal of deprecated functions and associated warnings
* added semantic domains to wordnet corpus reader
* changed wordnet similarity functions to include instance hyponyms
* updated to use latest version of Boxer
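The fake-root entry above can be illustrated with a small sketch. This is not NLTK's implementation: the toy verb taxonomy, the `FAKE_ROOT` marker, and the function names are all hypothetical, and the score is assumed to be 1 / (path length + 1). The point is only that two verbs in separate hierarchies share no ancestor unless an artificial root joins the trees.

```python
from typing import Dict, List, Optional

FAKE_ROOT = "*ROOT*"  # hypothetical marker, not an NLTK identifier

# Toy single-parent taxonomy (child -> parent); two unconnected verb trees.
PARENTS: Dict[str, str] = {
    "sprint": "run",
    "run": "move",
    "whisper": "talk",
    "talk": "communicate",
}


def ancestor_chain(node: str, parents: Dict[str, str], use_fake_root: bool) -> List[str]:
    """Return the node followed by its ancestors, optionally ending at the fake root."""
    chain = [node]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    if use_fake_root:
        chain.append(FAKE_ROOT)
    return chain


def path_similarity(a: str, b: str, parents: Dict[str, str],
                    use_fake_root: bool = True) -> Optional[float]:
    """Score 1 / (shortest path length + 1), or None when no path exists."""
    chain_a = ancestor_chain(a, parents, use_fake_root)
    steps_from_b = {n: i for i, n in enumerate(ancestor_chain(b, parents, use_fake_root))}
    for steps_from_a, node in enumerate(chain_a):
        if node in steps_from_b:
            # First shared ancestor on a's chain gives the shortest path in a tree.
            return 1.0 / (steps_from_a + steps_from_b[node] + 1)
    return None


same_tree = path_similarity("sprint", "run", PARENTS)
across_trees = path_similarity("sprint", "whisper", PARENTS)
no_fake_root = path_similarity("sprint", "whisper", PARENTS, use_fake_root=False)
```

With the fake root enabled, verbs from different trees get a small but defined score; without it, the lookup finds no common ancestor and returns `None`.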

* JEITA Public Morphologically Tagged Corpus (in ChaSen format)
* KNB Annotated corpus of Japanese blog posts
* Fixed some minor bugs in alvey.fcfg, and added number of parse trees in alvey_sentences.txt
* added more comtrans data

* minor fixes to documentation
* NLTK Japanese book (chapter 12) by Masato Hagiwara

* Viethen and Dale referring expression algorithms


* many code and documentation cleanups
* Added port of Snowball stemmers
* Fixed loading of pickled tokenizers (issue 556)
* DecisionTreeClassifier now handles unknown features (issue 570)
* Added error messages to LogicParser
* Replaced max_models with end_size to prevent Mace from hanging
* Added interface to Boxer
* Added nltk.corpus.semcor to give access to SemCor 3.0 corpus (issue 530)
* Added support for integer- and float-valued features in maxent classifiers
* Permit NgramModels to be pickled
* Added Sourced Strings (see test/sourcedstring.doctest for details)
* Fixed bugs in Good-Turing and Simple Good-Turing Estimation (issue 26)
* Add support for span tokenization, aka standoff annotation of segmentation (incl Punkt)
* allow unicode nodes in
* Fixed WordNet's morphy to be consistent with the original implementation,
taking the shortest returned form instead of an arbitrary one (issues 427, 487)
* Fixed bug in MaxentClassifier
* Accepted bugfixes for YCOE corpus reader (issue 435)
* Added test to _cumulative_frequencies() to correctly handle the case when no arguments are supplied
* Added a TaggerI interface to the HunPos open-source tagger
* Return 0, not None, when no count is present for a lemma in WordNet
* fixed pretty-printing of unicode leaves
* More efficient calculation of the leftcorner relation for left corner parsers
* Added two functions for graph calculations: transitive closure and inversion.
* FreqDist.pop() and FreqDist.popitems() now invalidate the caches (issue 511)
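As a rough illustration of the span tokenization entry above: a span tokenizer reports each token as a pair of character offsets into the original string, so the annotation sits alongside the text instead of replacing it. The sketch below is a minimal whitespace-based version written with the standard library only; the function name `whitespace_spans` is hypothetical and this is not the code added in the release.

```python
import re
from typing import Iterator, Tuple


def whitespace_spans(text: str) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) offsets for each run of non-whitespace characters."""
    for match in re.finditer(r"\S+", text):
        yield match.start(), match.end()


text = "Good muffins cost $3.88"
spans = list(whitespace_spans(text))

# The tokens can always be recovered by slicing the untouched source text.
tokens = [text[start:end] for start, end in spans]
```

Because only offsets are stored, several layers of annotation can refer to the same text without any of them modifying it.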

* Added SemCor 3.0 corpus (Brown Corpus tagged with WordNet synsets)
* Added LanguageID corpus (trigram counts for 451 languages)
* Added grammar for a^n b^n c^n

* minor updates

Thanks to the following contributors to 2.0b9:

Steven Bethard, Francis Bond, Dmitry Chichkov, Liang Dong, Dan Garrette,
Simon Greenhill, Bjorn Maeland, Rob Malouf, Joel Nothman, Jacob Perkins,
Alberto Planas, Alex Rudnick, Geoffrey Sampson, Kevin Scannell, Richard Sproat


* fixed copyright and license statements
* removed PyYAML, and added dependency to installers and download instructions
* updates to LogicParser, DRT (Dan Garrette)
* WordNet similarity metrics return None instead of -1 when
they fail to find a path (Steve Bethard)
* shortest_path_distance uses instance hypernyms (Jordan Boyd-Graber)
* clean_html improved (Bjorn Maeland)
* batch_parse, batch_interpret and batch_evaluate functions allow
grammar or grammar filename as argument
* more Portuguese examples (portuguese_en.doctest, examples/

* Aligner implementations (Christopher Crowner, Torsten Marek)
* ScriptTranscriber package (Richard Sproat and Kristy Hollingshead)

* updates for second printing, correcting errata

* added Europarl sample, with 10 docs for each of 11 langs (Nitin Madnani)
* added SMULTRON sample corpus (Torsten Marek, Martin Volk)


* minor bugfixes and enhancements: data loader, inference package, FreqDist, Punkt
* added Portuguese example module, similar to for English (examples/
* added all_lemma_names() method to WordNet corpus reader
* added update() and __add__() extensions to FreqDist (enhances alignment with Python 3.0 counters)
* reimplemented clean_html
* added test-suite runner for automatic/manual regression testing
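For readers unfamiliar with the counter semantics mentioned in the FreqDist entry above, the sketch below shows how the standard library's `collections.Counter` treats `update()` and `+`. It uses `Counter` only, not `FreqDist`; that the `FreqDist` extensions in this release behave identically is an assumption.

```python
from collections import Counter

first = Counter(["the", "cat", "the"])
second = Counter(["the", "dog"])

# Addition builds a new counter and leaves both operands unchanged.
combined = first + second

# update() adds the new counts in place; it does not replace existing counts.
first.update(["cat", "bird"])
```

The distinction matters when merging frequency counts from several texts: `+` is convenient for one-off combination, while `update()` avoids creating a new object on every step.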

* updated Punkt models for sentence segmentation
* added corpus of the works of Machado de Assis (Brazilian Portuguese)

* Added translation of preface into Portuguese, contributed by Tiago Tresoldi.

