NLTK

Latest version: v3.9.1

0.7.4

* Code:
- Indian POS-tagged corpus reader: corpora.indian
- Sinica Treebank corpus reader: corpora.sinica_treebank
- new web corpus reader: corpora.web
- tag package now supports pickling
- added a function to utilities.py to guess character encodings
* Corpora:
- Rotokas texts from Stuart Robinson
- POS-tagged corpora for several Indian languages (Bangla, Hindi, Marathi, Telugu) from A Kumaran
* Tutorials:
- substantial work on Part II of the book (structured programming, parsing and grammar)
- more bibliographic citations
- improvements to typesetting and cross-references
- resized images and tables to make better use of page space
- moved the project list to the wiki
* Contrib:
- validation of Toolbox entries using chunking
- improved classifiers
* Distribution:
- updated for Python 2.5.1 and NumPy 1.0.2

0.7.3

* Code:
- made chunk.Regexp.parse() more flexible about its input
- developed new syntax for PCFG grammars, e.g. A -> B C [0.3] | D [0.7]
- fixed CFG parser to support grammars with slash categories
- moved beta classify package from main NLTK to contrib
- Brill taggers now load correctly
- miscellaneous bugfixes
* Corpora:
- Shakespeare XML corpus sample and corpus reader
* Tutorials:
- improvements to prose, exercises, plots, images
- expanded and reorganized tutorial on structured programming
- formatting improvements for Python listings
- improved plots (using pylab)
- categorization of problems by difficulty
* Contrib:
- more work on Kimmo lexicon and grammar
- more work on classifiers
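The PCFG syntax introduced in this release attaches a probability in square brackets to each alternative right-hand side. As a rough illustration of what that notation encodes, here is a standalone sketch that splits one such rule into weighted productions and checks that the alternatives form a probability distribution. `parse_pcfg_rule` and `probabilities_sum_to_one` are hypothetical helpers, not NLTK's grammar reader, and the sketch assumes one rule per line with every alternative ending in a bracketed probability.

```python
from __future__ import annotations

import math


def parse_pcfg_rule(line: str) -> list[tuple[str, tuple[str, ...], float]]:
    """Split 'A -> B C [0.3] | D [0.7]' into (lhs, rhs, probability) triples.

    Hypothetical helper for illustration only; assumes each alternative
    ends with a bracketed probability.
    """
    lhs, rhs_text = (part.strip() for part in line.split("->", 1))
    productions = []
    for alternative in rhs_text.split("|"):
        # rpartition keeps everything before the last '[' as the RHS symbols
        symbols, _, prob_text = alternative.strip().rpartition("[")
        prob = float(prob_text.rstrip("]"))
        productions.append((lhs, tuple(symbols.split()), prob))
    return productions


def probabilities_sum_to_one(productions) -> bool:
    """Check that the alternatives for one left-hand side form a distribution."""
    return math.isclose(sum(p for _, _, p in productions), 1.0)


if __name__ == "__main__":
    rules = parse_pcfg_rule("A -> B C [0.3] | D [0.7]")
    for lhs, rhs, prob in rules:
        print(lhs, "->", " ".join(rhs), prob)
    print(probabilities_sum_to_one(rules))
```

In a full grammar the sum-to-one constraint applies to all productions sharing a left-hand side, which may be spread over several lines; this sketch only checks a single line.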

0.7.2

* Code:
- simple feature detectors (detect module)
- fixed problem when token generators are passed to a parser (parse package)
- fixed bug in Grammar.productions() (identified by Lucas Champollion and Mitch Marcus)
- fixed import bug in category.GrammarFile.earley_parser
- added utilities.OrderedDict
- initial port of old NLTK classifier package (by Sam Huston)
- UDHR corpus reader
* Corpora:
- added UDHR corpus (Universal Declaration of Human Rights) with 10k text samples in 300+ languages
* Tutorials:
- improved images
- improved book formatting, including new support for:
  - JavaScript to copy program examples to the clipboard in the HTML version
  - bibliography, chapter cross-references, colorization, index, table of contents

* Contrib:
- new Kimmo system: contrib.mit.six863.kimmo (Rob Speer)
- fixes for: contrib.fsa (Rob Speer)
- demonstration of text classifiers trained on the UDHR corpus for language identification: contrib.langid (Sam Huston)
- new Lambek calculus system: contrib.lambek
- new tree implementation based on elementtree: contrib.tree

0.7.1

* Code:
- bugfixes (HMM, WordNet)

0.7

* Code:
- bugfixes, including a fix for the Brown corpus reader
- cleaned up WordNet 2.1 interface code and similarity measures
- support for full Penn treebank format contributed by Yoav Goldberg
* Tutorials:
- expanded tutorials on advanced parsing and structured programming
- checked all doctest code
- improved images for chart parsing

0.7b1

* Code:
- expanded semantic interpretation package
- new high-level chunking interface, with cascaded chunking
- split chunking code into new chunk package
- updated wordnet package to support WordNet 2.1
- prototyped basic WordNet similarity measures (path distance, Wu-Palmer, Leacock-Chodorow and Resnik)
- bugfixes (tag.Window, tag.ngram)
- more doctests
* Contrib:
- Toolbox language settings module
* Tutorials:
- rewrite of the chunking chapter: switched the main focus from Treebank to CoNLL format, simplified the evaluation framework, and added an n-gram chunking section
- substantial updates throughout (especially the programming and semantics chapters)
* Corpora:
- Chat-80 Prolog data files provided as corpora, plus corpus reader
