Nltk-ma

Latest version: v0.0.6

Page 10 of 12

0.8b1

Code (major):
- changed package name to nltk
- import all top-level modules into nltk, reducing need for import statements
- reorganization of sub-package structures to simplify imports
- new featstruct module, unifying old featurelite and featurestructure modules
- FreqDist now inherits from dict, so fd.count(sample) becomes fd[sample]
- FreqDist initializer accepts a generator, e.g. fd = FreqDist(len(token) for token in text)
- made numpy optional
Code (minor):
- changed GrammarFile initializer to accept filename
- consistent tree display format
- fixed the WordNet and TIMIT loading process, which prevented code installation when the data was not installed
- took more care with unicode types
- incorporated pcfg code into cfg module
- moved cfg, tree, featstruct to top level
- new filebroker module to make handling of example grammar files more transparent
- more corpus readers (webtext, abc)
- added cfg.covers() to check that a grammar covers a sentence
- simple text-based wordnet browser
- known bug: parse/featurechart.py uses incorrect apply() function
Corpora:
- csv data file to document NLTK corpora
Contrib:
- added Glue semantics code (contrib.glue, by Dan Garrette)
- Punkt sentence segmenter port (contrib.punkt, by Willy)
- added LPath interpreter (contrib.lpath, by Haejoong Lee)
- extensive work on classifiers (contrib.classifier*, by Sumukh Ghodke)
Tutorials:
- polishing on parts I, II
- more illustrations, data plots, summaries, exercises
- continuing to make prose more accessible to non-linguistic audience
- new default import that all chapters presume: from nltk.book import *
Distributions:
- updated to latest version of numpy
- removed WordNet installation instructions as WordNet is now included in corpus distribution
- added pylab (matplotlib)
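
For code written against releases before 0.8b1: FreqDist is now a dict, so counts are read with fd[sample] rather than fd.count(sample), and the initializer can consume a generator directly. The class below is a minimal, hypothetical stand-in (ToyFreqDist is an illustrative name, not NLTK's code) that mimics only those two behaviours, assuming that an unseen sample should read as a count of 0.

```python
from collections.abc import Hashable, Iterable


class ToyFreqDist(dict):
    """Hypothetical stand-in, not NLTK's FreqDist: a dict of sample -> count."""

    def __init__(self, samples: Iterable[Hashable] = ()) -> None:
        super().__init__()
        # Any iterable works here, including a generator expression,
        # because the samples are consumed in a single pass.
        for sample in samples:
            self[sample] = self.get(sample, 0) + 1

    def __missing__(self, sample: Hashable) -> int:
        # dict calls this for fd[sample] when the key is absent;
        # report a count of 0 without storing the key.
        return 0


text = "the cat sat on the mat".split()
fd = ToyFreqDist(len(token) for token in text)
print(fd[3], fd[2], fd[7])  # → 5 1 0
```

Returning 0 from __missing__ instead of raising KeyError is a choice made for this sketch; note that self.get() bypasses __missing__, which is why the counting loop uses it.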

0.7.5

Code:
- improved WordNet and WordNet-Similarity interface
- the Lancaster Stemmer (contributed by Steven Tomcavage)
Corpora:
- Web text samples
- BioCreAtIvE-PPI - a corpus for protein-protein interactions
- Switchboard Telephone Speech Corpus Sample (via Talkbank)
- CMU Problem Reports Corpus sample
- CONLL2002 POS+NER data
- Patient Information Leaflet corpus
- WordNet 3.0 data files
- English wordlists: basic English, frequent words
Tutorials:
- more improvements to text and images

0.7.4

Code:
- Indian POS tagged corpus reader: corpora.indian
- Sinica Treebank corpus reader: corpora.sinica_treebank
- new web corpus reader corpora.web
- tag package now supports pickling
- added function to utilities.py to guess character encoding
Corpora:
- Rotokas texts from Stuart Robinson
- POS-tagged corpora for several Indian languages (Bangla, Hindi, Marathi, Telugu) from A Kumaran
Tutorials:
- Substantial work on Part II of the book (structured programming, parsing and grammar)
- More bibliographic citations
- Improvements in typesetting, cross references
- Resized images and tables to make better use of page space
- Moved project list to wiki
Contrib:
- validation of toolbox entries using chunking
- improved classifiers
Distribution:
- updated for Python 2.5.1, Numpy 1.0.2
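
The 0.7.4 entry mentions a utilities.py function for guessing character encodings but does not say how it works. One common approach is trial decoding: try encodings from strictest to most permissive and keep the first that decodes without error. The sketch below assumes that approach; guess_encoding and its candidate list are hypothetical and not taken from NLTK.

```python
from typing import Sequence, Tuple

# Hypothetical candidate order: strict encodings first; latin-1 last because
# it maps every byte value and therefore never fails to decode.
DEFAULT_CANDIDATES = ("ascii", "utf-8", "cp1252", "latin-1")


def guess_encoding(
    data: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES
) -> Tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes data cleanly."""
    for encoding in candidates:
        try:
            return encoding, data.decode(encoding)
        except UnicodeDecodeError:
            continue  # ruled out; try the next, more permissive candidate
    raise ValueError("no candidate encoding could decode the input")


print(guess_encoding(b"plain text"))            # → ('ascii', 'plain text')
print(guess_encoding("café".encode("utf-8")))    # → ('utf-8', 'café')
print(guess_encoding("café".encode("latin-1")))  # → ('cp1252', 'café')
```

Trial decoding can only rule encodings out: bytes that are valid in several candidates are reported as the first match, so candidate order is the main design decision. With the default list the ValueError is unreachable, since latin-1 accepts any byte string.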

0.7.3

Code:
- made chunk.Regexp.parse() more flexible about its input
- developed new syntax for PCFG grammars, e.g. A -> B C [0.3] | D [0.7]
- fixed CFG parser to support grammars with slash categories
- moved beta classify package from main NLTK to contrib
- fixed loading of Brill taggers
- misc bugfixes
Corpora:
- Shakespeare XML corpus sample and corpus reader
Tutorials:
- improvements to prose, exercises, plots, images
- expanded and reorganized tutorial on structured programming
- formatting improvements for Python listings
- improved plots (using pylab)
- categorization of problems by difficulty
Contrib:
- more work on kimmo lexicon and grammar
- more work on classifiers
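
The 0.7.3 entry introduces a compact PCFG notation in which each alternative carries a bracketed probability, e.g. A -> B C [0.3] | D [0.7]. As a rough illustration of what that notation encodes, the sketch below parses one such line into (lhs, rhs, probability) triples and checks that the expansions of the left-hand side sum to 1. It is hypothetical code (parse_pcfg_line is an illustrative name), not NLTK's grammar reader, and it assumes all alternatives for a nonterminal sit on one line, symbols are whitespace-separated, and no symbol contains "|" or "[".

```python
import math
import re
from typing import List, Tuple

Production = Tuple[str, Tuple[str, ...], float]

# One alternative: zero or more symbols followed by a bracketed probability.
_ALTERNATIVE = re.compile(r"^\s*(.*?)\s*\[\s*([0-9]*\.?[0-9]+)\s*\]\s*$")


def parse_pcfg_line(line: str) -> List[Production]:
    """Parse 'A -> B C [0.3] | D [0.7]' into (lhs, rhs, probability) triples."""
    lhs, arrow, rhs_text = line.partition("->")
    lhs = lhs.strip()
    if not arrow or not lhs:
        raise ValueError(f"missing left-hand side or '->': {line!r}")
    productions: List[Production] = []
    for alternative in rhs_text.split("|"):
        match = _ALTERNATIVE.match(alternative)
        if match is None:
            raise ValueError(f"alternative lacks a [probability]: {alternative!r}")
        symbols, prob = match.groups()
        productions.append((lhs, tuple(symbols.split()), float(prob)))
    # In a PCFG the probabilities of all expansions of one nonterminal sum to 1.
    total = sum(p for _, _, p in productions)
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities for {lhs} sum to {total}, not 1")
    return productions


for lhs, rhs, prob in parse_pcfg_line("A -> B C [0.3] | D [0.7]"):
    print(lhs, rhs, prob)
# A ('B', 'C') 0.3
# A ('D',) 0.7
```

math.isclose is used for the sum check because decimal probabilities such as 0.1 are not exact in binary floating point.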

0.7.2

Code:
- simple feature detectors (detect module)
- fixed problem when token generators are passed to a parser (parse package)
- fixed bug in Grammar.productions() (identified by Lucas Champollion and Mitch Marcus)
- fixed import bug in category.GrammarFile.earley_parser
- added utilities.OrderedDict
- initial port of old NLTK classifier package (by Sam Huston)
- UDHR corpus reader
Corpora:
- added UDHR corpus (Universal Declaration of Human Rights) with 10k text samples in 300+ languages
Tutorials:
- improved images
- improved book formatting, including new support for:
  - JavaScript to copy program examples to the clipboard in the HTML version
  - bibliography, chapter cross-references, colorization, index, table of contents
Contrib:
- new Kimmo system: contrib.mit.six863.kimmo (Rob Speer)
- fixes for: contrib.fsa (Rob Speer)
- demonstration of text classifiers trained on UDHR corpus for language identification: contrib.langid (Sam Huston)
- new Lambek calculus system: contrib.lambek
- new tree implementation based on elementtree: contrib.tree

0.7.1

Code:
- bugfixes (HMM, WordNet)

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.