Nltk-ma

Latest version: v0.0.6


Page 8 of 12

0.9.9

NLTK:
* Finalized API for NLTK 2.0 and the book, incl dozens of small fixes
* Names of the form nltk.foo.Bar now available as nltk.Bar
for significant functionality; in some cases the name was modified
(using old names will produce a deprecation warning)
* Bugfixes in downloader, WordNet
* Expanded functionality in DecisionTree
* Bigram collocations extended for discontiguous bigrams
* Translation toy nltk.misc.babelfish
* New module nltk.help giving access to tagset documentation
* Fix imports so that NLTK builds without Tkinter (Bjorn Maeland)
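
The renaming note above (old names still work but produce a deprecation warning) can be illustrated with a small stand-alone sketch. This is not NLTK's own code: `Bar`, `OldBar`, and `deprecated_alias` are hypothetical names chosen for illustration, and the sketch only shows one common way to keep an old class name alive while warning its users.

```python
import warnings


class Bar:
    """Stand-in for a class that is now published under a new name."""

    def __init__(self, value):
        self.value = value


def deprecated_alias(new_cls, old_name):
    """Return a subclass of new_cls that warns whenever it is instantiated."""

    class _Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_cls.__name__} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Alias.__name__ = old_name
    return _Alias


# Hypothetical old name kept only for backward compatibility.
OldBar = deprecated_alias(Bar, "OldBar")

# Record warnings so the effect is visible regardless of the default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    obj = OldBar(3)

print(obj.value, caught[0].category.__name__)
```

Because the alias is a subclass, objects created through the old name still pass `isinstance` checks against the new class, so existing code keeps working while it is migrated.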

Data:
* new maxent NE chunker model
* updated grammar packages for the book
* data for new tagsets collection, documenting several tagsets
* added lolcat translation to the Genesis collection

Contrib (work in progress):
* Updates to coreference package (Joseph Frazee)
* New ISRI Arabic stemmer (Hosam Algasaier)
* Updates to Toolbox package (Greg Aumann)

Book:
* Substantial editorial corrections ahead of final submission

0.9.8

NLTK:
* New off-the-shelf tokenizer, POS tagger, and named-entity tagger
* New metrics package with inter-annotator agreement scores,
distance metrics, rank correlation
* New collocations package (Joel Nothman)
* Many clean-ups to WordNet package (Steven Bethard, Jordan Boyd-Graber)
* Moved old pywordnet-based WordNet package to nltk_contrib
* WordNet browser (Paul Bone)
* New interface to dependency treebank corpora
* Moved MinimalSet class into nltk.misc package
* Put NLTK applications in new nltk.app package
* Many other improvements incl semantics package, toolbox, MaltParser
* Misc changes to many API names in preparation for 1.0, old names deprecated
* Most classes now available in the top-level namespace
* Work on Python egg distribution (Brandon Rhodes)
* Removed deprecated code remaining from 0.8.* versions
* Fixes for Python 2.4 compatibility
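
As an illustration of the kind of inter-annotator agreement score mentioned above, here is a minimal pure-Python sketch of Cohen's kappa for two annotators. It is an assumption that this particular coefficient is among those in the metrics package; the function name `cohen_kappa` and the sample labels are made up for the example, and this is not the package's implementation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labelled at random
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "V", "V", "V"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

Here the annotators agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.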

Data:
* Corrected identifiers in Dependency Treebank corpus
* Basque and Catalan Dependency Treebanks (CoNLL 2007)
* PE08 Parser Evaluation data
* New models for POS tagger and named-entity tagger

Book:
* Substantial editorial corrections

0.9.7

NLTK:
* fixed problems with accessing zipped corpora
* improved design and efficiency of grammars and chart parsers
including new bottom-up combine strategy and a redesigned
Earley strategy (Peter Ljunglof)
* fixed bugs in smoothed probability distributions and added
regression tests (Peter Ljunglof)
* improvements to Punkt (Joel Nothman)
* improvements to text classifiers
* simple word-overlap RTE classifier

Data:
* A new package of large grammars (Peter Ljunglof)
* A small gazetteer corpus and corpus reader
* Organized example grammars into separate packages
* Children's stories added to gutenberg package

Contrib (work in progress):
* fixes and demonstration for named-entity feature extractors in nltk_contrib.coref

Book:
* extensive changes throughout, including new chapter 5 on classification
and substantially revised chapter 11 on managing linguistic data

0.9.6

NLTK:
* new WordNet corpus reader (contributed by Steven Bethard)
* incorporated dependency parsers into NLTK (was NLTK-Contrib) (contributed by Jason Narad)
* moved nltk/cfg.py to nltk/grammar.py and incorporated dependency grammars
* improved efficiency of unification algorithm
* various enhancements to the semantics package
* added plot() and tabulate() methods to FreqDist and ConditionalFreqDist
* FreqDist.keys() and list(FreqDist) provide keys reverse-sorted by value,
to avoid the confusion caused by FreqDist.sorted()
* new downloader module to support interactive data download: nltk.download(),
or run "python -m nltk.downloader all" from the command line
* fixed WordNet bug that caused min_depth() to sometimes give incorrect result
* added nltk.util.Index as a wrapper around defaultdict(list) plus
a functional-style initializer
* fixed bug in Earley chart parser that caused it to break
* added basic TnT tagger nltk.tag.tnt
* new corpus reader for CoNLL dependency format (contributed by Kepa Sarasola and Iker Manterola)
* misc other bugfixes
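
The description of nltk.util.Index above (a wrapper around defaultdict(list) plus a functional-style initializer) suggests a class along the following lines. This is a sketch written from that one-line description, assuming the initializer takes an iterable of (key, value) pairs; it should not be read as the exact code in the release.

```python
from collections import defaultdict


class Index(defaultdict):
    """A defaultdict(list) that can be filled from (key, value) pairs."""

    def __init__(self, pairs=()):
        super().__init__(list)
        # Group every value under its key, preserving input order.
        for key, value in pairs:
            self[key].append(value)


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Functional-style construction: the whole index is built from one expression.
by_letter = Index((word[0], word) for word in words)

print(by_letter["a"])
print(by_letter["b"])
```

The benefit over a bare defaultdict is that the grouping loop disappears from calling code, so an index can be built inline from a generator expression.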

Contrib (work in progress):
* TIGERSearch implementation by Torsten Marek
* extensions to hole and glue semantics modules by Dan Garrette
* new coreference package by Joseph Frazee
* MapReduce interface by Xinfan Meng

Data:
* Corpora are stored in compressed format if this will not compromise speed of access
* Swadesh Corpus of comparative wordlists in 23 languages
* Split grammar collection into separate packages
* New Basque and Spanish grammar samples (contributed by Kepa Sarasola and Iker Manterola)
* Brown Corpus sections now have meaningful names (e.g. 'a' is now 'news')
* Fixed bug that forced users to manually unzip the WordNet corpus
* New dependency-parsed version of Treebank corpus sample
* Added movie script "Monty Python and the Holy Grail" to webtext corpus
* Replaced words corpus data with a much larger list of English words
* New URL for list of available NLTK corpora
http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml

Book:
* complete rewrite of first three chapters to make the book accessible
to a wider audience
* new chapter on data-intensive language processing
* extensive reworking of most chapters
* Dropped subsection numbering; moved exercises to end of chapters

Distributions:
* created Portfile to support Mac installation

0.9.5

NLTK:
* text module with support for concordancing, text generation, plotting
* book module
* Major reworking of the automated theorem proving modules (Dan Garrette)
* draw.dispersion now uses pylab
* draw.concordance GUI tool
* nltk.data supports reading corpora and other data files from within zipfiles
* trees can be constructed from strings with Tree(s) (cf Tree.parse(s))
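
To show what concordancing means in practice, here is a minimal stand-alone sketch that lists each occurrence of a word with a few tokens of context on either side. It does not use the text module from this release; the `concordance` function, its `width` parameter, and the sample sentence are hypothetical.

```python
def concordance(tokens, target, width=3):
    """Return (left context, match, right context) for each hit of target."""
    target = target.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # Up to `width` tokens on each side, clipped at the text edges.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "the whale swam and the white whale dived".split()
hits = concordance(tokens, "whale", width=2)

for left, match, right in hits:
    # Right-align the left context so the matches line up in a column.
    print(f"{left:>12} [{match}] {right}")
```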

Contrib (work in progress):
* many updates to student projects
  - nltk_contrib.agreement (Thomas Lippincott)
  - nltk_contrib.coref (Joseph Frazee)
  - nltk_contrib.depparser (Jason Narad)
  - nltk_contrib.fuf (Petro Verkhogliad)
  - nltk_contrib.hadoop (Xinfan Meng)
* clean-ups: deleted stale files; moved some packages to misc

Data:
* Cleaned up Gutenberg text corpora
* added Moby Dick; removed redundant copy of Blake songs.
* more tagger models
* renamed to nltk_data to facilitate installation
* stored each corpus as a zip file for quicker installation
and access, and to solve a problem with the Propbank
corpus including a file with an illegal name for MSWindows
(con.xml).

Book:
* changed filenames to chNN format
* reworked opening chapters (work in progress)

Distributions:
* fixed problem with mac installer that arose when Python binary
couldn't be found
* removed dependency of NLTK on nltk_data so that NLTK code can be
installed before the data

0.9.4

NLTK:
- Expanded semantics package for first order logic, linear logic,
glue semantics, DRT, LFG (Dan Garrette)
- new WordSense class in wordnet.synset supporting access to synsets
from sense keys and accessing sense counts (Joel Nothman)
- interface to Mallet's linear chain CRF implementation (nltk.tag.crf)
- misc bugfixes incl Punkt, synsets, maxent
- improved support for chunkers incl flexible chunk corpus reader,
new rule type: ChunkRuleWithContext
- new GUI for pos-tagged concordancing nltk.draw.pos_concordance
- new GUI for developing regexp chunkers nltk.draw.rechunkparser
- added bio_sents() and bio_words() methods to ConllChunkCorpusReader in conll.py
to allow reading (word, tag, chunk_typ) tuples off of CoNLL-2000 corpus. Also
modified ConllChunkCorpusView to support these changes.
- feature structures support values with custom unification methods
- new flag on tagged corpus readers to use simplified tagsets
- new package for ngram language modeling with Katz backoff nltk.model
- added classes for single-parented and multi-parented trees that
automatically maintain parent pointers (nltk.tree.ParentedTree and
nltk.tree.MultiParentedTree)
- new WordNet browser GUI (Jussi Salmela, Paul Bone)
- improved support for lazy sequences
- added generate() method to probability distributions
- more flexible parser for converting bracketed strings to trees
- made fixes to docstrings to improve API documentation
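
The single-parented trees mentioned above automatically maintain parent pointers as the tree is edited. The following sketch shows the general idea with a hypothetical `ParentedNode` class; it is a simplified illustration under that assumption, not the nltk.tree.ParentedTree implementation, and it covers only appending and removing children.

```python
class ParentedNode:
    """Tree node that keeps each child's parent pointer up to date."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = []
        for child in children:
            self.append(child)

    def append(self, child):
        """Attach a child; subtree children get their parent pointer set."""
        if isinstance(child, ParentedNode):
            if child.parent is not None:
                # Single-parented: a node may sit in only one place.
                raise ValueError("node already has a parent")
            child.parent = self
        self.children.append(child)

    def remove(self, child):
        """Detach a child and clear its parent pointer."""
        self.children.remove(child)
        if isinstance(child, ParentedNode):
            child.parent = None


# Leaves are plain strings; only subtrees carry parent pointers.
np = ParentedNode("NP", ["the", "dog"])
vp = ParentedNode("VP", ["barked"])
s = ParentedNode("S", [np, vp])

print(np.parent.label)

s.remove(np)
print(np.parent)
```

Routing every structural change through `append` and `remove` is what keeps the pointers consistent; a multi-parented variant would store a list of parents instead of rejecting a second attachment.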

Contrib (work in progress):
- new NLG package, FUF/SURGE (Petro Verkhogliad)
- new dependency parser package (Jason Narad)
- new Coreference package, incl support for
ACE-2, MUC-6 and MUC-7 corpora (Joseph Frazee)
- CCG Parser (Graeme Gange)
- first order resolution theorem prover (Dan Garrette)

Data:
- New NPS Chat Corpus and corpus reader (nltk.corpus.nps_chat)
- ConllCorpusReader can now be used to read CoNLL 2004 and 2005 corpora.
- Implemented HMM-based Treebank POS tagger and phrase chunker for
nltk_contrib.coref in api.py. Pickled versions of these objects are checked
in under data/taggers and data/chunkers.

Book:
- misc corrections in response to feedback from readers
