Lots of new features in this release:
- support for network export of cluster-cluster cooccurrence via networkx (see the new `export` subcommand on the CLI, or the `Index.create_cluster_cooccurrence_graph` method.
- support for structured sampling using the model as a guide for sampling documents
- interface for corpus implementations to support exporting tabular renderings of documents
- refactoring of the clustering method so more of the pieces can be used on their own - this will be used for further functionality in the future
- reworking of the web UI render pathway, so the top of the page is calculated first and streamed as soon as it's ready - this enables better on interactivity on very large datasets
- more sophisticated tokenisation of stackexchange data: codeblocks are now pulled out and indexed separately.
- initial examples for two large scale document collections: stackoverflow and Australian Federal Hansard
Experimental initial features:
- initial experimental indexing of word positions to enable phrase and proximity queries - this is likely to change a lot in future releases
- initial support for rendering of concordances (keywords in context)
- initial support for intersecting a set of queries with a given field - this will be used in the future to enable first class support for temporal queries