ConvoKit

Latest version: v3.0.0


2.3.2

This section describes changes made since the v2.3 release, including the changes from both v2.3.1 and v2.3.2.

Functionality

Naming changes

- `Utterance.root` has been renamed to `Utterance.conversation_id`
- `User` has been renamed to `Speaker`. Functions with 'user' in the name have been renamed accordingly
- `User.name` has been renamed to `Speaker.id`

(Backwards compatibility will be maintained for all the deprecated attributes and functions.)
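The backwards-compatibility note above can be illustrated with a minimal sketch: the old attribute name is kept as a deprecated property that forwards to the new one. This is an illustrative pattern, not ConvoKit's actual implementation.

```python
import warnings

class Utterance:
    """Toy stand-in for ConvoKit's Utterance, sketching how a renamed
    attribute can remain available under its old, deprecated name."""

    def __init__(self, conversation_id):
        self.conversation_id = conversation_id

    @property
    def root(self):
        # Old name kept as a deprecated alias for conversation_id.
        warnings.warn("root is deprecated; use conversation_id instead",
                      DeprecationWarning)
        return self.conversation_id

utt = Utterance(conversation_id="c1")
print(utt.conversation_id)  # "c1"
```

Accessing `utt.root` still works but emits a `DeprecationWarning`, which is the behaviour the release notes describe for the renamed attributes and functions.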

Corpus

- Corpus now allows users to generate `pandas` DataFrames for its internal components using `get_conversations_dataframe()`, `get_utterances_dataframe()`, and `get_speakers_dataframe()`.
- `Conversation` objects have a `get_chronological_speaker_list()` method for getting a chronological list of conversation participants
- `Conversation`'s `print_conversation_structure()` method has a new `limit` argument that caps the number of utterances displayed.
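The DataFrame generators above return one row per corpus component. A rough sketch of the kind of table `get_utterances_dataframe()` produces, built here from plain dicts (the exact column layout is an assumption, not ConvoKit's actual schema):

```python
import pandas as pd

# One row per utterance, indexed by utterance id (columns are illustrative).
utterances = [
    {"id": "u0", "speaker": "alice", "conversation_id": "c0", "text": "hi"},
    {"id": "u1", "speaker": "bob", "conversation_id": "c0", "text": "hello"},
]
df = pd.DataFrame(utterances).set_index("id")
print(df.loc["u1", "text"])  # "hello"
```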

Transformers

- New `invalid_val` argument for `HyperConvo` that automatically replaces NaN feature values with the specified default.
- `FightingWords.summarize()` now provides labelled plots
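The `invalid_val` behaviour amounts to substituting a caller-specified default wherever a computed feature is NaN. A small sketch of that substitution (the feature names here are made up for illustration):

```python
import math

def fill_invalid(features, invalid_val=0.0):
    """Replace NaN feature values with a caller-specified default,
    sketching what HyperConvo's invalid_val argument does."""
    return {name: (invalid_val if isinstance(v, float) and math.isnan(v) else v)
            for name, v in features.items()}

feats = {"reciprocity": 0.5, "indegree": float("nan")}
print(fill_invalid(feats, invalid_val=-1.0))  # {'reciprocity': 0.5, 'indegree': -1.0}
```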

Bug fixes

- Fixed a minor bug in `download()` when downloading Reddit corpora.
- Fixed bugs in `HyperConvo` that were causing NaN warnings and incorrect calculations, and a minor bug that was preventing `HyperConvo` annotations from being JSON-serializable.
- Fixed bug in `Classifier` and `BoWClassifier` that was causing inconsistent behaviour for compressed vs. uncompressed vector metadata


Other changes

- Deprecation warnings in ConvoKit have been made more consistent.
- We now have continuous integration for pushes and pull requests! Thanks to mwilbz for helping set this up.

2.3

Functionality

New summarize() functionality for Transformers

Some Transformers now have a summarize() function that summarizes the annotated corpus (i.e. a corpus annotated by a transform() call), giving the user a high-level view and interpretation of the annotated metadata.
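The transform()/summarize() split can be sketched with a toy transformer (not one of ConvoKit's actual classes): transform() writes per-utterance annotations, and summarize() aggregates them into a high-level view.

```python
class WordCounter:
    """Toy transformer illustrating the transform()/summarize() pattern;
    plain dicts stand in for ConvoKit utterances."""

    def transform(self, utterances):
        # Annotate each utterance with a word count.
        for utt in utterances:
            utt["meta"] = {"n_words": len(utt["text"].split())}
        return utterances

    def summarize(self, utterances):
        # High-level view over the annotations written by transform().
        counts = [utt["meta"]["n_words"] for utt in utterances]
        return sum(counts) / len(counts)

corpus = [{"text": "hello there"}, {"text": "hi"}]
wc = WordCounter()
wc.transform(corpus)
print(wc.summarize(corpus))  # 1.5
```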

New Transformers

We introduce several new Transformers: Classifier, Bag-of-Words Classifier, Ranker, Pairer, Paired Prediction, Paired Bag-of-Words Prediction, Fighting Words, and (Conversational) Forecaster (with variants: Bag-of-Words and CRAFT).

New TextProcessor

We introduce TextCleaner, which performs text cleaning on online text data. This cleaner depends on the *clean-text* package.

Enhanced Conversation functionality
- Conversation.check_integrity() can be used to check if a conversation has a valid and intact reply-to chain (i.e. only one root utterance, every utterance specified by reply-to exists, etc)
- Conversation.print_conversation_structure() is a way of pretty-printing a Conversation's thread structure (whether displaying just its utterances' ids, texts, or other details is customizable)
- Conversation.get_chronological_utterance_list() provides a list of the Conversation's utterances sorted from earliest to latest timestamp
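The chronological listing above is, at its core, a sort by timestamp. A plain-Python sketch of what get_chronological_utterance_list() returns, with dicts standing in for ConvoKit Utterance objects:

```python
# Sort a conversation's utterances by timestamp, earliest first.
utts = [
    {"id": "u2", "timestamp": 30},
    {"id": "u0", "timestamp": 10},
    {"id": "u1", "timestamp": 20},
]
chronological = sorted(utts, key=lambda u: u["timestamp"])
print([u["id"] for u in chronological])  # ['u0', 'u1', 'u2']
```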

**Tree operations**
- Conversation.traverse() allows for Conversations to be traversed as a tree structure, e.g. breadth-first, depth-first, pre-order, post-order. Specifically, traverse() returns an iterator of Utterances or UtteranceNodes (a wrapper class for working with Utterances in a conversational tree setting)
- Conversation allows for subtree extraction using any arbitrary utterance in the Conversation as the new root
- Conversation.get_root_to_leaf_paths() returns all the root to leaf paths in the conversation tree
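The tree operations above can be sketched in plain Python, with a reply-to mapping standing in for a Conversation: breadth-first traversal mirrors what traverse() exposes, and a recursive walk collects every root-to-leaf path as in get_root_to_leaf_paths(). This is an illustrative sketch, not ConvoKit's implementation.

```python
from collections import deque

# A conversation tree as a reply_to mapping: utterance id -> parent id.
reply_to = {"u0": None, "u1": "u0", "u2": "u0", "u3": "u1"}

children = {}
for utt, parent in reply_to.items():
    children.setdefault(parent, []).append(utt)

def bfs(root):
    """Breadth-first traversal over the conversation tree."""
    queue, order = deque([root]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(sorted(children.get(node, [])))
    return order

def root_to_leaf_paths(root, path=()):
    """All root-to-leaf paths in the conversation tree."""
    path = path + (root,)
    kids = children.get(root, [])
    if not kids:
        return [path]
    return [p for kid in sorted(kids) for p in root_to_leaf_paths(kid, path)]

print(bfs("u0"))                # ['u0', 'u1', 'u2', 'u3']
print(root_to_leaf_paths("u0")) # [('u0', 'u1', 'u3'), ('u0', 'u2')]
```

Subtree extraction with an arbitrary new root falls out of the same structure: run either function starting from any utterance id instead of the original root.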

Other changes

Public-facing interface changes
- All Corpus objects now support a full set of all possible object iterators (e.g. User.iter_utterances() or Corpus.iter_users()) with selector functions (i.e. filters that select which corpus objects are yielded)
- Corpus has new methods for checking for the presence of corpus objects, e.g. corpus.has_utterance(), corpus.has_conversation(), corpus.has_user()
- A random User / Utterance / Conversation can be obtained from a Corpus with corpus.random_user() / corpus.random_utterance() / corpus.random_conversation()
- User objects now have ids, not names. Corpus.get_usernames() and User.name are deprecated (in favor of Corpus.get_user_ids() and User.id respectively) and print a warning when used.
- Corpora can be mutated to only include specific Conversations by using Corpus.filter_conversations_by()
- Corpus filtering by utterance is no longer supported to avoid encouraging Corpus mutations that break Conversation reply-to chains. Corpus.filter_utterances_by() is now deprecated and no longer usable.
- Corpus object (i.e. User, Utterance, Conversation) ids and metadata keys must now be strings or None. It used to be that any Hashable object could be used, but this posed problems for corpus dumping to and loading from jsons.
- Deletion of a metadata key for one object results in deletion of that metadata key for all objects of that object type
- Corpus.dump() automatically increments the version number of the Corpus by 1.
- Corpus.download() now has a *use_local* boolean parameter that allows offline users to skip the online check for a new dataset version and uses the local version by default.
- Fixed a bug where specified conversation and user metadata were not excluded correctly during the Corpus initialisation step
- `__str__` is now implemented to provide a concise, human-readable string display of the Corpus object (one that hides private variables)
- Fixed some bugs with Hypergraph motif counting
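A selector function, as described in the first bullet above, is just a predicate applied while iterating. A plain-Python sketch of the idea, with dicts standing in for ConvoKit objects:

```python
import random

utterances = [
    {"id": "u0", "speaker": "alice"},
    {"id": "u1", "speaker": "bob"},
    {"id": "u2", "speaker": "alice"},
]

def iter_utterances(selector=lambda utt: True):
    """Yield only the utterances the selector predicate accepts,
    sketching ConvoKit's selector-based iteration."""
    return (utt for utt in utterances if selector(utt))

by_alice = [u["id"] for u in iter_utterances(lambda u: u["speaker"] == "alice")]
print(by_alice)  # ['u0', 'u2']

# random_utterance() amounts to a random choice over the same collection.
print(random.choice(utterances)["id"] in {"u0", "u1", "u2"})  # True
```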

Internal changes
- Corpus initialisation and dumping have been heavily refactored to improve future maintainability.
- There is a new CorpusObject parent class that User, Utterance, and Conversation inherit from. This parent class implements some shared functionality for all Corpus objects.
- Corpus now uses a ConvokitIndex object to correctly track the metadata state of itself and its Corpus objects. Previously, this index was computed on the spot when Corpus.dump() was called, and referred to when loading a Corpus. However, any changes to a loaded Corpus object would not update the internal index of the Corpus, meaning the index could be inconsistent with the Corpus state.
- Corpus objects (Corpus, User, Utterance, Conversation) all use a ConvokitMeta object instead of a simple dict() for their metadata. This change is necessary to ensure that updates to the metadata (key additions / deletions) are reflected in ConvokitIndex. However, because ConvokitMeta inherits from the dict class, there is no change to how users should work with the .meta attribute.
- Users and Utterances now have 'owner' attributes to indicate the Corpus they belong to. This change is necessary for maintaining a consistent index. (Conversations have always had this attribute.)
- Introduces optional dependencies on the *clean-text* and *torch* packages for sanitizing text under the FightingWords Transformer and running a neural network as part of the Forecaster-CRAFT Transformer respectively.
- A single script for running all existing test suites has been created to speed up testing before deployment: *tests/run_all_tests.py*
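The ConvokitMeta change above relies on a standard trick: subclass `dict` and override key mutation so a shared index stays in sync, while callers keep using ordinary dict syntax. A sketch of that idea (the names are illustrative, not ConvoKit's internals):

```python
class TrackedMeta(dict):
    """Dict subclass that mirrors key additions and deletions into a
    shared index, sketching the ConvokitMeta/ConvokitIndex relationship."""

    def __init__(self, index, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.index = index
        self.index.update(self.keys())

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.index.add(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        self.index.discard(key)

index = set()
meta = TrackedMeta(index)
meta["score"] = 0.9   # the index now tracks the new key
del meta["score"]     # ...and reflects the deletion
print(index)          # set()
```

Because the class inherits from `dict`, code that reads and writes `.meta` is unaffected, which is exactly the compatibility property the notes describe.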

2.2

Updates to various parts of ConvoKit:

Text processing

Added support for creating Transformers that compute utterance attributes. Also updated support for dependency-parsing text. An example of how this new functionality can be used is found [here](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/text-processing/text_preprocessing_demo.ipynb).

Corpus

Added functionality to

* support loading and storage of auxiliary data
* handle vector representations
* organize users' activities within conversations
* build dataframes containing attributes of various objects

Prompt types

Updated the code used to compute prompt types and phrasing motifs, deprecating the old QuestionTypology module. An example of how the updated code is used can be found [here](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/prompt-types/prompt-type-demo.ipynb) and [here](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/prompt-types/prompt-type-wrapper-demo.ipynb).

User Conversation Diversity

Updated code used to compute linguistic divergence.

Other

Added support for pipelining, and some limited support for computing per-utterance attributes.

2.0

This is the public release of the brand-new, overhauled ConvoKit API, marking a major version number bump to 2.0.

Compared to previous releases, the newly refactored API has been heavily streamlined to unite all conversational analysis modules under a single consistent interface, which should hopefully decrease the learning curve for the toolkit. The new API is inspired by scikit-learn and should be familiar to those who have prior experience with that package. A high-level explanation of the API and object model can be found [here](https://zissou.infosci.cornell.edu/socialkit/documentation/architecture.html) along with a [step-by-step tutorial for getting started programming with ConvoKit](https://zissou.infosci.cornell.edu/socialkit/documentation/tutorial.html).
