Convokit

Latest version: v3.0.1

3.0.1

We are excited to announce the release of ConvoKit 3.0.1, which focuses on bug fixes, new datasets, and dependency upgrades. Key updates include:

- Fixed an issue with ConvoKit's download method that prevented datasets from being downloaded to the configured directory.
- Fixed support for downloading non-corpus objects.
- Updated the conversational forecasting transformer to make it more flexible.
- Added five new datasets, with documentation available on our website and documentation site.
- Addressed compatibility issues related to NumPy by building against NumPy 2.0+ and upgrading dependency packages accordingly.

Our [Troubleshooting page](https://convokit.cornell.edu/documentation/troubleshooting.html) covers some potential issues, especially those involving NumPy. If you encounter any issues, feel free to join our [Discord community](https://discord.gg/WMFqMWgz6P) for more support, or submit an issue on GitHub. Thank you!

Note that we no longer support Python 3.8 (EOL) or 3.9 (not supported by NumPy 2.0.0+).
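A minimal sketch of a pre-install version check follows. It assumes the oldest supported interpreter is Python 3.10, which is only inferred from the note above; `MIN_SUPPORTED` and `is_supported` are hypothetical names for illustration and are not part of ConvoKit.

```python
import sys

# Assumption: 3.10 is the oldest supported version, inferred from the note
# that Python 3.8 and 3.9 are no longer supported.
MIN_SUPPORTED = (3, 10)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_SUPPORTED."""
    return tuple(version_info[:2]) >= MIN_SUPPORTED


if __name__ == "__main__":
    if not is_supported():
        print("This interpreter is older than Python 3.10; ConvoKit 3.0.1 may not install.")
```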

You can refer to the following pull requests for more details:
- Fixing bugs:
  - [1] Fixing ConvoKit download method (#225, #217)
  - [2] New Forecaster Framework (#217)
- New datasets:
  - [1] CANDOR corpus (#201)
  - [2] DeliData corpus (#238)
  - [3] FORA corpus (#238)
  - [4] NPR-2P corpus (#238)
  - [5] FOMC corpus (#238)
- Dependency packages:
  - [1] Building ConvoKit to work with NumPy 2.0.0+ (#229, #251, #247)

Contributors:
- Kaixiang Zhang (Sean)
- Ethan Xia
- Yash Chatha
- Laerdon Yah-Sung Kim
- Jonathan P. Chang

3.0.0

We're excited to announce the public release of **ConvoKit 3.0**!

The new version of ConvoKit now supports MongoDB as a backend choice for working with corpus data. This update provides several benefits, such as taking advantage of MongoDB's lazy loading to handle extremely large corpora, and ensuring resilience to unexpected crashes by continuously writing all changes to the database.

To learn more about using MongoDB as a backend choice, refer to our documentation at https://convokit.cornell.edu/documentation/storage_options.html.

Database Backend
Historically, ConvoKit has let you work with conversational data directly in program memory through the Corpus class, with long-term storage provided by dumping the contents of a Corpus to disk in JSON format. This paradigm works well for distributing and storing static datasets, and for workloads that compute over some or all of the data in a short time period and optionally store the results on disk. For example, ConvoKit distributes the datasets included with the library in JSON format, which you can load into program memory to explore and compute with.

In ConvoKit version 3.0.0, we introduce a new option for working with conversational data: the MongoDB backend. Consider a use case where you want to collect conversational data over a long time period and keep a persistent representation of the dataset even if your data collection program unexpectedly crashes. In the memory backend paradigm, this would mean regularly dumping your corpus to JSON files, which incurs repeated expensive write operations. With the new database backend, by contrast, all your data is automatically saved for long-term storage in the database as it is added to the corpus.
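The write-as-you-go idea can be sketched with a toy store. This is not ConvoKit's implementation or API: it swaps in Python's built-in `sqlite3` for MongoDB so the example runs without a database server, and `WriteThroughStore` and its methods are hypothetical names used only for illustration.

```python
import json
import sqlite3


class WriteThroughStore:
    """Toy stand-in for a database-backed corpus: every add is persisted at once."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive a process restart.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS utterances (id TEXT PRIMARY KEY, body TEXT)"
        )
        self.conn.commit()

    def add_utterance(self, utt_id, text, speaker):
        # Commit each record as it arrives, so no separate dump step is needed.
        body = json.dumps({"text": text, "speaker": speaker})
        self.conn.execute(
            "INSERT OR REPLACE INTO utterances VALUES (?, ?)", (utt_id, body)
        )
        self.conn.commit()

    def get_utterance(self, utt_id):
        # Records are fetched on demand rather than held in program memory.
        row = self.conn.execute(
            "SELECT body FROM utterances WHERE id = ?", (utt_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store = WriteThroughStore()
store.add_utterance("u1", "hello", "alice")
store.add_utterance("u2", "hi there", "bob")
```

Compared with periodically serializing the whole dataset to JSON, each write here touches only the new record, which is the trade-off the paragraph above describes.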

Documentation
Please refer to [this database setup document](https://convokit.cornell.edu/documentation/db_setup.html) to set up a MongoDB database and [this storage document](https://convokit.cornell.edu/documentation/storage_options.html) for a further explanation of how the database backend option works.

Tests
Updated tests to include db_mode testing.

Examples
Updated examples to include demonstration of db_mode usage.

Bug Fixes
- Fixed an issue where `politenessAPI` threw an error because it accessed `corpus.utterances` directly; it should call `corpus.iter_utterances()` instead. Code working with Corpus items should not access their private variables and should use the public "getters" for access.
- Fixed a bug in `coordination.py` related to its use of metadata mutability.
- Fixed an issue in Pairer where setting `pair_mode` to `maximize` caused the pairing function to return an integer, which led to an error when pairing objects.

Breaking Changes
Modified `ConvoKit.Metadata` to disallow in-place mutation of metadata fields. This is implemented by returning a deepcopy of the stored metadata field every time the field is accessed, and is intended to align behavior between memory and DB modes. (#197)
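The practical effect of returning a deepcopy on every access can be illustrated with a small stand-in class. This is a sketch of the technique described above, not ConvoKit's actual code; `ReadOnlyMeta` is a hypothetical name.

```python
from copy import deepcopy


class ReadOnlyMeta:
    """Toy metadata container that hands out copies instead of stored objects."""

    def __init__(self):
        self._fields = {}

    def __setitem__(self, key, value):
        self._fields[key] = value

    def __getitem__(self, key):
        # Every read returns an independent copy of the stored value.
        return deepcopy(self._fields[key])


meta = ReadOnlyMeta()
meta["tags"] = ["greeting"]

# In-place mutation changes only the temporary copy, not the stored field.
meta["tags"].append("question")
unchanged = meta["tags"]

# To change a field, build the new value and assign it back.
meta["tags"] = unchanged + ["question"]
```

Code that previously relied on patterns like `meta["tags"].append(...)` would need to switch to the read, modify, reassign pattern shown in the last line.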

Change Log
**Added:**
- Added DB backend mode to allow working with corpora using a database as a supporting backend. (#175, #184)
- Extended `__init__` in `model/corpus.py` with parameters for DB functionality. (#175)
- Updated `model/backendMapper` to separate memory and DB transactions. (#175)
- Introduced a new layer of abstraction between Corpus components (Utterance, Speaker, Conversation, ConvoKitMeta) and concrete data mapping. Data mapping is now handled by a BackendMapper instance variable in the Corpus. (#169)
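The abstraction layer in the last item can be pictured with a simplified sketch. The class and method names below (`ToyMapper`, `ToyMemoryMapper`, `ToyUtterance`, `set_field`, `get_field`) are hypothetical and chosen for illustration; they are not ConvoKit's actual BackendMapper interface.

```python
from abc import ABC, abstractmethod


class ToyMapper(ABC):
    """Interface that hides where component data actually lives."""

    @abstractmethod
    def set_field(self, obj_id, field, value):
        ...

    @abstractmethod
    def get_field(self, obj_id, field):
        ...


class ToyMemoryMapper(ToyMapper):
    """In-memory implementation; a DB-backed one would expose the same methods."""

    def __init__(self):
        self._data = {}

    def set_field(self, obj_id, field, value):
        self._data.setdefault(obj_id, {})[field] = value

    def get_field(self, obj_id, field):
        return self._data[obj_id][field]


class ToyUtterance:
    """Component that stores nothing itself and delegates to the mapper."""

    def __init__(self, utt_id, mapper, text):
        self.id = utt_id
        self._mapper = mapper
        self._mapper.set_field(self.id, "text", text)

    @property
    def text(self):
        return self._mapper.get_field(self.id, "text")


mapper = ToyMemoryMapper()
utt = ToyUtterance("u1", mapper, "hello")
```

Because the component only talks to the mapper interface, swapping the memory mapper for a database-backed one would not require changes to the component class.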

**Changed:**
- Modified files in the ConvoKit model to support both memory mode and DB mode backends. (#175)
- Removed deprecated arguments and functions from the ConvoKit model. (#176)
- Updated demo examples that used object references from older versions of ConvoKit. (#192)

**Fixed:**
- Fixed usage of the mutability of metadata within `coordination.py`. (#197)
- Fixed issue in the Pairer module when `pair_mode` was set to `maximize`, causing the pairing function to return an integer and subsequently leading to an error. (#197)
- Fixed issue that caused `corpus.utterances` to throw an error within `politenessAPI`. (#170)
- Fixed FightingWords to allow overlapping classes. (#189)

**Python Version Requirement Update:**
- With Python 3.7 having reached EOL (end of life) on June 27, 2023, ConvoKit now requires Python 3.8 or above.

2.5.3

2.5.2

This release adds support for Chinese politeness strategy extraction. Currently, ConvoKit's [politenessStrategies](https://convokit.cornell.edu/documentation/politenessStrategies.html) supports three politeness strategy collections covering two languages.

2.5.1

This release includes a new method `from_pandas` in the Corpus class that should simplify the Corpus creation process.

It generates a ConvoKit corpus from pandas dataframes of speakers, utterances, and conversations.

A notebook demonstrating the use of this method can be found [here](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/blob/master/examples/corpus_from_pandas.ipynb).
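As a rough sketch of how `from_pandas` might be called: the column names below are an assumption based on the ConvoKit documentation and should be checked against the installed version, and `UTTERANCE_COLUMNS`, `utterance_rows`, and `build_corpus` are hypothetical names for illustration. Only the utterances dataframe is passed here, on the assumption that the speakers and conversations dataframes are optional.

```python
# Assumed required columns for the utterances dataframe; verify against the
# Corpus.from_pandas documentation for your ConvoKit version.
UTTERANCE_COLUMNS = ["id", "timestamp", "text", "speaker", "reply_to", "conversation_id"]

# Two utterances forming one conversation, identified by its first utterance.
utterance_rows = [
    {"id": "u1", "timestamp": 0, "text": "Hello!", "speaker": "alice",
     "reply_to": None, "conversation_id": "u1"},
    {"id": "u2", "timestamp": 1, "text": "Hi, how are you?", "speaker": "bob",
     "reply_to": "u1", "conversation_id": "u1"},
]


def build_corpus(rows):
    """Build a Corpus from utterance rows; needs pandas and convokit installed."""
    import pandas as pd
    from convokit import Corpus

    utterances_df = pd.DataFrame(rows, columns=UTTERANCE_COLUMNS)
    # Speaker and conversation dataframes are left out of this sketch.
    return Corpus.from_pandas(utterances_df)
```

Calling `build_corpus(utterance_rows)` in an environment with both packages installed should then yield a corpus containing the two utterances; the linked notebook remains the authoritative walkthrough.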

2.5

This release contains an implementation of the [Expected Conversational Context Framework](https://convokit.cornell.edu/documentation/expected_context_model.html), and [associated demos](https://github.com/CornellNLP/Cornell-Conversational-Analysis-Toolkit/tree/master/convokit/expected_context_framework/demos).
