Cc2dataset

Latest version: v1.5.0

1.5.0

* improve document extension list
* add a few more video extensions
* Implement relative links (thanks Sebastian Nagel)
* add filename and url metadata (thanks marianna13)

1.4.0

* Add text and video document types

1.3.1

* Rename to cc2dataset

1.3.0

* Support audio document type
* Restart spark session for each part.
* Improve error handling and logging.
* Implement resume + speed up by reading file from s3 all at once.

1.2.0

* Add try/catch around archive handling for broken WAT files.
* Implement multipart.
* Shuffle + use date as output path + write wat index files + shuffle input wat

1.1.0

* deduplication
