Cmoncrawl

Latest version: v1.1.8

1.1.2

What's Changed
* Ruff Linting and easier development cycle with Makefile by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/92
* Docs update by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/93
* Update README.md by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/94
* 🔥 Removal of cc indexes arg by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/87


**Full Changelog**: https://github.com/hynky1999/CmonCrawl/compare/1.1.0...1.1.2

1.1.0

Code
- Default throttling for downloaders set to max 300 requests per second.
- `Downloader` now takes a client for downloading. There are currently two clients:
* s3 -> Queries the Common Crawl buckets directly
* api -> Queries the Common Crawl API Gateway

- The retry system now uses tenacity. Additionally, it uses random exponential backoff instead of linear random backoff.
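
To illustrate what random exponential backoff means compared with a linear scheme, here is a minimal standard-library sketch. It does not use tenacity and is not CmonCrawl's actual retry code; the function name and the `multiplier` and `max_delay` parameters are hypothetical and chosen only for illustration.

```python
import random


def random_exponential_delays(attempts, multiplier=1.0, max_delay=60.0, rng=None):
    """Yield one sleep duration (in seconds) per retry attempt.

    Hypothetical helper: the upper bound doubles on every attempt and is
    capped at max_delay; the actual delay is drawn uniformly below it.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        # Exponentially growing ceiling: multiplier * 1, 2, 4, 8, ...
        ceiling = min(max_delay, multiplier * (2 ** attempt))
        # Randomizing the delay spreads out retries from concurrent workers.
        yield rng.uniform(0, ceiling)


# Example: delays for 5 retries (values differ on every run).
delays = list(random_exponential_delays(5))
```

With a linear scheme the ceiling would grow by a fixed step per attempt. The exponential ceiling backs off much faster when a service keeps failing.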

CLI
- New global parameter `--aws_profile` for setting an aws_profile to use
- New parameter `--download_method` which can be set for
* `extract...records --download_method`
* `download...html --download_method`

In both cases the argument can be set to either s3 or api, which defines how Common Crawl is accessed when downloading WARC files.
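
As a rough sketch of how an option restricted to these two values behaves, the following uses Python's `argparse`. This is not CmonCrawl's CLI code: the program name `cmon-sketch` is hypothetical, both options are placed on a single parser for brevity, and the defaults are set to `None` because these notes do not state the real ones.

```python
import argparse


def build_parser():
    # Hypothetical stand-in parser; the real CLI has subcommands.
    parser = argparse.ArgumentParser(prog="cmon-sketch")
    # AWS profile name to use; None means "not specified".
    parser.add_argument("--aws_profile", default=None)
    # Only the two documented values are accepted.
    parser.add_argument("--download_method", choices=["s3", "api"], default=None)
    return parser


args = build_parser().parse_args(["--download_method", "s3"])
print(args.download_method)
```

Any value other than `s3` or `api` makes `argparse` exit with a usage error, so an invalid method is rejected before any download starts.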

1.0.5

What's Changed
* Athena by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/86
* remove tests fix downloader by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/88
* Remove release tests by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/89


**Full Changelog**: https://github.com/hynky1999/CmonCrawl/compare/1.0.4...1.0.5

1.0.4

What's Changed
* Structure by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/81


**Full Changelog**: https://github.com/hynky1999/CmonCrawl/compare/1.0.3...1.0.4

1.0.3

- New downloader for offline iteration over WARC files
- CLI verbosity now has 3 levels (0, 1, 2; higher is more verbose)
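
One common way to implement a three-level verbosity setting is to map each level to a standard `logging` level. The mapping below is an assumption made for illustration; these notes do not say which log levels CmonCrawl assigns to 0, 1, and 2, and `level_for` is a hypothetical helper.

```python
import logging

# Assumed mapping: higher verbosity selects a more detailed log level.
VERBOSITY_TO_LEVEL = {
    0: logging.WARNING,
    1: logging.INFO,
    2: logging.DEBUG,
}


def level_for(verbosity):
    """Return a logging level for a verbosity value, clamped to 0..2."""
    clamped = max(0, min(2, verbosity))
    return VERBOSITY_TO_LEVEL[clamped]


logging.basicConfig(level=level_for(1))
```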

1.0.2

What's Changed
* Update README.md by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/69
* encoding + typing + better index retry by hynky1999 in https://github.com/hynky1999/CmonCrawl/pull/68


**Full Changelog**: https://github.com/hynky1999/CmonCrawl/compare/1.0.1...1.0.2
