Dolma

Latest version: v1.0.14.post1

Page 2 of 5

1.0.9

What's Changed
* Fix Tests to pass with new mixer behavior by soldni in https://github.com/allenai/dolma/pull/184


**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.8...v1.0.9

1.0.8

What's Changed
* Always use inferred extension by undfined in https://github.com/allenai/dolma/pull/183


**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.7...v1.0.8

1.0.7

What's Changed
* Better Filters Error Handling by soldni in https://github.com/allenai/dolma/pull/171
* Bump openssl from 0.10.64 to 0.10.66 in the cargo group by dependabot in https://github.com/allenai/dolma/pull/178
* Bump to 1.0.7 by undfined in https://github.com/allenai/dolma/pull/182


**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.6...v1.0.7

1.0.6

What's Changed
* V2 of Gopher tagger by soldni in https://github.com/allenai/dolma/pull/181


**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.5...v1.0.6

1.0.5

What's Changed
* Cherry pick zstd compressor by undfined in https://github.com/allenai/dolma/pull/180


**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.4...v1.0.5

1.0.4

What's Changed
* Bump rustls from 0.21.10 to 0.21.11 in the cargo group across 1 directory by dependabot in https://github.com/allenai/dolma/pull/149
* Fix divide by 0 in Gopher tagger by peterbjorgensen in https://github.com/allenai/dolma/pull/148
* Fixing dtype option not being correctly propagated by soldni in https://github.com/allenai/dolma/pull/154
* Add support for parsing WARC by soldni in https://github.com/allenai/dolma/pull/153
* Reducing hash calls by Whattabatt in https://github.com/allenai/dolma/pull/156
* Bump rustls from 0.21.11 to 0.21.12 in the cargo group across 1 directory by dependabot in https://github.com/allenai/dolma/pull/155
* Adding Quality Classifier from Dolma 1.7 by soldni in https://github.com/allenai/dolma/pull/163
* Adds ZST support in Deduper and Mixer by soldni in https://github.com/allenai/dolma/pull/170
* Workaround to fix memory leak in HuggingFace tokenizer by soldni in https://github.com/allenai/dolma/pull/169
* Adding partition logic by Whattabatt in https://github.com/allenai/dolma/pull/161
* Added option for tokenizer to split on special tokens by soldni in https://github.com/allenai/dolma/pull/176
* Version bump for new release (1.0.4) by soldni in https://github.com/allenai/dolma/pull/179

New Contributors
* Whattabatt made their first contribution in https://github.com/allenai/dolma/pull/156

**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.3...v1.0.4
