What's Changed
* Taggers for URL filtering by soldni in https://github.com/allenai/dolma/pull/112
* Updated CFF and Bibtex by soldni in https://github.com/allenai/dolma/pull/118
* Add preliminary Dolma v1.7 configurations, fix corner case in tokens. by soldni in https://github.com/allenai/dolma/pull/120
* Update CITATION.cff by soldni in https://github.com/allenai/dolma/pull/126
* Option to use ngram overlap to dedupe paragraphs by rodneykinney in https://github.com/allenai/dolma/pull/122
* Tagger modules import (fix for 128) by soldni in https://github.com/allenai/dolma/pull/129
* Added support for JQ syntax in include/exclude mixer config by soldni in https://github.com/allenai/dolma/pull/131
* Added JQ syntax for replacements + added minimum score. by soldni in https://github.com/allenai/dolma/pull/133
* Bump the cargo group with 1 update by dependabot in https://github.com/allenai/dolma/pull/132
* Improves tool to compute statistics; adds deduplication options. by soldni in https://github.com/allenai/dolma/pull/135
* Use precompiled regex when loading URL blocklists by peterbjorgensen in https://github.com/allenai/dolma/pull/137

**Full Changelog**: https://github.com/allenai/dolma/compare/v1.0.1...v1.0.2