Breaking Changes
* `SemanticChunker` no longer accepts `SentenceTransformer` models directly. Wrap the model in the new `SentenceTransformerEmbeddings` class instead, which can take a model directly. Future releases will add the ability to auto-detect the model type and create the embeddings inside the `AutoEmbeddings` class.
* From this release onwards, the `semantic` optional installation uses `Model2VecEmbeddings` by default and therefore depends on the `model2vec` Python package, chosen for its size and speed benefits. `Model2Vec` uses static embeddings, which are good enough for the task of chunking while being 10x faster than standard Sentence Transformers and a 10x lighter dependency.
* `SemanticChunker` and `SDPMChunker` now take the argument `chunk_size` instead of `max_chunk_size`, for uniformity across the chunkers. The internal representation remains the same.
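
For code that still builds chunker arguments with the old name, the rename can be bridged with a small shim. The sketch below is only an illustration: `migrate_chunker_kwargs` is a hypothetical helper and not part of chonkie; it assumes the only change needed is renaming the `max_chunk_size` key to `chunk_size`.

```python
def migrate_chunker_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with `max_chunk_size` renamed to `chunk_size`.

    Hypothetical migration helper, not part of chonkie.
    """
    migrated = dict(kwargs)  # copy so the caller's dict is left untouched
    if "max_chunk_size" in migrated:
        if "chunk_size" in migrated:
            # Both spellings at once is ambiguous, so refuse to guess.
            raise TypeError("pass either `chunk_size` or `max_chunk_size`, not both")
        migrated["chunk_size"] = migrated.pop("max_chunk_size")
    return migrated


# Keyword arguments written against the previous release.
old_kwargs = {"max_chunk_size": 512}
new_kwargs = migrate_chunker_kwargs(old_kwargs)
print(new_kwargs)
```

The result would then be unpacked into the chunker constructor, for example `SemanticChunker(**new_kwargs)`. Once all call sites use `chunk_size`, the shim can be deleted.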
What's Changed
* [BUG] Fix the start_index and end_index to point to character indices, not token indices by mrmps in https://github.com/bhavnicksm/chonkie/pull/29
* [DOCS] Fix typo for import tokenizer in quick start example by jasonacox in https://github.com/bhavnicksm/chonkie/pull/30
* Major Update: Fix bugs + Update docs + Add slots to dataclasses + update word & sentence splitting logic + minor changes by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/32
* Use `__slots__` instead of `slots=True` for python3.9 support by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/34
* Bump version to 0.2.0.post1 in pyproject.toml and __init__.py by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/35
* [FEAT] Add SentenceTransformerEmbeddings, EmbeddingsRegistry and AutoEmbeddings provider support by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/44
* Refactor BaseChunker, SemanticChunker and SDPMChunker to support BaseEmbeddings by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/45
* Add initial OpenAIEmbeddings support to Chonkie ✨ by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/46
* [DOCS] Add info about initial embeddings support and how to add custom embeddings by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/47
* [FEAT] - Add model2vec embedding models by sky-2002 in https://github.com/bhavnicksm/chonkie/pull/41
* [FEAT] Add support for Model2VecEmbeddings + Switch default embeddings to Model2VecEmbeddings by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/49
* [fix] Reorganize optional dependencies in pyproject.toml: rename 'sem… by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/51
* [Fix] Token counts from Tokenizers and Transformers adding special tokens by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/52
* [Fix] Refactor WordChunker, SentenceChunker pre-chunk splitting for reconstruction tests + minor changes by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/53
* [Refactor] Optimize similarity calculation by using np.divide for imp… by bhavnicksm in https://github.com/bhavnicksm/chonkie/pull/54
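
To illustrate the `start_index`/`end_index` fix listed above, here is a minimal sketch of the difference between token indices and character indices. It uses a plain whitespace split as a stand-in tokenizer and assumes `end_index` is exclusive; both are assumptions for illustration, not chonkie's actual tokenizer or implementation.

```python
import re


def word_spans(text: str) -> list:
    """Return (start_char, end_char) for each whitespace-delimited word."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


text = "Chonkie chunks text fast"
spans = word_spans(text)

# Suppose a chunk covers tokens 1 and 2, i.e. "chunks text".
first_token, last_token = 1, 2

# Character indices: start of the first token, end of the last token.
start_index = spans[first_token][0]
end_index = spans[last_token][1]

# Slicing the original text with character indices recovers the chunk.
chunk_text = text[start_index:end_index]
print(start_index, end_index, repr(chunk_text))

# Slicing with token indices instead picks out unrelated characters.
wrong_text = text[first_token:last_token + 1]
print(repr(wrong_text))
```

With character indices, `text[start_index:end_index]` reproduces the chunk, which is what makes the indices usable for locating a chunk in the source document.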
New Contributors
* mrmps made their first contribution in https://github.com/bhavnicksm/chonkie/pull/29
* jasonacox made their first contribution in https://github.com/bhavnicksm/chonkie/pull/30
* sky-2002 made their first contribution in https://github.com/bhavnicksm/chonkie/pull/41
**Full Changelog**: https://github.com/bhavnicksm/chonkie/compare/v0.2.0...v0.2.1