<!-- Release notes generated using configuration in .github/release.yml at v0.3.0 -->
This release extends our export capabilities and adds support for loading custom embeddings.
Because the shape of exported data has changed, this is a breaking change, so we bumped the version to 0.3.0.
## Loading custom embeddings
Loading pre-computed embeddings from an external source is now possible. See our [Custom embeddings](https://docs.lilacml.com/datasets/dataset_embeddings.html) guide for more details.
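```py
# Return the pre-computed embedding for each item, looked up by its id
# in your own `vector_store` (a mapping from item id to embedding vector).
def _load_embedding(item):
  return vector_store[item['id']]

# Load the embeddings into Lilac.
ds.load_embedding(
  load_fn=_load_embedding, index_path='text', embedding='my_embedding', overwrite=True
)
```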
## Export to HuggingFace
You can now export to a HuggingFace dataset.
```py
# Export a Lilac dataset to a HuggingFace dataset.
hf_ds = ds.to_huggingface()

# Optionally, use the HuggingFace API to push the dataset to the Hub.
hf_ds.push_to_hub('lilacai/glaive-function-calling-v2-sharegpt')
```
## Exporting no longer flattens data
Before this release, exporting would flatten the source data. For instance, data that looks like:

```py
{
  'conversations': [{
    'from': 'user',
    'value': 'Hello there'
  }]
}
```

would be exported incorrectly as:

```py
{'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}
```

Now data is exported with exactly the same shape it had when it was imported.
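To make the old behavior concrete, here is a small, hypothetical sketch of what flattening did to the example above: every leaf value is collected under a dotted path, with list elements replaced by `*`. This is an illustration written for these notes, not Lilac's actual export code; the `flatten` helper is hypothetical.

```python
from typing import Any


def flatten(value: Any) -> dict[str, list]:
  """Hypothetical sketch of the old, path-keyed export shape."""
  out: dict[str, list] = {}

  def walk(v: Any, path: str) -> None:
    if isinstance(v, dict):
      for key, child in v.items():
        walk(child, f'{path}.{key}' if path else key)
    elif isinstance(v, list):
      # List positions collapse into a single '*' path segment.
      for child in v:
        walk(child, f'{path}.*')
    else:
      out.setdefault(path, []).append(v)

  walk(value, '')
  return out


row = {'conversations': [{'from': 'user', 'value': 'Hello there'}]}
print(flatten(row))
# {'conversations.*.from': ['user'], 'conversations.*.value': ['Hello there']}
```

Because list positions collapse into `*`, the flattened form keeps each field in its own parallel list and drops the per-message grouping; with the new export, `row` comes back unchanged.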
## What's Changed
### Features
* Add support for loading custom embeddings. by nsthorat in https://github.com/lilacai/lilac/pull/1090
* Fix dataset export to avoid flattening the user data by dsmilkov in https://github.com/lilacai/lilac/pull/1091
* Export to HuggingFace. Support glaive-function-calling-v2 in the demo, clusters, and via sharegpt. by nsthorat in https://github.com/lilacai/lilac/pull/1113
### Performance
* Speed up PII and lang detection by making them multiprocess by dsmilkov in https://github.com/lilacai/lilac/pull/1097
### Bug fixes
* Bug fixes: overwrite, task errors, embedding keys. by nsthorat in https://github.com/lilacai/lilac/pull/1098
* Fixed cache busting behavior by brilee in https://github.com/lilacai/lilac/pull/1099
* Small fixes for the demo. by nsthorat in https://github.com/lilacai/lilac/pull/1106
* Fix a bug where we drop source fields that have embeddings computed on them by nsthorat in https://github.com/lilacai/lilac/pull/1093
* Couple of small bug fixes by dsmilkov in https://github.com/lilacai/lilac/pull/1109
* Fix edge case where table doesn't exist and doesn't get created by brilee in https://github.com/lilacai/lilac/pull/1110
* Fix the cluster sort by membership score bug by dsmilkov in https://github.com/lilacai/lilac/pull/1112
### Lilac Garden
* Rename remote => use_garden. by nsthorat in https://github.com/lilacai/lilac/pull/1092
* Fix chunking bug for remote embedding computation by dsmilkov in https://github.com/lilacai/lilac/pull/1096
* Add accelerated PII execution on Lilac Garden by dsmilkov in https://github.com/lilacai/lilac/pull/1103
* Move use_garden outside of a Signal. by nsthorat in https://github.com/lilacai/lilac/pull/1102
### UI
* Refactor buttons so we have a single cluster button. by nsthorat in https://github.com/lilacai/lilac/pull/1111
**Full Changelog**: https://github.com/lilacai/lilac/compare/v0.2.5...v0.3.0