Highlights
- **Vertex AI is now supported as a backend for pipeline execution.**
Simply run `fondant run vertex <pipeline.py>` to submit your pipeline.
Run `fondant run vertex --help` to see the possible configuration options.
- **The reusable components are now available on DockerHub under the `fndnt` organization.**
DockerHub is more broadly supported than the GitHub Container Registry, which we were using before.
- **Previously executed components are now cached when re-executed with the same arguments.**
  - This makes it easier to iterate on the development of downstream components.
  - This allows you to resume a failed pipeline from the step that failed.
- **Added the `fondant build` command, which lets you build fondant components easily.**
Run `fondant build <component_dir>`. Check `fondant build -h` for options.
The command will also update the image reference in the `fondant_component.yaml` to the newly built one.
- **We migrated from KfP v1 to KfP v2**. This means:
  - We now benefit from the latest KfP developments.
  - We compile fondant pipelines to the IR YAML format, which is supported by other execution engines such as Vertex.
  - You need a KfP v2 cluster to run fondant pipelines.
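
To illustrate the caching highlight above, here is a minimal sketch of how a "same arguments" check can work in general. It is not fondant's actual implementation: the `cache_key` and `run_with_cache` helpers, the dict-based cache, and the image name used in the usage note are all hypothetical and chosen only for illustration.

```python
import hashlib
import json


def cache_key(image: str, arguments: dict) -> str:
    """Derive a deterministic key from a component image and its arguments."""
    # sort_keys makes the key independent of the order the arguments were given in.
    payload = json.dumps({"image": image, "arguments": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_cache(image: str, arguments: dict, execute, cache: dict):
    """Run `execute` only when nothing is stored for this image and arguments.

    Returns a (result, cache_hit) tuple.
    """
    key = cache_key(image, arguments)
    if key in cache:
        return cache[key], True  # cache hit: skip execution
    result = execute(arguments)
    cache[key] = result
    return result, False
```

With this sketch, calling `run_with_cache("example/component:1", {"a": 1}, execute, cache)` twice would execute the component only once, while changing any argument produces a new key and triggers a fresh run. That is the property that lets a failed pipeline resume from the step that failed: earlier steps with unchanged arguments are served from the cache.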
Fixes
- Fix data explorer for usage on Windows
- Fix propagation of `client_kwargs` argument to configure Dask Client
Components
- Every reusable component now has a clear README describing its usage
- Add `load_from_parquet` component to load Parquet files as input data
- Add `embed_text` component to embed documents and other text
- Add `chunk_text` component to chunk documents into passages
- Add `index_weaviate` component to index data in a Weaviate vector store
- Fix issue with mixed type ids in LAION retrieval components
- Improve success rate of `download_images` component
- Fix OOM issues for inference components using GPU
- Limit data read by `load_from_hub` component to used columns
Detailed changes
* Add contribution segment by GeorgesLorre in https://github.com/ml6team/fondant/pull/463
* Update sample pipeline by mrchtr in https://github.com/ml6team/fondant/pull/464
* Update project description by RobbeSneyders in https://github.com/ml6team/fondant/pull/465
* Disable caching in the image retrieval sample pipeline by mrchtr in https://github.com/ml6team/fondant/pull/467
* Improve download images logs by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/466
* Add CC-25M announcement to docs by RobbeSneyders in https://github.com/ml6team/fondant/pull/468
* Update release announcements by mrchtr in https://github.com/ml6team/fondant/pull/471
* Add dataset link to press release by mrchtr in https://github.com/ml6team/fondant/pull/472
* Create load from parquet by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/474
* Fix caching writes by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/469
* Add caching dependency by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/479
* Add memory request and limit to components by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/482
* Improve hit rate of download images component by RobbeSneyders in https://github.com/ml6team/fondant/pull/470
* Cast id to string laion by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/485
* Bugfix partitioning by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/478
* Generate READMEs for all components using a script by RobbeSneyders in https://github.com/ml6team/fondant/pull/484
* Add component hub doc page by RobbeSneyders in https://github.com/ml6team/fondant/pull/487
* explorer small fix by Hakimovich99 in https://github.com/ml6team/fondant/pull/481
* Optimize GPU components by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/489
* Update Pillow to 10.0.1 to fix security issues by RobbeSneyders in https://github.com/ml6team/fondant/pull/493
* Update documentation regarding feedback by mrchtr in https://github.com/ml6team/fondant/pull/473
* Restructure-cli by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/488
* Add empty requirements.txt to load_from_parquet component by RobbeSneyders in https://github.com/ml6team/fondant/pull/504
* Use s3 client instead of http to access common crawl by mrchtr in https://github.com/ml6team/fondant/pull/501
* Fix run CLI by RobbeSneyders in https://github.com/ml6team/fondant/pull/507
* Migrate to KfpV2 by GeorgesLorre in https://github.com/ml6team/fondant/pull/477
* Remove abstract component test by mrchtr in https://github.com/ml6team/fondant/pull/510
* Only keep columns in produces by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/490
* Run black on components in pre-commit by RobbeSneyders in https://github.com/ml6team/fondant/pull/511
* Run bandit on components by RobbeSneyders in https://github.com/ml6team/fondant/pull/513
* Move container registry to DockerHub by RobbeSneyders in https://github.com/ml6team/fondant/pull/514
* Update component docs by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/516
* Vertex cli by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/519
* Refactor compile method for kfp and vertex by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/522
* Modify arg default by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/524
* Propagate `client_kwargs` argument and lower extract_images python version by RobbeSneyders in https://github.com/ml6team/fondant/pull/525
* Revert fsspec changes by mrchtr in https://github.com/ml6team/fondant/pull/523
* Add resource limits for Vertex by RobbeSneyders in https://github.com/ml6team/fondant/pull/529
* Update vertex and general docs by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/526
* Component/generate embeddings by tillwenke in https://github.com/ml6team/fondant/pull/520
* Add fondant build command by RobbeSneyders in https://github.com/ml6team/fondant/pull/527
* Fix explorer build script for DockerHub by RobbeSneyders in https://github.com/ml6team/fondant/pull/531
* Chunker component by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/528
* Update text embedding component by PhilippeMoussalli in https://github.com/ml6team/fondant/pull/532
* Add IndexWeaviate component by tillwenke in https://github.com/ml6team/fondant/pull/521
* Build command: raise errors when pushing and make tag optional by RobbeSneyders in https://github.com/ml6team/fondant/pull/533
* Update component readmes by RobbeSneyders in https://github.com/ml6team/fondant/pull/538
* Add network argument to vertex runner by RobbeSneyders in https://github.com/ml6team/fondant/pull/537
New Contributors
* Hakimovich99 made their first contribution in https://github.com/ml6team/fondant/pull/481
**Full Changelog**: https://github.com/ml6team/fondant/compare/0.5.0...0.6.0